1. 16 Nov, 2022 1 commit
    • D
      zonefs: fix zone report size in __zonefs_io_error() · 7dd12d65
      Damien Le Moal committed
      When an IO error occurs, the function __zonefs_io_error() is used to
      issue a zone report to obtain the latest zone information from the
      device. This function gets a zone report for all zones used as storage
      for a file, which is always 1 zone except for files representing
      aggregated conventional zones.
      
      The number of zones of a zone report for a file is calculated in
      __zonefs_io_error() by doing a bit-shift of the inode i_zone_size field,
      which is equal to or larger than the device zone size. However, this
      calculation does not take into account that the last zone of a zoned
      device may be smaller than the zone size reported by bdev_zone_sectors()
      (which is used to set the bit shift size). As a result, if an error
      occurs for an IO targeting such a smaller last zone, the zone report will
      ask for 0 zones, leading to an invalid zone report.
      
      Fix this by using the fact that all files require a 1 zone report,
      except if the inode i_zone_size field indicates a zone size larger than
      the device zone size. This exception case corresponds to a mount with
      aggregated conventional zones.
      
      A check for this exception is added to the file inode initialization
      during mount. If an invalid setup is detected, emit an error and fail
      the mount (check contributed by Johannes Thumshirn).
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
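The zone count rule described in the commit above can be sketched in user-space C. This is only an illustrative model, not the kernel code: the function names, parameters and the use of byte sizes are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the fixed rule: names and units are illustrative.
 * file_zone_size: size covered by the file, in bytes.
 * dev_zone_size:  nominal device zone size, in bytes (a power of two).
 * dev_zone_shift: log2 of dev_zone_size.
 */
static unsigned int report_nr_zones(uint64_t file_zone_size,
                                    uint64_t dev_zone_size,
                                    unsigned int dev_zone_shift)
{
    assert(dev_zone_size == (UINT64_C(1) << dev_zone_shift));

    /* Only aggregated conventional zone files span more than one zone. */
    if (file_zone_size > dev_zone_size)
        return (unsigned int)(file_zone_size >> dev_zone_shift);

    /* Every other file, including one on a smaller last zone, needs 1 zone. */
    return 1;
}

/* The shift-only computation, kept for comparison with the rule above. */
static unsigned int report_nr_zones_shift_only(uint64_t file_zone_size,
                                               unsigned int dev_zone_shift)
{
    return (unsigned int)(file_zone_size >> dev_zone_shift);
}
```

With a 256 MiB nominal zone size and a 100 MiB last zone, the shift-only variant yields 0 zones while the rule above yields 1.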
  2. 03 Aug, 2022 1 commit
  3. 23 Jul, 2022 1 commit
  4. 15 Jul, 2022 1 commit
  5. 07 Jul, 2022 1 commit
  6. 06 Jul, 2022 1 commit
  7. 27 Jun, 2022 2 commits
    • C
      attr: port attribute changes to new types · b27c82e1
      Christian Brauner committed
      Now that we have introduced new infrastructure to increase the type
      safety for filesystems supporting idmapped mounts, port the first part
      of the vfs over to them.
      
      This ports the attribute changes codepaths to rely on the new better
      helpers using a dedicated type.
      
      Before this change we used to take a shortcut and place the actual
      values that would be written to inode->i_{g,u}id into struct iattr. This
      had the advantage that we moved idmappings mostly out of the picture
      early on but it made reasoning about changes more difficult than it
      should be.
      
      The filesystem was never explicitly told that it dealt with an idmapped
      mount. The transition to the value that needed to be stored in
      inode->i_{g,u}id appeared way too early and increased the probability of
      bugs in various codepaths.
      
      We now place the same value in struct iattr no matter if this is an
      idmapped mount or not. The vfs will only deal with type safe
      vfs{g,u}id_t. This makes it massively safer to perform permission checks
      as the type will tell us what checks we need to perform and what helpers
      we need to use.
      
      Filesystems raising FS_ALLOW_IDMAP can't simply write ia_vfs{g,u}id to
      inode->i_{g,u}id since they are different types. Instead they need to
      use the dedicated vfs{g,u}id_to_k{g,u}id() helpers that map the
      vfs{g,u}id into the filesystem.
      
      The other nice effect is that filesystems like overlayfs don't need to
      care about idmappings explicitly anymore and can simply set up struct
      iattr accordingly directly.
      
      Link: https://lore.kernel.org/lkml/CAHk-=win6+ahs1EwLkcq8apqLi_1wXFWbrPf340zYEhObpz4jA@mail.gmail.com [1]
      Link: https://lore.kernel.org/r/20220621141454.2914719-9-brauner@kernel.org
      Cc: Seth Forshee <sforshee@digitalocean.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      CC: linux-fsdevel@vger.kernel.org
      Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
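The type-safety argument of the commit above can be illustrated with a small user-space model. Everything here is hypothetical: the type names, the single-range mapping and the helper names are assumptions for the sketch and do not reproduce the kernel's vfs{g,u}id_t API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Distinct wrapper types: the compiler rejects assigning one to the other. */
typedef struct { uint32_t val; } fs_uid_t;   /* value stored in the inode */
typedef struct { uint32_t val; } mnt_uid_t;  /* value as seen through the mount */

#define INVALID_ID UINT32_MAX

/* Hypothetical one-range idmapping: mount IDs starting at mnt_first
 * map to filesystem IDs starting at fs_first, for count IDs. */
struct id_range {
    uint32_t mnt_first;
    uint32_t fs_first;
    uint32_t count;
};

/* The only way to obtain an fs_uid_t from a mnt_uid_t is this helper. */
static fs_uid_t mnt_uid_to_fs_uid(const struct id_range *map, mnt_uid_t id)
{
    fs_uid_t out = { INVALID_ID };

    assert(map != NULL);
    if (id.val >= map->mnt_first && id.val - map->mnt_first < map->count)
        out.val = map->fs_first + (id.val - map->mnt_first);
    return out;
}

struct toy_inode {
    fs_uid_t i_uid;
};

/* Ownership change: map at the point where the inode is actually updated. */
static bool toy_setattr_uid(struct toy_inode *inode,
                            const struct id_range *map, mnt_uid_t new_owner)
{
    fs_uid_t mapped = mnt_uid_to_fs_uid(map, new_owner);

    if (mapped.val == INVALID_ID)
        return false;
    /* "inode->i_uid = new_owner;" would not compile: the types differ. */
    inode->i_uid = mapped;
    return true;
}
```

The point of the wrapper structs is that forgetting the mapping step becomes a compile-time error rather than a silent ownership bug.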
    • C
      quota: port quota helpers mount ids · 71e7b535
      Christian Brauner committed
      Port the is_quota_modification() and dquot_transfer() helpers to type
      safe vfs{g,u}id_t. Since these helpers are only called by a few
      filesystems don't introduce a new helper but simply extend the existing
      helpers to pass down the mount's idmapping.
      
      Note that this is a non-functional change, i.e. nothing changes here or
      at the end of this series in how quotas are handled! This change is
      necessary because we will at the end of this series make
      ownership changes easier to reason about by keeping the original value
      in struct iattr for both non-idmapped and idmapped mounts.
      
      For now we always pass the initial idmapping which makes the idmapping
      functions these helpers call nops.
      
      This is done because we currently always pass the actual value to be
      written to i_{g,u}id via struct iattr. While this allowed us to treat
      the {g,u}id values in struct iattr as values that can be directly
      written to inode->i_{g,u}id it also increases the potential for
      confusion for filesystems.
      
      Now that we have dedicated types to prevent this confusion we will
      ultimately only map the value from the idmapped mount into a filesystem
      value that can be written to inode->i_{g,u}id when the filesystem
      actually updates the inode. So pass down the initial idmapping until we
      have finished that conversion, at which point we pass down the mount's
      idmapping.
      
      Since struct iattr uses an anonymous union with overlapping types as
      supported by the C standard, filesystems that haven't converted to
      ia_vfs{g,u}id won't see any difference and things will continue to work
      as before. In other words, no functional changes intended with this
      change.
      
      Link: https://lore.kernel.org/r/20220621141454.2914719-7-brauner@kernel.org
      Cc: Seth Forshee <sforshee@digitalocean.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      CC: linux-fsdevel@vger.kernel.org
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
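The anonymous union argument from the commit above can be shown with a minimal C11 sketch. The struct and type names are hypothetical stand-ins, not the kernel's struct iattr; the sketch only shows why code using the old member name keeps seeing the same value.

```c
#include <assert.h>
#include <stdint.h>

/* Two distinct wrapper types with identical layout (hypothetical names). */
typedef struct { uint32_t val; } old_uid_t;
typedef struct { uint32_t val; } new_uid_t;

static_assert(sizeof(old_uid_t) == sizeof(new_uid_t),
              "both views must have the same size");

struct toy_iattr {
    unsigned int valid;
    union {                   /* C11 anonymous union: members share storage */
        old_uid_t ia_uid;     /* name used by unconverted code */
        new_uid_t ia_vfsuid;  /* name used by converted code */
    };
};

/* Unconverted code path: still reads the old member name. */
static uint32_t legacy_read_uid(const struct toy_iattr *attr)
{
    return attr->ia_uid.val;
}

/* Converted code path: writes through the new member name. */
static void converted_set_uid(struct toy_iattr *attr, uint32_t id)
{
    attr->ia_vfsuid.val = id;
}
```

A value stored through the new member is read back unchanged through the old one, which is why unconverted code sees no difference.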
  8. 11 Jun, 2022 1 commit
  9. 08 Jun, 2022 3 commits
    • D
      zonefs: fix zonefs_iomap_begin() for reads · c1c1204c
      Damien Le Moal committed
      If a readahead is issued to a sequential zone file with an offset
      exactly equal to the current file size, the iomap type is set to
      IOMAP_UNWRITTEN, which will prevent an IO, but the iomap length is
      calculated as 0. This causes a WARN_ON() in iomap_iter():
      
      [17309.548939] WARNING: CPU: 3 PID: 2137 at fs/iomap/iter.c:34 iomap_iter+0x9cf/0xe80
      [...]
      [17309.650907] RIP: 0010:iomap_iter+0x9cf/0xe80
      [...]
      [17309.754560] Call Trace:
      [17309.757078]  <TASK>
      [17309.759240]  ? lock_is_held_type+0xd8/0x130
      [17309.763531]  iomap_readahead+0x1a8/0x870
      [17309.767550]  ? iomap_read_folio+0x4c0/0x4c0
      [17309.771817]  ? lockdep_hardirqs_on_prepare+0x400/0x400
      [17309.778848]  ? lock_release+0x370/0x750
      [17309.784462]  ? folio_add_lru+0x217/0x3f0
      [17309.790220]  ? reacquire_held_locks+0x4e0/0x4e0
      [17309.796543]  read_pages+0x17d/0xb60
      [17309.801854]  ? folio_add_lru+0x238/0x3f0
      [17309.807573]  ? readahead_expand+0x5f0/0x5f0
      [17309.813554]  ? policy_node+0xb5/0x140
      [17309.819018]  page_cache_ra_unbounded+0x27d/0x450
      [17309.825439]  filemap_get_pages+0x500/0x1450
      [17309.831444]  ? filemap_add_folio+0x140/0x140
      [17309.837519]  ? lock_is_held_type+0xd8/0x130
      [17309.843509]  filemap_read+0x28c/0x9f0
      [17309.848953]  ? zonefs_file_read_iter+0x1ea/0x4d0 [zonefs]
      [17309.856162]  ? trace_contention_end+0xd6/0x130
      [17309.862416]  ? __mutex_lock+0x221/0x1480
      [17309.868151]  ? zonefs_file_read_iter+0x166/0x4d0 [zonefs]
      [17309.875364]  ? filemap_get_pages+0x1450/0x1450
      [17309.881647]  ? __mutex_unlock_slowpath+0x15e/0x620
      [17309.888248]  ? wait_for_completion_io_timeout+0x20/0x20
      [17309.895231]  ? lock_is_held_type+0xd8/0x130
      [17309.901115]  ? lock_is_held_type+0xd8/0x130
      [17309.906934]  zonefs_file_read_iter+0x356/0x4d0 [zonefs]
      [17309.913750]  new_sync_read+0x2d8/0x520
      [17309.919035]  ? __x64_sys_lseek+0x1d0/0x1d0
      
      Furthermore, this causes iomap_readahead() to loop forever as
      iomap_readahead_iter() always returns 0, making no progress.
      
      Fix this by treating reads after the file size as access to holes,
      setting the iomap type to IOMAP_HOLE, the iomap addr to IOMAP_NULL_ADDR
      and using the length argument as is for the iomap length. To simplify
      the code with this change, zonefs_iomap_begin() is split into the read
      variant, zonefs_read_iomap_begin() and zonefs_read_iomap_ops, and the
      write variant, zonefs_write_iomap_begin() and zonefs_write_iomap_ops.
      Reported-by: Jorgen Hansen <Jorgen.Hansen@wdc.com>
      Fixes: 8dcc1a9d ("fs: New zonefs file system")
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Jorgen Hansen <Jorgen.Hansen@wdc.com>
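The read mapping rule from the commit above can be modelled in a few lines of user-space C. This is a sketch under assumptions: the struct, enum and function names are hypothetical, and the mapped branch (address and length derived from the file size) is an assumed simplification rather than a copy of zonefs_read_iomap_begin().

```c
#include <assert.h>
#include <stdint.h>

enum toy_iomap_type { TOY_IOMAP_HOLE, TOY_IOMAP_MAPPED };

#define TOY_NULL_ADDR UINT64_MAX  /* stand-in for "no device address" */

struct toy_iomap {
    enum toy_iomap_type type;
    uint64_t addr;    /* device byte address, or TOY_NULL_ADDR */
    uint64_t offset;  /* block-aligned file offset */
    uint64_t length;  /* mapping length in bytes */
};

static void toy_read_iomap_begin(uint64_t zone_start, uint64_t isize,
                                 uint32_t block_size, uint64_t offset,
                                 uint64_t length, struct toy_iomap *iomap)
{
    /* The block size must be a non-zero power of two. */
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);

    iomap->offset = offset & ~((uint64_t)block_size - 1);
    if (iomap->offset >= isize) {
        /* At or past the file size: a hole, with the caller's length. */
        iomap->type = TOY_IOMAP_HOLE;
        iomap->addr = TOY_NULL_ADDR;
        iomap->length = length;
    } else {
        /* Inside the written part of the file: map up to the file size. */
        iomap->type = TOY_IOMAP_MAPPED;
        iomap->addr = zone_start + iomap->offset;
        iomap->length = isize - iomap->offset;
    }
}
```

The hole branch never produces a zero length for a non-empty request, which is the property the fix relies on to let the iteration make progress.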
    • D
      zonefs: Do not ignore explicit_open with active zone limit · 96eca145
      Damien Le Moal committed
      A zoned device may have no limit on the number of open zones but may
      have a limit on the number of active zones it can support. In such
      case, the explicit_open mount option should not be ignored to ensure
      that the open() system call activates the zone with an explicit zone
      open command, thus guaranteeing that the zone can be written.
      
      Enforce this by ignoring the explicit_open mount option only for
      devices that have both the open and active zone limits equal to 0.
      
      Fixes: 87c9ce3f ("zonefs: Add active seq file accounting")
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
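The condition described in the commit above reduces to a small check on the two device limits. The following user-space sketch uses hypothetical names (the flag, the struct and the function are assumptions) and treats a limit of 0 as "no limit", as the commit message does.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MNTOPT_EXPLICIT_OPEN (1U << 0)  /* hypothetical flag bit */

struct toy_sb {
    unsigned int mount_opts;       /* options already parsed at mount time */
    unsigned int max_open_zones;   /* 0 means no limit */
    unsigned int max_active_zones; /* 0 means no limit */
};

/* Drop explicit_open only when the device has neither kind of limit. */
static void toy_apply_zone_limits(struct toy_sb *sb)
{
    assert(sb != NULL);

    if (sb->max_open_zones == 0 && sb->max_active_zones == 0)
        sb->mount_opts &= ~TOY_MNTOPT_EXPLICIT_OPEN;
}
```

A device with no open zone limit but a non-zero active zone limit therefore keeps the option set.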
    • D
      zonefs: fix handling of explicit_open option on mount · a2a513be
      Damien Le Moal committed
      Ignoring the explicit_open mount option on mount for devices that do not
      have a limit on the number of open zones must be done after the mount
      options are parsed and set in s_mount_opts. Move the check to ignore
      the explicit_open option after the call to zonefs_parse_options() in
      zonefs_fill_super().
      
      Fixes: b5c00e97 ("zonefs: open/close zone on file open/close")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  10. 24 May, 2022 1 commit
  11. 16 May, 2022 1 commit
  12. 10 May, 2022 2 commits
  13. 21 Apr, 2022 8 commits
    • D
      zonefs: Fix management of open zones · 1da18a29
      Damien Le Moal committed
      The mount option "explicit_open" manages the device open zone
      resources to ensure that if an application opens a sequential file for
      writing, the file zone can always be written by explicitly opening
      the zone and accounting for that state with the s_open_zones counter.
      
      However, if some zones are already open when mounting, the device open
      zone resource usage status will be larger than the initial s_open_zones
      value of 0. Ensure that this inconsistency does not happen by closing
      any sequential zone that is open when mounting.
      
      Furthermore, with ZNS drives, closing an explicitly open zone that has
      not been written will change the zone state to "closed", that is, the
      zone will remain in an active state. Since this can then cause failures
      of explicit open operations on other zones if the drive active zone
      resources are exceeded, we need to make sure that the zone is not
      active anymore by resetting it instead of closing it. To address this,
      zonefs_zone_mgmt() is modified to change a REQ_OP_ZONE_CLOSE request
      into a REQ_OP_ZONE_RESET for sequential zones that have not been
      written.
      
      Fixes: b5c00e97 ("zonefs: open/close zone on file open/close")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
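The close-to-reset substitution described in the commit above can be sketched as a pure function. The enum, struct and function names below are hypothetical; the sketch assumes that a write pointer offset of 0 means the zone has not been written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum toy_zone_op { TOY_OP_ZONE_OPEN, TOY_OP_ZONE_CLOSE, TOY_OP_ZONE_RESET };

struct toy_zone_file {
    bool is_seq;         /* file backed by a sequential zone */
    uint64_t wp_offset;  /* bytes written so far in the zone */
};

/* Return the operation that should actually be sent to the device. */
static enum toy_zone_op toy_effective_op(const struct toy_zone_file *f,
                                         enum toy_zone_op op)
{
    assert(f != NULL);

    /*
     * Closing an unwritten sequential zone would leave it active,
     * so reset it instead to release the active zone resource.
     */
    if (op == TOY_OP_ZONE_CLOSE && f->is_seq && f->wp_offset == 0)
        return TOY_OP_ZONE_RESET;
    return op;
}
```

All other requests, including a close of a partially written zone, pass through unchanged.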
    • D
      zonefs: Clear inode information flags on inode creation · 694852ea
      Damien Le Moal committed
      Ensure that the i_flags field of struct zonefs_inode_info is cleared to
      0 when initializing a zone file inode, avoiding seeing the flag
      ZONEFS_ZONE_OPEN being incorrectly set.
      
      Fixes: b5c00e97 ("zonefs: open/close zone on file open/close")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
    • D
      zonefs: Add active seq file accounting · 87c9ce3f
      Damien Le Moal committed
      Modify struct zonefs_sb_info to add the s_active_seq_files atomic to
      count the number of seq files representing a zone that is partially
      written or explicitly open, that is, to count sequential files with
      a zone that is in an active state on the device.
      
      The helper function zonefs_account_active() is introduced to update
      this counter whenever a file is written or truncated. This helper is
      also used in the zonefs_seq_file_write_open() and
      zonefs_seq_file_write_close() functions when the explicit_open mount
      option is used.
      
      The s_active_seq_files counter is exported through sysfs using the
      read-only attribute nr_active_seq_files. The device maximum number of
      active zones is also exported through sysfs with the read-only attribute
      max_active_seq_files.
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
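The accounting rule described in the commit above (a sequential file is active when its zone is partially written or explicitly open) can be modelled as follows. This is a single-threaded user-space sketch with hypothetical names; the kernel uses an atomic counter, which a plain int replaces here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FLAG_OPEN   (1U << 0)  /* zone explicitly open */
#define TOY_FLAG_ACTIVE (1U << 1)  /* file already counted as active */

struct toy_sb {
    int active_seq_files;  /* plain int stands in for an atomic counter */
};

struct toy_file {
    bool is_seq;
    unsigned int flags;
    uint64_t wp_offset;  /* bytes written so far */
    uint64_t max_size;   /* zone capacity in bytes */
};

/* Call after every write, truncate, open or close to resync the counter. */
static void toy_account_active(struct toy_sb *sb, struct toy_file *f)
{
    assert(sb != NULL && f != NULL);

    if (!f->is_seq)
        return;

    bool active = (f->flags & TOY_FLAG_OPEN) != 0 ||
                  (f->wp_offset > 0 && f->wp_offset < f->max_size);
    bool counted = (f->flags & TOY_FLAG_ACTIVE) != 0;

    if (active && !counted) {
        f->flags |= TOY_FLAG_ACTIVE;
        sb->active_seq_files++;
    } else if (!active && counted) {
        f->flags &= ~TOY_FLAG_ACTIVE;
        sb->active_seq_files--;
    }
}
```

Keeping a per-file "already counted" flag makes the helper idempotent, so it can be called after every state change without double counting.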
    • D
      zonefs: Export open zone resource information through sysfs · 9277a6d4
      Damien Le Moal committed
      To allow applications to easily check the current usage status of the
      open zone resources of the mounted device, export through sysfs the
      counter of write open sequential files s_wro_seq_files field of
      struct zonefs_sb_info. The attribute is named nr_wro_seq_files and is
      read only.
      
      The maximum number of write open sequential files (zones) indicated by
      the s_max_wro_seq_files field of struct zonefs_sb_info is also exported
      as the read only attribute max_wro_seq_files.
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
    • D
      zonefs: Always do seq file write open accounting · 7d6dfbe0
      Damien Le Moal committed
      The explicit_open mount option forces an explicit open of the zone of
      sequential files that are open for writing to ensure that the open file
      can be written without the device failing write operations due to open
      zone resources limit being exceeded. To implement this, zonefs accounts
      all write open seq files when this mount option is used.
      
      This accounting however can be easily performed even when the
      explicit_open mount option is not used, thus allowing applications to
      control zone resources on their own, without relying on open() system
      call failures from zonefs.
      
      To implement this, the helper zonefs_file_use_exp_open() is removed and
      replaced with the helper zonefs_seq_file_need_wro() which tests if a file
      is a sequential file being open with write access. zonefs_open_zone()
      and zonefs_close_zone() are renamed respectively to
      zonefs_seq_file_write_open() and zonefs_seq_file_write_close() and
      modified to update the s_wro_seq_files counter regardless of the
      explicit_open mount option use.
      
      If the explicit_open mount option is used, zonefs_seq_file_write_open()
      executes an explicit zone open operation for a sequential file open for
      writing for the first time, as before.
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
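The accounting change described in the commit above can be sketched as a small user-space model. All names are hypothetical, the per-file write reference count is an assumption used to detect the first write open, and a simple counter stands in for actually issuing a zone open command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_sb {
    bool explicit_open;           /* mount option state */
    unsigned int wro_seq_files;   /* write-open sequential files */
    unsigned int zone_open_cmds;  /* stands in for explicit zone opens */
};

struct toy_file {
    bool is_seq;
    unsigned int wr_refcnt;  /* assumed per-file count of write opens */
};

/* Accounting applies to sequential files opened with write access. */
static bool toy_need_wro(const struct toy_file *f, bool open_for_write)
{
    return f->is_seq && open_for_write;
}

static void toy_write_open(struct toy_sb *sb, struct toy_file *f)
{
    if (f->wr_refcnt == 0) {
        /* The counter is updated whether or not explicit_open is set. */
        sb->wro_seq_files++;
        /* The explicit zone open is still tied to the mount option. */
        if (sb->explicit_open)
            sb->zone_open_cmds++;
    }
    f->wr_refcnt++;
}

static void toy_write_close(struct toy_sb *sb, struct toy_file *f)
{
    assert(f->wr_refcnt > 0);

    if (--f->wr_refcnt == 0)
        sb->wro_seq_files--;
}
```

An application could then compare the exported counter against the device limit itself instead of waiting for open() to fail.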
    • D
      zonefs: Rename super block information fields · 2b95a23c
      Damien Le Moal committed
      The s_open_zones field of struct zonefs_sb_info is used to count the
      number of files that are open for writing and may not necessarily
      correspond to the number of open zones on the device. For instance, an
      application may open for writing a sequential zone file, fully write it
      and keep the file open. In such case, the zone of the file is not open
      anymore (it is in the full state).
      
      Avoid confusion about this counter meaning by renaming it to
      s_wro_seq_files. To keep things consistent, the field s_max_open_zones
      is renamed to s_max_wro_seq_files.
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
    • D
      zonefs: Fix management of open zones · 19139539
      Damien Le Moal committed
      The mount option "explicit_open" manages the device open zone
      resources to ensure that if an application opens a sequential file for
      writing, the file zone can always be written by explicitly opening
      the zone and accounting for that state with the s_open_zones counter.
      
      However, if some zones are already open when mounting, the device open
      zone resource usage status will be larger than the initial s_open_zones
      value of 0. Ensure that this inconsistency does not happen by closing
      any sequential zone that is open when mounting.
      
      Furthermore, with ZNS drives, closing an explicitly open zone that has
      not been written will change the zone state to "closed", that is, the
      zone will remain in an active state. Since this can then cause failures
      of explicit open operations on other zones if the drive active zone
      resources are exceeded, we need to make sure that the zone is not
      active anymore by resetting it instead of closing it. To address this,
      zonefs_zone_mgmt() is modified to change a REQ_OP_ZONE_CLOSE request
      into a REQ_OP_ZONE_RESET for sequential zones that have not been
      written.
      
      Fixes: b5c00e97 ("zonefs: open/close zone on file open/close")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
    • D
      zonefs: Clear inode information flags on inode creation · b954ebba
      Damien Le Moal committed
      Ensure that the i_flags field of struct zonefs_inode_info is cleared to
      0 when initializing a zone file inode, avoiding seeing the flag
      ZONEFS_ZONE_OPEN being incorrectly set.
      
      Fixes: b5c00e97 ("zonefs: open/close zone on file open/close")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
  14. 18 Apr, 2022 1 commit
  15. 23 Mar, 2022 1 commit
  16. 15 Mar, 2022 2 commits
  17. 08 Mar, 2022 1 commit
  18. 02 Feb, 2022 2 commits
  19. 17 Dec, 2021 1 commit
  20. 24 Oct, 2021 1 commit
    • A
      iomap: Add done_before argument to iomap_dio_rw · 4fdccaa0
      Andreas Gruenbacher committed
      Add a done_before argument to iomap_dio_rw that indicates how much of
      the request has already been transferred.  When the request succeeds, we
      report that done_before additional bytes were transferred.  This is
      useful for finishing a request asynchronously when part of the request
      has already been completed synchronously.
      
      We'll use that to allow iomap_dio_rw to be used with page faults
      disabled: when a page fault occurs while submitting a request, we
      synchronously complete the part of the request that has already been
      submitted.  The caller can then take care of the page fault and call
      iomap_dio_rw again for the rest of the request, passing in the number of
      bytes already transferred.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
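The done_before calling pattern described in the commit above can be illustrated with a toy transfer function. This is a user-space sketch, not iomap_dio_rw(): the function names, the fault_at parameter used to simulate a page fault, and the retry loop are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy transfer: copies at most fault_at bytes of this call's request to
 * simulate stopping at a page fault, then reports the running total,
 * i.e. done_before plus the bytes copied by this call.
 */
static size_t toy_dio_rw(char *dst, const char *src, size_t len,
                         size_t fault_at, size_t done_before)
{
    assert(dst != NULL && src != NULL);

    size_t n = len < fault_at ? len : fault_at;

    memcpy(dst, src, n);
    return done_before + n;
}

/* Caller side: resolve the fault, then retry the rest of the request. */
static size_t toy_copy_with_retry(char *dst, const char *src, size_t len,
                                  size_t first_fault_at)
{
    size_t done = 0;
    size_t fault_at = first_fault_at;

    while (done < len) {
        /* Pass the bytes already transferred as done_before. */
        done = toy_dio_rw(dst + done, src + done, len - done,
                          fault_at, done);
        /* Pretend the fault was handled: no fault on the next attempt. */
        fault_at = len;
    }
    return done;
}
```

Because each call returns the running total, the caller never has to add partial results together itself.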
  21. 18 Oct, 2021 1 commit
  22. 16 Jul, 2021 1 commit
  23. 13 Jul, 2021 1 commit
  24. 30 Jun, 2021 2 commits
  25. 19 Apr, 2021 1 commit
  26. 17 Mar, 2021 1 commit