1. 29 Oct, 2013 29 commits
    • W
      NFS: separate passed security flavs from selected · a3f73c27
      Weston Andros Adamson committed
      When filling parsed_mount_data, store the parsed sec= mount option in
      the new struct nfs_auth_info and the chosen flavor in selected_flavor.
      
      This patch lays the groundwork for supporting multiple sec= options.
      Signed-off-by: Weston Andros Adamson <dros@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      a3f73c27
    • W
      NFSv4: make nfs_find_best_sec static · 47fd88e6
      Weston Andros Adamson committed
      It's not used outside of nfs4namespace.c anymore.
      Signed-off-by: Weston Andros Adamson <dros@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      47fd88e6
    • C
      NFS: Fix possible endless state recovery wait · 0625c2dd
      Chuck Lever committed
      In nfs4_wait_clnt_recover(), hold a reference to the clp being
      waited on.  The state manager can reduce clp->cl_count to 1, in
      which case the nfs_put_client() in nfs4_run_state_manager() can
      free *clp before wait_on_bit() returns and allows
      nfs4_wait_clnt_recover() to run again.
      
      The behavior at that point is non-deterministic.  If the waited-on
      bit still happens to be zero, wait_on_bit() will wake the waiter as
      expected.  If the bit is set again (say, if the memory was poisoned
      when freed) wait_on_bit() can leave the waiter asleep.
      
      This is a narrow fix which ensures the safety of accessing *clp in
      nfs4_wait_clnt_recover(), but does not address the continued use
      of a possibly freed *clp after nfs4_wait_clnt_recover() returns
      (see nfs_end_delegation_return(), for example).
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      0625c2dd
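The fix described in 0625c2dd above amounts to pinning an object with its own reference before blocking on it. A minimal userspace sketch of that pattern follows, assuming a simplified `struct client` with an atomic counter. All names here are hypothetical stand-ins and not the kernel's nfs_client API, and the "wait" is simulated by calling the state manager's final step directly.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct nfs_client; only the refcount matters. */
struct client {
    atomic_int refcount;
    bool freed;   /* set instead of calling free(), so the result can be inspected */
};

static void client_get(struct client *clp)
{
    atomic_fetch_add(&clp->refcount, 1);
}

/* Drop one reference; the last put "frees" the object. */
static void client_put(struct client *clp)
{
    if (atomic_fetch_sub(&clp->refcount, 1) == 1)
        clp->freed = true;
}

/* Stand-in for the state manager finishing and dropping its reference. */
static void state_manager_finish(struct client *clp)
{
    client_put(clp);
}

/*
 * Waiter: take a reference before blocking so that the state manager's
 * put cannot free the client while the waiter still inspects it.
 * Returns true if the client was still alive when the wait completed.
 */
static bool wait_for_recovery(struct client *clp)
{
    bool alive;

    client_get(clp);             /* hold a reference across the wait */
    state_manager_finish(clp);   /* simulated wait: the manager runs and drops its ref */
    alive = !clp->freed;         /* safe: our own reference keeps the object alive */
    client_put(clp);             /* may be the last reference */
    return alive;
}
```

Without the `client_get()` call, the manager's put would be the last one and the waiter would read a freed object, which is the non-deterministic case the commit message describes.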
    • C
      NFS: Set EXCHGID4_FLAG_SUPP_MOVED_MIGR · cd3fadec
      Chuck Lever committed
      Broadly speaking, v4.1 migration is untested.  There are no servers
      in the wild that support NFSv4.1 migration.  However, as server
      implementations become available, we do want to enable testing by
      developers, while leaving it disabled for environments for which
      broken migration support would be an unpleasant surprise.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      cd3fadec
    • C
      NFS: Handle SEQ4_STATUS_LEASE_MOVED · d1c2331e
      Chuck Lever committed
      With the advent of NFSv4 sessions in NFSv4.1 and following, a "lease
      moved" condition is reported differently than it is in NFSv4.0.
      
      NFSv4 minor version 0 servers return an error status code,
      NFS4ERR_LEASE_MOVED, to signal that a lease has moved.  This error
      causes the whole compound operation to fail.  Normal compounds
      against this server continue to fail until the client performs
      migration recovery on the migrated share.
      
      Minor version 1 and later servers assert a bit flag in the reply to
      a compound's SEQUENCE operation to signal LEASE_MOVED.  This is not
      a fatal condition: operations against this server continue normally.
      The server asserts this flag until the client performs migration
      recovery on the migrated share.
      
      Note that servers MUST NOT return NFS4ERR_LEASE_MOVED to NFSv4
      clients not using NFSv4.0.
      
      After the server asserts any of the sr_status_flags in the SEQUENCE
      operation in a typical compound, our client initiates standard lease
      recovery.  For NFSv4.1+, a stand-alone SEQUENCE operation is
      performed to discover what recovery is needed.
      
      If SEQ4_STATUS_LEASE_MOVED is asserted in this stand-alone SEQUENCE
      operation, our client attempts to discover which FSIDs have been
      migrated, and then performs migration recovery on each.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      d1c2331e
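The two reporting styles contrasted in d1c2331e above can be sketched as a single predicate. This is an illustration only: `struct compound_reply` and both helper functions are hypothetical, while the two constants use the values assigned by the NFSv4.0 and NFSv4.1 protocol specifications.

```c
#include <stdbool.h>
#include <stdint.h>

#define NFS4ERR_LEASE_MOVED      10031        /* NFSv4.0 error status code */
#define SEQ4_STATUS_LEASE_MOVED  0x00000080u  /* NFSv4.1 sr_status_flags bit */

/* Hypothetical summary of one compound reply as seen by the client. */
struct compound_reply {
    unsigned int minorversion;     /* 0 for NFSv4.0, 1 or more for sessions */
    int          status;           /* overall compound status, 0 on success */
    uint32_t     sr_status_flags;  /* from SEQUENCE; unused for minor version 0 */
};

/* True if the reply signals that a lease has moved. */
static bool lease_moved(const struct compound_reply *r)
{
    if (r->minorversion == 0)
        return r->status == NFS4ERR_LEASE_MOVED;
    return (r->sr_status_flags & SEQ4_STATUS_LEASE_MOVED) != 0;
}

/* In minor version 0 the signal is fatal to the compound; in 4.1+ it is not. */
static bool compound_failed(const struct compound_reply *r)
{
    return r->status != 0;
}
```

The point of the sketch is that a v4.1 reply can report a moved lease while the compound itself succeeds, whereas a v4.0 reply cannot.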
    • C
      NFS: Handle NFS4ERR_LEASE_MOVED during async RENEW · f8aba1e8
      Chuck Lever committed
      With NFSv4 minor version 0, the asynchronous lease RENEW
      heartbeat can return NFS4ERR_LEASE_MOVED.  Error recovery logic for
      async RENEW is a separate code path from the generic NFS proc paths,
      so it must be updated to handle NFS4ERR_LEASE_MOVED as well.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      f8aba1e8
    • C
      NFS: Migration support for RELEASE_LOCKOWNER · 60ea6812
      Chuck Lever committed
      Currently the Linux NFS client ignores the operation status code for
      the RELEASE_LOCKOWNER operation.  Like NFSv3's UMNT operation,
      RELEASE_LOCKOWNER is a courtesy to help servers manage their
      resources, and the outcome is not consequential for the client.
      
      During a migration, a server may report NFS4ERR_LEASE_MOVED, in
      which case the client really should retry, since typically
      LEASE_MOVED has nothing to do with the current operation, but does
      prevent it from going forward.
      
      Also, it's important for a client to respond as soon as possible to
      a moved lease condition, since the client's lease could expire on
      the destination without further action by the client.
      
      NFS4ERR_DELAY is not included in the list of valid status codes for
      RELEASE_LOCKOWNER in RFC 3530bis.  However, rfc3530-migration-update
      does permit migration-capable servers to return DELAY to clients,
      but only in the context of an ongoing migration.  In this case the
      server has frozen lock state in preparation for migration, and a
      client retry would help the destination server purge unneeded state
      once migration recovery is complete.
      
      Interestingly, NFS4ERR_MOVED is not valid for RELEASE_LOCKOWNER, even
      though lock owners can be migrated with Transparent State Migration.
      
      Note that RFC 3530bis section 9.5 includes RELEASE_LOCKOWNER in the
      list of operations that renew a client's lease on the server if they
      succeed.  Now that our client pays attention to the operation's
      status code, we can note that renewal appropriately.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      60ea6812
    • C
      NFS: Implement support for NFS4ERR_LEASE_MOVED · 8ef2f8d4
      Chuck Lever committed
      Trigger lease-moved recovery when a request returns
      NFS4ERR_LEASE_MOVED.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      8ef2f8d4
    • C
      NFS: Support NFS4ERR_LEASE_MOVED recovery in state manager · b7f7a66e
      Chuck Lever committed
      A migration on the FSID in play for the current NFS operation
      is reported via the error status code NFS4ERR_MOVED.
      
      "Lease moved" means that a migration has occurred on some other
      FSID than the one for the current operation.  It's a signal that
      the client should take action immediately to handle a migration
      that it may not have noticed otherwise.  This is so that the
      client's lease does not expire unnoticed on the destination server.
      
      In NFSv4.0, a moved lease is reported with the NFS4ERR_LEASE_MOVED
      error status code.
      
      To recover from NFS4ERR_LEASE_MOVED, check each FSID for that server
      to see if it is still present.  Invoke nfs4_try_migration() if the
      FSID is no longer present on the server.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      b7f7a66e
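The recovery loop described in b7f7a66e above (check each FSID, migrate the ones that are gone) can be sketched as follows. This is a simplified model under stated assumptions: `struct fs_entry`, `probe_fsid_present()` and `try_migration()` are hypothetical stand-ins, and the server probe is replaced by a boolean field rather than a real RPC.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-filesystem record the client keeps for one server. */
struct fs_entry {
    unsigned long long fsid;
    bool present_on_server;   /* stand-in for the result of probing the server */
    bool migration_started;
};

/* Stand-in for the on-the-wire "is this FSID still here?" probe. */
static bool probe_fsid_present(const struct fs_entry *fs)
{
    return fs->present_on_server;
}

/* Stand-in for nfs4_try_migration(). */
static void try_migration(struct fs_entry *fs)
{
    fs->migration_started = true;
}

/*
 * Walk every FSID known for the server and start migration recovery
 * for each one the server no longer hosts.  Returns how many were started.
 */
static size_t recover_lease_moved(struct fs_entry *list, size_t n)
{
    size_t started = 0;

    for (size_t i = 0; i < n; i++) {
        if (probe_fsid_present(&list[i]))
            continue;   /* still on this server, nothing to do */
        try_migration(&list[i]);
        started++;
    }
    return started;
}
```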
    • C
      NFS: Add method to detect whether an FSID is still on the server · 44c99933
      Chuck Lever committed
      Introduce a mechanism for probing a server to determine if an FSID
      is present or absent.
      
      The on-the-wire compound is different between minor version 0 and 1.
      Minor version 0 appends a RENEW operation to identify which client
      ID is probing.  Minor version 1 has a SEQUENCE operation in the
      compound which effectively carries the same information.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      44c99933
    • C
      NFS: Handle NFS4ERR_MOVED during delegation recall · 352297b9
      Chuck Lever committed
      When a server returns NFS4ERR_MOVED during a delegation recall,
      trigger the new migration recovery logic in the state manager.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      352297b9
    • C
      NFS: Add migration recovery callouts in nfs4proc.c · 519ae255
      Chuck Lever committed
      When a server returns NFS4ERR_MOVED, trigger the new migration
      recovery logic in the state manager.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      519ae255
    • C
      NFS: Rename "stateid_invalid" label · 9f51a78e
      Chuck Lever committed
      I'm going to use this exit label also for migration recovery
      failures.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      9f51a78e
    • C
      NFS: Re-use exit code in nfs4_async_handle_error() · f1478c13
      Chuck Lever committed
      Clean up.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      f1478c13
    • C
      NFS: Add basic migration support to state manager thread · c9fdeb28
      Chuck Lever committed
      Migration recovery and state recovery must be serialized, so handle
      both in the state manager thread.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      c9fdeb28
    • C
      NFS: Add a super_block backpointer to the nfs_server struct · ce6cda18
      Chuck Lever committed
      NFS_SB() returns the pointer to an nfs_server struct, given a
      pointer to a super_block.  But we have no way to go back the other
      way.
      
      Add a super_block backpointer field so that, given an nfs_server
      struct, it is easy to get to the filesystem's root dentry.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      ce6cda18
    • C
      NFS: Add method to retrieve fs_locations during migration recovery · b03d735b
      Chuck Lever committed
      The nfs4_proc_fs_locations() function is invoked during referral
      processing to perform a GETATTR(fs_locations) on an object's parent
      directory in order to discover the target of the referral.  It
      performs a LOOKUP in the compound, so the client needs to know the
      parent's file handle a priori.
      
      Unfortunately this function is not adequate for handling migration
      recovery.  We need to probe fs_locations information on an FSID, but
      there's no parent directory available for many operations that
      can return NFS4ERR_MOVED.
      
      Another subtlety: recovering from NFS4ERR_LEASE_MOVED is a process
      of walking over a list of known FSIDs that reside on the server, and
      probing whether they have migrated.  Once the server has detected
      that the client has probed all migrated file systems, it stops
      returning NFS4ERR_LEASE_MOVED.
      
      A minor version zero server needs to know what client ID is
      requesting fs_locations information so it can clear the flag that
      forces it to continue returning NFS4ERR_LEASE_MOVED.  This flag is
      set per client ID and per FSID.  However, the client ID is not an
      argument of either the PUTFH or GETATTR operations.  Later minor
      versions have client ID information embedded in the compound's
      SEQUENCE operation.
      
      Therefore, by convention, minor version zero clients send a RENEW
      operation in the same compound as the GETATTR(fs_locations), since
      RENEW's one argument is a clientid4.  This allows a minor version
      zero server to identify correctly the client that is probing for a
      migration.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      b03d735b
    • C
      NFS: Export _nfs_display_fhandle() · 9e6ee76d
      Chuck Lever committed
      Allow code in nfsv4.ko to use _nfs_display_fhandle().
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      9e6ee76d
    • C
      NFS: Introduce a vector of migration recovery ops · ec011fe8
      Chuck Lever committed
      The differences between minor version 0 and minor version 1
      migration will be abstracted by the addition of a set of migration
      recovery ops.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      ec011fe8
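The "vector of ops" idea in ec011fe8 above is the usual C table of function pointers selected per minor version. A minimal sketch follows, assuming a hypothetical `struct mig_recovery_ops` with a single illustrative member; the real kernel structure and its members are not shown in this listing. The one fact it models comes from the commit messages above: minor version 0 identifies the probing client with a RENEW operation, while later minor versions rely on SEQUENCE.

```c
#include <stddef.h>

/* Hypothetical ops vector; the member name is illustrative only. */
struct mig_recovery_ops {
    /* Name of the operation in a probe compound that identifies the client. */
    const char *(*client_id_op)(void);
};

static const char *v40_client_id_op(void) { return "RENEW"; }
static const char *v41_client_id_op(void) { return "SEQUENCE"; }

static const struct mig_recovery_ops v40_ops = {
    .client_id_op = v40_client_id_op,
};

static const struct mig_recovery_ops v41_ops = {
    .client_id_op = v41_client_id_op,
};

/* Callers pick the table once and stay minor-version agnostic afterwards. */
static const struct mig_recovery_ops *ops_for_minorversion(unsigned int mv)
{
    return mv == 0 ? &v40_ops : &v41_ops;
}
```

Generic recovery code can then call `ops_for_minorversion(mv)->client_id_op()` without branching on the minor version at every call site.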
    • C
      NFS: Add functions to swap transports during migration recovery · 800c06a5
      Chuck Lever committed
      Introduce functions that can walk through an array of returned
      fs_locations information and connect a transport to one of the
      destination servers listed therein.
      
      Note that NFS minor version 1 introduces "fs_locations_info" which
      extends the locations array sorting criteria available to clients.
      This is not supported yet.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      800c06a5
    • C
      NFS: Add nfs4_update_server · 32e62b7c
      Chuck Lever committed
      New function nfs4_update_server() moves an nfs_server to a different
      nfs_client.  This is done as part of migration recovery.
      
      Though it may be appealing to think of them as the same thing,
      migration recovery is not the same as following a referral.
      
      For a referral, the client has not descended into the file system
      yet: it has no nfs_server, no super block, no inodes or open state.
      It is enough to simply instantiate the nfs_server and super block,
      and perform a referral mount.
      
      For a migration, however, we have all of those things already, and
      they have to be moved to a different nfs_client.  No local namespace
      changes are needed here.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      32e62b7c
    • W
      NFSv4: don't reprocess cached open CLAIM_PREVIOUS · d2bfda2e
      Weston Andros Adamson committed
      Cached opens have already been handled by _nfs4_opendata_reclaim_to_nfs4_state
      and can safely skip being reprocessed, but must still call update_open_stateid
      to make sure that all active fmodes are recovered.
      Signed-off-by: Weston Andros Adamson <dros@netapp.com>
      Cc: stable@vger.kernel.org # 3.7.x: f494a607: NFSv4: fix NULL dereference
      Cc: stable@vger.kernel.org # 3.7.x: a43ec98b: NFSv4: don't fail on missin
      Cc: stable@vger.kernel.org # 3.7.x
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      d2bfda2e
    • T
      NFSv4: Fix state reference counting in _nfs4_opendata_reclaim_to_nfs4_state · d49f042a
      Trond Myklebust committed
      Currently, if the call to nfs_refresh_inode fails, then we end up leaking
      a reference count, due to the call to nfs4_get_open_state.
      While we're at it, replace nfs4_get_open_state with a simple call to
      atomic_inc(); there is no need to do a full lookup of the struct nfs_state
      since it is passed as an argument in the struct nfs4_opendata, and
      is already assigned to the variable 'state'.
      
      Cc: stable@vger.kernel.org # 3.7.x: a43ec98b: NFSv4: don't fail on missing
      Cc: stable@vger.kernel.org # 3.7.x
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      d49f042a
    • W
      NFSv4: don't fail on missing fattr in open recover · a43ec98b
      Weston Andros Adamson committed
      This is an unneeded check that could cause the client to fail to recover
      opens.
      Signed-off-by: Weston Andros Adamson <dros@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      a43ec98b
    • W
      NFSv4: fix NULL dereference in open recover · f494a607
      Weston Andros Adamson committed
      _nfs4_opendata_reclaim_to_nfs4_state doesn't expect to see a cached
      open CLAIM_PREVIOUS, but this can happen. An example is when there are
      RDWR openers and RDONLY openers on a delegation stateid. The recovery
      path will first try an open CLAIM_PREVIOUS for the RDWR openers, this
      marks the delegation as not needing RECLAIM anymore, so the open
      CLAIM_PREVIOUS for the RDONLY openers will not actually send an rpc.
      
      The NULL dereference is due to _nfs4_opendata_reclaim_to_nfs4_state
      returning ERR_PTR(rpc_status) when !rpc_done. When the open is
      cached, rpc_done == 0 and rpc_status == 0, thus
      _nfs4_opendata_reclaim_to_nfs4_state returns NULL - this is unexpected
      by callers of nfs4_opendata_to_nfs4_state().
      
      This can be reproduced easily by opening the same file two times on an
      NFSv4.0 mount with delegations enabled, once as RDWR and once as RDONLY then
      sleeping for a long time.  While the files are held open, kick off state
      recovery and this NULL dereference will be hit every time.
      
      An example OOPS:
      
      [   65.003602] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
      [   65.005312] IP: [<ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4]
      [   65.006820] PGD 7b0ea067 PUD 791ff067 PMD 0
      [   65.008075] Oops: 0000 [#1] SMP
      [   65.008802] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache snd_ens1371 gameport nfsd snd_rawmidi snd_ac97_codec ac97_bus btusb snd_seq snd_seq_device snd_pcm ppdev bluetooth auth_rpcgss coretemp snd_page_alloc crc32_pclmul crc32c_intel ghash_clmulni_intel microcode rfkill nfs_acl vmw_balloon serio_raw snd_timer lockd parport_pc e1000 snd soundcore parport i2c_piix4 shpchp vmw_vmci sunrpc ata_generic mperf pata_acpi mptspi vmwgfx ttm scsi_transport_spi drm mptscsih mptbase i2c_core
      [   65.018684] CPU: 0 PID: 473 Comm: 192.168.10.85-m Not tainted 3.11.2-201.fc19.x86_64 #1
      [   65.020113] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
      [   65.022012] task: ffff88003707e320 ti: ffff88007b906000 task.ti: ffff88007b906000
      [   65.023414] RIP: 0010:[<ffffffffa037d6ee>]  [<ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4]
      [   65.025079] RSP: 0018:ffff88007b907d10  EFLAGS: 00010246
      [   65.026042] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [   65.027321] RDX: 0000000000000050 RSI: 0000000000000001 RDI: 0000000000000000
      [   65.028691] RBP: ffff88007b907d38 R08: 0000000000016f60 R09: 0000000000000000
      [   65.029990] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
      [   65.031295] R13: 0000000000000050 R14: 0000000000000000 R15: 0000000000000001
      [   65.032527] FS:  0000000000000000(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000
      [   65.033981] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   65.035177] CR2: 0000000000000030 CR3: 000000007b27f000 CR4: 00000000000407f0
      [   65.036568] Stack:
      [   65.037011]  0000000000000000 0000000000000001 ffff88007b907d90 ffff88007a880220
      [   65.038472]  ffff88007b768de8 ffff88007b907d48 ffffffffa037e4a5 ffff88007b907d80
      [   65.039935]  ffffffffa036a6c8 ffff880037020e40 ffff88007a880000 ffff880037020e40
      [   65.041468] Call Trace:
      [   65.042050]  [<ffffffffa037e4a5>] nfs4_close_state+0x15/0x20 [nfsv4]
      [   65.043209]  [<ffffffffa036a6c8>] nfs4_open_recover_helper+0x148/0x1f0 [nfsv4]
      [   65.044529]  [<ffffffffa036a886>] nfs4_open_recover+0x116/0x150 [nfsv4]
      [   65.045730]  [<ffffffffa036d98d>] nfs4_open_reclaim+0xad/0x150 [nfsv4]
      [   65.046905]  [<ffffffffa037d979>] nfs4_do_reclaim+0x149/0x5f0 [nfsv4]
      [   65.048071]  [<ffffffffa037e1dc>] nfs4_run_state_manager+0x3bc/0x670 [nfsv4]
      [   65.049436]  [<ffffffffa037de20>] ? nfs4_do_reclaim+0x5f0/0x5f0 [nfsv4]
      [   65.050686]  [<ffffffffa037de20>] ? nfs4_do_reclaim+0x5f0/0x5f0 [nfsv4]
      [   65.051943]  [<ffffffff81088640>] kthread+0xc0/0xd0
      [   65.052831]  [<ffffffff81088580>] ? insert_kthread_work+0x40/0x40
      [   65.054697]  [<ffffffff8165686c>] ret_from_fork+0x7c/0xb0
      [   65.056396]  [<ffffffff81088580>] ? insert_kthread_work+0x40/0x40
      [   65.058208] Code: 5c 41 5d 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 89 d5 41 54 53 48 89 fb <4c> 8b 67 30 f0 41 ff 44 24 44 49 8d 7c 24 40 e8 0e 0a 2d e1 44
      [   65.065225] RIP  [<ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4]
      [   65.067175]  RSP <ffff88007b907d10>
      [   65.068570] CR2: 0000000000000030
      [   65.070098] ---[ end trace 0d1fe4f5c7dd6f8b ]---
      
      Cc: <stable@vger.kernel.org> #3.7+
      Signed-off-by: Weston Andros Adamson <dros@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      f494a607
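The failure mode in f494a607 above comes down to an error pointer built from a status of 0 being indistinguishable from NULL. The sketch below re-creates the kernel's error-pointer convention in userspace to show this. `ERR_PTR()` and `IS_ERR()` here are simplified re-implementations written for this example, and `reclaim_to_state()` is a hypothetical reduction of the reclaim helper, not the kernel function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace re-creations of the kernel's error-pointer helpers. */
static void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static bool IS_ERR(const void *ptr)
{
    /* Error pointers occupy the top MAX_ERRNO addresses. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct state { int dummy; };
static struct state the_state;

/*
 * Hypothetical reduction of the reclaim helper: when no RPC was sent
 * (rpc_done is false) the status is converted to an error pointer.
 * With a cached open the status is 0, so the "error pointer" is NULL.
 */
static struct state *reclaim_to_state(bool rpc_done, int rpc_status)
{
    if (!rpc_done)
        return ERR_PTR(rpc_status);
    return &the_state;
}
```

A caller that only checks `IS_ERR()` accepts the NULL result as a valid state pointer and then dereferences it, which matches the oops shown in the commit message.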
    • T
      NFSv4.1: Don't change the security label as part of open reclaim. · 83c78eb0
      Trond Myklebust committed
      The current caching model calls for the security label to be set on
      first lookup and/or on any subsequent label changes. There is no
      need to do it as part of an open reclaim.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      83c78eb0
    • J
      nfs: fix handling of invalid mount options in nfs_remount · 1966903f
      Jeff Layton committed
      nfs_parse_mount_options returns 0 on error, not -errno.
      Reported-by: Karel Zak <kzak@redhat.com>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      1966903f
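The one-line message of 1966903f above describes a return-convention mismatch. The sketch below illustrates it under assumptions: `parse_options()` is a hypothetical parser that follows the stated convention (nonzero on success, 0 on error), and `remount_buggy()` shows one plausible shape of a caller that expects -errno instead. Neither function is the kernel code.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical parser: returns 1 on success and 0 on error, never -errno. */
static int parse_options(const char *opts)
{
    return strcmp(opts, "bogus") != 0;
}

/* Assumed shape of the bug: a 0 return is never negative, so errors slip through. */
static int remount_buggy(const char *opts)
{
    int error = parse_options(opts);

    if (error < 0)
        return error;
    return 0;   /* reached even when parsing failed */
}

/* Caller that honors the parser's convention and maps failure to -EINVAL. */
static int remount_fixed(const char *opts)
{
    if (!parse_options(opts))
        return -EINVAL;
    return 0;
}
```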
    • J
    • A
      NFSv4 Remove zeroing state kern warnings · 3660cd43
      Andy Adamson committed
      As of commit 5d422301 we no longer zero the state.
      Signed-off-by: Andy Adamson <andros@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      3660cd43
  2. 02 Oct, 2013 2 commits
  3. 01 Oct, 2013 2 commits
    • V
      nilfs2: fix issue with race condition of competition between segments for dirty blocks · 7f42ec39
      Vyacheslav Dubeyko committed
      Many NILFS2 users have reported strange file system corruption
      (for example):
      
         NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768
         NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540)
      
      But such error messages are a consequence of a file system issue that
      occurs much earlier.  Fortunately, Jerome Poulin <jeromepoulin@gmail.com>
      and Anton Eliasson <devel@antoneliasson.se> recently reported another
      issue.  These reports describe a crash of the segctor thread:
      
        BUG: unable to handle kernel paging request at 0000000000004c83
        IP: nilfs_end_page_io+0x12/0xd0 [nilfs2]
      
        Call Trace:
         nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
         nilfs_segctor_construct+0x17b/0x290 [nilfs2]
         nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
         kthread+0xc0/0xd0
         ret_from_fork+0x7c/0xb0
      
      These two issues have a single cause.  The same cause can trigger a
      third issue as well: the segctor thread hangs while consuming
      100% CPU.
      
      REPRODUCING PATH:
      
      One possible way of reproducing the issue was described by
      Jerome Poulin <jeromepoulin@gmail.com>:
      
      1. init S to get to single user mode.
      2. sysrq+E to make sure only my shell is running
      3. start network-manager to get my wifi connection up
      4. login as root and launch "screen"
      5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies.
      6. lscp | xz -9e > lscp.txt.xz
      7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs
      8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
      9. start a screen and launch strace -f -o find-cat.log -t find
      /mnt/nilfs -type f -exec cat {} > /dev/null \;
      10. start a screen and launch strace -f -o apt-get.log -t apt-get update
      11. launch the last command again as it did not crash the first time
      12. apt-get crashes
      13. ps aux > ps-aux-crashed.log
      14. sysrq+W
      15. sysrq+E  wait for everything to terminate
      16. sysrq+SUSB
      
      A simpler way to reproduce the issue is to run a kernel compilation
      and "apt-get update" in parallel.
      
      REPRODUCIBILITY:
      
      The issue does not reproduce reliably (60% - 80% of attempts).  A
      proper environment is very important for reproducing it.  The critical
      conditions for successful reproduction are:
      
      (1) There should be a big file that is modified through mmap().
      
      (2) From time to time during processing, this file should have more
          dirty blocks than fit into several segments (for example, two or
          three).
      
      (3) There should be intensive background file modification activity
          in another thread.
      
      INVESTIGATION:
      
      First of all, it is possible to see that the cause of the crash is an
      invalid page address:
      
        NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
        NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783
      
      Moreover, the value of b_page (0x1a82) is 6786, which looks like a
      segment number.  The b_blocknr and b_size values look like block
      numbers.  So the buffer_head pointer points to an invalid address.
      
      Detailed investigation of the issue revealed the following picture:
      
        [-----------------------------SEGMENT 6783-------------------------------]
        NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
        NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
        NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
        NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
        NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
        NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
        NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111149024, segbuf->sb_segnum 6783
      
        [-----------------------------SEGMENT 6784-------------------------------]
        NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
        NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
        NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff8802174a6798, bh->b_assoc_buffers.prev ffff880221cffee8
        NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
        NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
        NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
        NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
        NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
        NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784
        NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0
        [----------] ditto
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111164416, segbuf->sb_segnum 6784, segbuf->sb_nbio 15
      
        [-----------------------------SEGMENT 6785-------------------------------]
        NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
        NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
        NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff880219277e80, bh->b_assoc_buffers.prev ffff880221cffc88
        NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
        NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
        NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
        NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
        NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785
        NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0
        [----------] ditto
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12
      
        NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
        NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783
        NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784
        NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785
      
        NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
      
        BUG: unable to handle kernel paging request at 0000000000001a82
        IP: [<ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2]
      
      Usually, for every segment we collect the dirty files in a list.  Then
      the dirty blocks are gathered for every dirty file, prepared for write
      and submitted by means of a nilfs_segbuf_submit_bh() call.  Finally, the
      complete-write phase takes place after nilfs_end_bio_write() is called
      on the block layer.  In this final phase, buffers/pages are marked as
      not dirty and the processed files are removed from the list of dirty files.
      
      It is possible to see that there were three prepare_write and submit_bio
      phases before the segbuf_wait and complete_write phase.  Moreover,
      segments compete with each other for dirty blocks, because on every
      iteration of segment processing the dirty buffer_heads are added to
      several payload_buffers lists:
      
        [SEGMENT 6784]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
        [SEGMENT 6785]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
      
      The next pointer is the same but the prev pointer has changed.  This
      means that the buffer_head has a next pointer from one list but a prev
      pointer from another.  Such a modification can happen several times and
      can finally result in various issues: (1) segctor hanging, (2) segctor
      crashing, (3) file system metadata corruption.
      
      FIX:
      This patch adds:
      
      (1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write()
          for every processed dirty block;
      
      (2) checking of BH_Async_Write flag in
          nilfs_lookup_dirty_data_buffers() and
          nilfs_lookup_dirty_node_buffers();
      
      (3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(),
          nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page().
      Reported-by: Jerome Poulin <jeromepoulin@gmail.com>
      Reported-by: Anton Eliasson <devel@antoneliasson.se>
      Cc: Paul Fertser <fercerpav@gmail.com>
      Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp>
      Cc: Piotr Szymaniak <szarpaj@grubelek.pl>
      Cc: Juan Barry Manuel Canham <Linux@riotingpacifist.net>
      Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com>
      Cc: Elmer Zhang <freeboy6716@gmail.com>
      Cc: Kenneth Langga <klangga@gmail.com>
      Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
      Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7f42ec39
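The three-part fix listed in 7f42ec39 above (set the flag on prepare, check it on lookup, clear it on completion) can be modeled in a few lines. This is a simplified sketch, not the nilfs2 code: `struct buf` is a hypothetical stand-in for a buffer_head, and a boolean replaces the BH_Async_Write state bit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer_head; only two state bits are modeled. */
struct buf {
    bool dirty;
    bool async_write;   /* models BH_Async_Write */
};

/*
 * Lookup phase: count the dirty buffers a new segment may claim.
 * Buffers already prepared for write by an earlier segment are skipped.
 */
static size_t lookup_dirty_buffers(const struct buf *bufs, size_t n)
{
    size_t collected = 0;

    for (size_t i = 0; i < n; i++)
        if (bufs[i].dirty && !bufs[i].async_write)
            collected++;
    return collected;
}

/* Prepare phase: mark every claimed dirty buffer as being written. */
static void prepare_write(struct buf *bufs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (bufs[i].dirty)
            bufs[i].async_write = true;
}

/* Completion phase: the write is done, so release and clean the buffers. */
static void complete_write(struct buf *bufs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!bufs[i].async_write)
            continue;
        bufs[i].async_write = false;
        bufs[i].dirty = false;
    }
}
```

With the flag in place, a second segment constructed before the first one completes finds nothing to claim, so a buffer can no longer end up linked into two payload lists at once.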
    • D
      fs/binfmt_elf.c: prevent a coredump with a large vm_map_count from Oopsing · 72023656
      Dan Aloni committed
      A high setting of max_map_count, and a process core-dumping with a large
      enough vm_map_count could result in an NT_FILE note not being written,
      and the kernel crashing immediately later because it has assumed
      otherwise.
      
      Reproduction of the oops-causing bug described here:
      
          https://lkml.org/lkml/2013/8/30/50
      
      The issue originated in commit 2aa362c4 ("coredump: extend core dump
      note section to contain file names of mapped file") from Oct 4, 2012.
      
      This patch makes that section optional in that case.  fill_files_note()
      should signal the error, and the info struct in elf_core_dump() is
      zero-initialized so that we can check for the optionally written
      note.
      
      [akpm@linux-foundation.org: avoid abusing E2BIG, remove a couple of not-really-needed local variables]
      [akpm@linux-foundation.org: fix sparse warning]
      Signed-off-by: Dan Aloni <alonid@stratoscale.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Reported-by: Martin MOKREJS <mmokrejs@gmail.com>
      Tested-by: Martin MOKREJS <mmokrejs@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      72023656
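The approach described in 72023656 above (an optional note, detected through a zero-initialized info struct) can be sketched as follows. Everything here is an assumption made for illustration: the struct layouts, the size limit `MAX_FILES_NOTE_SIZE`, the -EINVAL error code and the helper `notes_to_write()` are hypothetical and are not taken from fs/binfmt_elf.c.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical note descriptor: data stays NULL until the note is filled. */
struct note {
    const void *data;
    size_t      size;
};

/* Hypothetical dump bookkeeping; must start out zero-initialized. */
struct dump_info {
    struct note files;   /* the optional mapped-files note */
};

#define MAX_FILES_NOTE_SIZE 4096u   /* hypothetical limit */

/* Fill the note, or report an error and leave it untouched if it is too big. */
static int fill_files_note(struct note *n, const void *data, size_t size)
{
    if (size > MAX_FILES_NOTE_SIZE)
        return -EINVAL;
    n->data = data;
    n->size = size;
    return 0;
}

/* The writer checks the zero-initialized field instead of assuming the note exists. */
static size_t notes_to_write(const struct dump_info *info)
{
    return info->files.data != NULL ? 1 : 0;
}
```

Because the struct starts zeroed, a failed fill leaves `files.data` as NULL and the writer simply skips the note rather than acting on uninitialized fields.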
  4. 30 Sep, 2013 7 commits