1. 10 Apr, 2017 1 commit
    • P
      audit: make sure we don't let the retry queue grow without bounds · 264d5096
      Paul Moore committed
      The retry queue is intended to provide a temporary buffer in the case
      of transient errors when communicating with auditd; it is not meant
      as a long-lived queue.  That functionality is provided by the hold
      queue.
      
      This patch fixes a problem identified by Seth where the retry queue
      could grow uncontrollably if an auditd instance did not connect to
      the kernel to drain the queues.  This commit fixes this by doing the
      following:
      
      * Make sure we always call auditd_reset() if we decide the connection
      with audit is really dead.  There were some cases in
      kauditd_hold_skb() where we did not reset the connection, this patch
      relocates the reset calls to kauditd_thread() so all the error
      conditions are caught and the connection reset.  As a side effect,
      this means we could move auditd_reset() and get rid of the forward
      definition at the top of kernel/audit.c.
      
      * We never checked the status of the auditd connection when
      processing the main audit queue which meant that the retry queue
      could grow unchecked.  This patch adds a call to auditd_reset()
      after the main queue has been processed if auditd is not connected,
      the auditd_reset() call will make sure the retry and hold queues are
      correctly managed/flushed so that the retry queue remains reasonable.
      
      Cc: <stable@vger.kernel.org> # 4.10.x-: 5b52330b
      Reported-by: Seth Forshee <seth.forshee@canonical.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
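The second bullet above can be pictured with a small user-space model. This is only a sketch under stated assumptions: the queues are reduced to counters, and `struct audit_model`, `model_reset()` and `model_process_main_queue()` are hypothetical names, not the functions in kernel/audit.c.

```c
#include <stddef.h>

/* Simplified model: queues are plain counters, not sk_buff lists. */
struct audit_model {
	int daemon_connected;   /* nonzero while a daemon is attached */
	size_t sent;            /* records delivered to the daemon */
	size_t retry_len;       /* short-lived buffer for transient errors */
	size_t hold_len;        /* long-lived buffer until a daemon connects */
};

/* Hypothetical stand-in for the reset step: mark the connection dead and
 * move everything waiting in the retry queue over to the hold queue. */
static void model_reset(struct audit_model *m)
{
	m->daemon_connected = 0;
	m->hold_len += m->retry_len;
	m->retry_len = 0;
}

/* Process 'pending' records from the main queue. */
static void model_process_main_queue(struct audit_model *m, size_t pending)
{
	while (pending--) {
		if (m->daemon_connected)
			m->sent++;
		else
			m->retry_len++;
	}
	/* The added check: with no daemon attached, do not leave records
	 * sitting in the retry queue. */
	if (!m->daemon_connected)
		model_reset(m);
}
```

In this model, processing ten records with no daemon attached leaves the retry counter at zero and the hold counter at ten, which is the bounded behaviour the commit message describes.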
  2. 21 Mar, 2017 1 commit
    • P
      audit: fix auditd/kernel connection state tracking · 5b52330b
      Paul Moore committed
      What started as a rather straightforward race condition reported by
      Dmitry using the syzkaller fuzzer ended up revealing some major
      problems with how the audit subsystem managed its netlink sockets and
      its connection with the userspace audit daemon.  Fixing this properly
      had quite the cascading effect and what we are left with is this rather
      large and complicated patch.  My initial goal was to try and decompose
      this patch into multiple smaller patches, but the way these changes
      are intertwined makes it difficult to split these changes into
      meaningful pieces that don't break or somehow make things worse for
      the intermediate states.
      
      The patch makes a number of changes, but the most significant are
      highlighted below:
      
      * The auditd tracking variables, e.g. audit_sock, are now gone and
      replaced by a RCU/spin_lock protected variable auditd_conn which is
      a structure containing all of the auditd tracking information.
      
      * We no longer track the auditd sock directly, instead we track it
      via the network namespace in which it resides and we use the audit
      socket associated with that namespace.  In spirit, this is what the
      code was trying to do prior to this patch (at least I think that is
      what the original authors intended), but it was done rather poorly
      and added a layer of obfuscation that only masked the underlying
      problems.
      
      * Big backlog queue cleanup, again.  In v4.10 we made some pretty big
      changes to how the audit backlog queues work, here we haven't changed
      the queue design so much as cleaned up the implementation.  Brought
      about by the locking changes, we've simplified kauditd_thread() quite
      a bit by consolidating the queue handling into a new helper function,
      kauditd_send_queue(), which allows us to eliminate a lot of very
      similar code and makes the looping logic in kauditd_thread() clearer.
      
      * All netlink messages sent to auditd are now sent via
      auditd_send_unicast_skb().  Other than just making sense, this makes
      the lock handling easier.
      
      * Change the audit_log_start() sleep behavior so that we never sleep
      on auditd events (unchanged) or if the caller is holding the
      audit_cmd_mutex (changed).  Previously we didn't sleep if the caller
      was auditd or if the message type fell between a certain range; the
      type check was a poor effort of doing what the cmd_mutex check now
      does.  Richard Guy Briggs originally proposed not sleeping the
      cmd_mutex owner several years ago but his patch wasn't acceptable
      at the time.  At least the idea lives on here.
      
      * A problem with the lost record counter has been resolved.  Steve
      Grubb and I both happened to notice this problem and according to
      some quick testing by Steve, this problem goes back quite some time.
      It's largely a harmless problem, although it may have left some
      careful sysadmins quite puzzled.
      
      Cc: <stable@vger.kernel.org> # 4.10.x-
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
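The idea of consolidating several near-identical queue loops into one helper can be sketched as follows. This is not the kernel's kauditd_send_queue(); `struct record`, `drain_queue()` and the two stub callbacks are hypothetical names chosen for illustration, and the real helper works on skb queues with different arguments.

```c
#include <stddef.h>

struct record {
	int id;
};

/* Returns 0 on success, a negative value on error. */
typedef int (*send_fn)(const struct record *r);
/* Called for every record that could not be sent. */
typedef void (*fail_fn)(const struct record *r);

/* One loop shared by every queue: the caller supplies how to send and
 * what to do with records that fail. Returns the number sent. */
static size_t drain_queue(const struct record *q, size_t n,
			  send_fn send, fail_fn on_fail)
{
	size_t sent = 0;

	for (size_t i = 0; i < n; i++) {
		if (send(&q[i]) == 0)
			sent++;
		else if (on_fail)
			on_fail(&q[i]);
	}
	return sent;
}

/* Illustrative stubs only: pretend odd-numbered records fail to send. */
static int failed_count;

static int send_even_ids(const struct record *r)
{
	return (r->id % 2 == 0) ? 0 : -1;
}

static void count_failure(const struct record *r)
{
	(void)r;
	failed_count++;
}
```

With the policy passed in as callbacks, the same loop could serve a main, retry and hold queue, each with its own failure handling.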
  3. 19 Jan, 2017 1 commit
  4. 15 Dec, 2016 11 commits
  5. 06 Dec, 2016 1 commit
  6. 02 Dec, 2016 1 commit
  7. 18 Nov, 2016 1 commit
    • A
      netns: make struct pernet_operations::id unsigned int · c7d03a00
      Alexey Dobriyan committed
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into a zero-based array and
      thus is an unsigned entity. Using a negative value is an
      out-of-bounds access by definition.
      
      2)
      On x86_64 unsigned 32-bit data which are mixed with pointers
      via array indexing or offsets added or subtracted to pointers
      are preferred to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is a 3-byte instruction which isn't necessary if the variable is
      unsigned because x86_64 zero-extends by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
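The difference can be observed with a pair of otherwise identical functions. This is a minimal sketch: `load_signed()` and `load_unsigned()` are hypothetical names, and the exact instructions emitted depend on the compiler and flags, so inspect the assembly yourself (for example with `gcc -O2 -S`) rather than relying on any particular output.

```c
#include <stddef.h>

/* Signed index: i must be sign-extended to 64 bits before it can be
 * used in the address calculation. */
long load_signed(const long *p, int i)
{
	return p[i];
}

/* Unsigned index: on x86_64 a 32-bit register write already clears the
 * upper 32 bits, so no separate extension is needed. */
long load_unsigned(const long *p, unsigned int i)
{
	return p[i];
}
```

Both functions return the same value for any non-negative index; only the generated code may differ.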
      
      Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
      messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately some functions actually grow bigger.
      This is a seemingly random artefact of code generation with the register
      allocator being used differently. gcc decides that some variable
      needs to live in the new r8+ registers and every access now requires a REX
      prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to be
      used, which is longer than [r8].
      
      However, overall balance is in negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 31 Aug, 2016 1 commit
  9. 29 Jun, 2016 1 commit
  10. 27 Jun, 2016 1 commit
  11. 27 Apr, 2016 1 commit
  12. 05 Apr, 2016 1 commit
  13. 28 Jan, 2016 2 commits
  14. 26 Jan, 2016 2 commits
    • R
      audit: log failed attempts to change audit_pid configuration · 935c9e7f
      Richard Guy Briggs committed
      Failed attempts to change the audit_pid configuration are not presently
      logged.  One case is an attempt to starve an old auditd by starting up
      a new auditd when the old one is still alive and active.  The other
      case is an attempt to orphan a new auditd when an old auditd shuts
      down.
      
      Log both as AUDIT_CONFIG_CHANGE messages with failure result.
      Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: Paul Moore <pmoore@redhat.com>
    • R
      audit: stop an old auditd being starved out by a new auditd · 133e1e5a
      Richard Guy Briggs committed
      Nothing prevents a new auditd starting up and replacing a valid
      audit_pid when an old auditd is still running, effectively starving out
      the old auditd since audit_pid no longer points to the old valid
      auditd.
      
      If no message to auditd has been attempted since auditd died
      unnaturally or got killed, audit_pid will still indicate it is alive.
      There isn't an easy way to detect if an old auditd is still running on
      the existing audit_pid other than attempting to send a message to see
      if it fails.  An -ECONNREFUSED almost certainly means it disappeared
      and can be replaced.  Other errors are not so straightforward and may
      indicate transient problems that will resolve themselves and the old
      auditd will recover.  Yet others will likely need manual intervention
      for which a new auditd will not solve the problem.
      
      Send a new message type (AUDIT_REPLACE) to the old auditd containing a
      u32 with the PID of the new auditd.  If the audit replace message
      succeeds (or doesn't fail with certainty), fail to register the new
      auditd and return an error (-EEXIST).
      
      This is expected to make the patch preventing an old auditd orphaning a
      new auditd redundant.
      
      V3: Switch audit message type from 1000 to 1300 block.
      Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: Paul Moore <pmoore@redhat.com>
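The registration rule described in the commit above can be sketched in user space. This is an illustration under stated assumptions, not the kernel code: `try_register_auditd()`, the `send_replace_fn` callback and the two stubs are hypothetical names, and the real implementation sends an AUDIT_REPLACE netlink message rather than calling a function pointer.

```c
#include <errno.h>

/* Hypothetical stand-in for sending AUDIT_REPLACE to the old daemon.
 * Returns 0 on success or a negative errno value. */
typedef int (*send_replace_fn)(unsigned int new_pid);

/* Decide whether new_pid may take over from the registered daemon. */
static int try_register_auditd(unsigned int *audit_pid, unsigned int new_pid,
			       send_replace_fn send_replace)
{
	if (*audit_pid != 0) {
		int err = send_replace(new_pid);

		/* Only -ECONNREFUSED is taken as proof that the old daemon
		 * is gone; success or any other error refuses the newcomer. */
		if (err != -ECONNREFUSED)
			return -EEXIST;
	}
	*audit_pid = new_pid;
	return 0;
}

/* Illustrative stubs only. */
static int old_daemon_alive(unsigned int new_pid)
{
	(void)new_pid;
	return 0;
}

static int old_daemon_gone(unsigned int new_pid)
{
	(void)new_pid;
	return -ECONNREFUSED;
}
```

Treating every result other than -ECONNREFUSED as "possibly still alive" errs on the side of keeping the existing daemon, which matches the "doesn't fail with certainty" wording in the message.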
  15. 13 Jan, 2016 5 commits
  16. 25 Dec, 2015 1 commit
  17. 07 Nov, 2015 1 commit
    • M
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep... · d0164adc
      Mel Gorman committed
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
      
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access to one of two watermarks lower than "min" which can be referred
      to as the "atomic reserve".  __GFP_HIGH users get access to the first
      lower watermark and can be called the "high priority reserve".
      
      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT leading to a situation where
      an optimistic allocation with a fallback option can access atomic
      reserves.
      
      This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM identifies
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      
      This patch then converts a number of sites:
      
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible. This is because
        checking for __GFP_WAIT as was done historically now can trigger false
        positives. Some exceptions like dm-crypt.c exist where the code intent
        is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
        flag manipulations.
      
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
      
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and were depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
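The flag split described in the commit above can be modelled with a few bit masks. This is a sketch only: the `SK_` names and bit values are assumptions made for illustration and are not the kernel's definitions, and `sk_allow_blocking()` merely mirrors the intent of the gfpflags_allow_blocking() helper named in the message.

```c
#include <stdbool.h>

typedef unsigned int sk_gfp_t;

/* Bit values are illustrative assumptions, not the kernel's. */
#define SK_GFP_HIGH            0x01u  /* may use the high priority reserve */
#define SK_GFP_ATOMIC          0x02u  /* truly atomic, cannot sleep */
#define SK_GFP_DIRECT_RECLAIM  0x04u  /* may sleep and enter direct reclaim */
#define SK_GFP_KSWAPD_RECLAIM  0x08u  /* may wake kswapd in the background */

/* Redefined "wait": direct reclaim plus waking kswapd. */
#define SK_GFP_WAIT (SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM)

/* A caller can block only if it is willing to enter direct reclaim. */
static bool sk_allow_blocking(sk_gfp_t flags)
{
	return (flags & SK_GFP_DIRECT_RECLAIM) != 0;
}

/* Mempool-style callers: keep waking kswapd but never block. */
static sk_gfp_t sk_mempool_flags(sk_gfp_t flags)
{
	return flags & ~SK_GFP_DIRECT_RECLAIM;
}
```

In this model, `sk_mempool_flags(SK_GFP_WAIT) & SK_GFP_WAIT` is still non-zero although the caller cannot block. That is the kind of false positive the message warns about when code tests for __GFP_WAIT directly instead of using the helper.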
  18. 04 Nov, 2015 4 commits
    • P
      audit: make audit_log_common_recv_msg() a void function · 233a6866
      Paul Moore committed
      It always returns zero and no one is checking the return value.
      Signed-off-by: Paul Moore <pmoore@redhat.com>
    • S
      audit: removing unused variable · c5ea6efd
      Saurabh Sengar committed
      Variable rc is not required, as it is only passed through unchanged to the
      return statement and the function always returns 0.
      Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com>
      [PM: fixed spelling errors in description]
      Signed-off-by: Paul Moore <pmoore@redhat.com>
    • Y
      audit: audit_string_contains_control can be boolean · 9fcf836b
      Yaowei Bai committed
      This patch makes audit_string_contains_control return bool to improve
      readability due to this particular function only using either one or
      zero as its return value.
      Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
      [PM: tweaked subject line]
      Signed-off-by: Paul Moore <pmoore@redhat.com>
    • R
      audit: try harder to send to auditd upon netlink failure · 32a1dbae
      Richard Guy Briggs committed
      There are several reports of the kernel losing contact with auditd when
      it is, in fact, still running.  When this happens, kernel syslogs show:
      	"audit: *NO* daemon at audit_pid=<pid>"
      although auditd is still running, and is apparently happy, listening on
      the netlink socket. The pid in the "*NO* daemon" message matches the pid
      of the running auditd process.  Restarting auditd solves this.
      
      The problem appears to happen randomly, and doesn't seem to be strongly
      correlated to the rate of audit events being logged.  The problem
      happens fairly regularly (every few days), but has not yet been
      reproduced on demand.
      
      On production kernels, BUG_ON() is a no-op, so any error will trigger
      this.
      
      Commit 34eab0a7 ("audit: prevent an older auditd shutdown from
      orphaning a newer auditd startup") eliminates one possible cause.  This
      isn't the case here, since the PID in the error message and the PID of
      the running auditd match.
      
      The primary expected cause of error here is -ECONNREFUSED when the audit
      daemon goes away, when netlink_getsockbyportid() can't find the auditd
      portid entry in the netlink audit table (or there is no receive
      function).  If -EPERM is returned, that situation isn't likely to be
      resolved in a timely fashion without administrator intervention.  In
      both cases, reset the audit_pid.  This does not rule out a race
      condition.  SELinux is expected to return zero since this isn't an INET
      or INET6 socket.  Other LSMs may have other return codes.  Log the error
      code for better diagnosis in the future.
      
      In the case of -ENOMEM, the situation could be temporary, based on local
      or general availability of buffers.  -EAGAIN should never happen since
      the netlink audit (kernel) socket is set to MAX_SCHEDULE_TIMEOUT.
      -ERESTARTSYS and -EINTR are not expected since this kernel thread is not
      expected to receive signals.  In these cases (or any other unexpected
      ones for now), report the error and re-schedule the thread, retrying up
      to 5 times.
      
      v2:
      	Removed BUG_ON().
      	Moved comma in pr_*() statements.
      	Removed audit_strerror() text.
      Reported-by: Vipin Rathor <v.rathor@gmail.com>
      Reported-by: <ctcard@hotmail.com>
      Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
      [PM: applied rgb's fixup patch to correct audit_log_lost() format issues]
      Signed-off-by: Paul Moore <pmoore@redhat.com>
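The error handling described in the commit above amounts to a small classification step. The sketch below is an assumption-laden illustration: `classify_send()`, `enum send_action` and `MAX_SEND_ATTEMPTS` are hypothetical names, and what happens once the retries are exhausted (`SEND_GIVE_UP` here) is not stated in the message, so that branch is a placeholder.

```c
#include <errno.h>

/* The message says "retrying up to 5 times". */
#define MAX_SEND_ATTEMPTS 5

enum send_action {
	SEND_OK,       /* record was delivered */
	SEND_RESET,    /* daemon is gone or blocked: reset audit_pid */
	SEND_RETRY,    /* possibly transient: reschedule and try again */
	SEND_GIVE_UP   /* assumption: retries exhausted */
};

/* err is the send result (negative errno on failure); retries is the
 * number of retries already made for this record. */
static enum send_action classify_send(int err, int retries)
{
	if (err >= 0)
		return SEND_OK;
	if (err == -ECONNREFUSED || err == -EPERM)
		return SEND_RESET;
	if (retries < MAX_SEND_ATTEMPTS)
		return SEND_RETRY;
	return SEND_GIVE_UP;
}
```

Keeping the classification separate from the send loop makes it easy to see that only -ECONNREFUSED and -EPERM reset the connection immediately, while errors such as -ENOMEM are treated as potentially temporary.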
  19. 07 Aug, 2015 1 commit
  20. 30 May, 2015 1 commit
  21. 16 Apr, 2015 1 commit