• B
    sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK() · ec81048c
    Boqun Feng 提交于
    In theory, COMPLETION_INITIALIZER_ONSTACK() should never affect the
    stack allocation of the caller. However, on some compilers, a temporary
    structure was allocated for the return value of
    COMPLETION_INITIALIZER_ONSTACK().
    
    For example in write_journal() with LOCKDEP_COMPLETIONS=y (GCC is 7.1.1):
    
    	io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
    	    2462:       e8 00 00 00 00          callq  2467 <write_journal+0x47>
    	    2467:       48 8d 85 80 fd ff ff    lea    -0x280(%rbp),%rax
    	    246e:       48 c7 c6 00 00 00 00    mov    $0x0,%rsi
    	    2475:       48 c7 c2 00 00 00 00    mov    $0x0,%rdx
    		x->done = 0;
    	    247c:       c7 85 90 fd ff ff 00    movl   $0x0,-0x270(%rbp)
    	    2483:       00 00 00
    		init_waitqueue_head(&x->wait);
    	    2486:       48 8d 78 18             lea    0x18(%rax),%rdi
    	    248a:       e8 00 00 00 00          callq  248f <write_journal+0x6f>
    		if (commit_start + commit_sections <= ic->journal_sections) {
    	    248f:       41 8b 87 a8 00 00 00    mov    0xa8(%r15),%eax
    		io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
    	    2496:       48 8d bd e8 f9 ff ff    lea    -0x618(%rbp),%rdi
    	    249d:       48 8d b5 90 fd ff ff    lea    -0x270(%rbp),%rsi
    	    24a4:       b9 17 00 00 00          mov    $0x17,%ecx
    	    24a9:       f3 48 a5                rep movsq %ds:(%rsi),%es:(%rdi)
    		if (commit_start + commit_sections <= ic->journal_sections) {
    	    24ac:       41 39 c6                cmp    %eax,%r14d
    		io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
    	    24af:       48 8d bd 90 fd ff ff    lea    -0x270(%rbp),%rdi
    	    24b6:       48 8d b5 e8 f9 ff ff    lea    -0x618(%rbp),%rsi
    	    24bd:       b9 17 00 00 00          mov    $0x17,%ecx
    	    24c2:       f3 48 a5                rep movsq %ds:(%rsi),%es:(%rdi)
    
    We can obviously see the temporary structure allocated, and the compiler
    also does two meaningless memcpy with "rep movsq".
    
    And according to:
    
    	https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html#Statement-Exprs
    
    The return value of a statement expression is returned by value, so the
    temporary variable is created in COMPLETION_INITIALIZER_ONSTACK(), and
    that's why the temporary structures are allocted.
    
    To fix this, make the brace block in COMPLETION_INITIALIZER_ONSTACK()
    return a pointer and dereference it outside the block rather than return
    the whole structure, in this way, we are able to teach the compiler not
    to do the unnecessary stack allocation.
    
    This could also reduce the stack size even if !LOCKDEP, for example in
    write_journal(), compiled with gcc 7.1.1, the result of command:
    
    	 objdump -d drivers/md/dm-integrity.o | ./scripts/checkstack.pl x86
    
    before:
    
    	0x0000246a write_journal [dm-integrity.o]:              696
    
    after:
    
    	0x00002b7a write_journal [dm-integrity.o]:              296
    Reported-by: NArnd Bergmann <arnd@arndb.de>
    Signed-off-by: NBoqun Feng <boqun.feng@gmail.com>
    Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: NArnd Bergmann <arnd@arndb.de>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Byungchul Park <byungchul.park@lge.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Nicholas Piggin <npiggin@gmail.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: walken@google.com
    Cc: willy@infradead.org
    Link: http://lkml.kernel.org/r/20170823152542.5150-3-boqun.feng@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
    ec81048c
completion.h 4.7 KB