When Optimized Out Isn't - Developer and Bug Submitter's Guide

A quick guide to debugging "optimized" code for the assembly-phobic developer and bug submitter

With clang-generated code one will frequently see a variable reported as "value optimized out". This happens with gcc at times as well, but less often. It is frustrating and can lead users to believe the code is "over-optimized". I suspect that what is actually happening is that the register allocator fails to update the debug info when the value is moved to another register.

This recently came up for me when I added my two cents to a mailing list discussion about a zpool import hang. A committer had helpfully identified the thread backtraces that corresponded to the problem at hand: txg_sync_thread hanging forever on a cv_wait in zio_wait. I asked for the value of a number of fields in the zio that was being waited on, but was told that it had been "optimized out".

(kgdb) thread 459
[Switching to thread 459 (Thread 101524)]#0 sched_switch (td=0xfffff80063111000, newtd=<value optimized out>,
flags=<value optimized out>) at /usr/home/kmacy/devel/svn/10/sys/kern/sched_ule.c:1945
1945 cpuid = PCPU_GET(cpuid);
Current language: auto; currently minimal
(kgdb) bt
#0 sched_switch (td=0xfffff80063111000, newtd=<value optimized out>, flags=<value optimized out>)
at /usr/home/kmacy/devel/svn/10/sys/kern/sched_ule.c:1945
#1 0xffffffff807aa199 in mi_switch (flags=260, newtd=0x0) at /usr/home/kmacy/devel/svn/10/sys/kern/kern_synch.c:494
#2 0xffffffff807e6e82 in sleepq_switch (wchan=<value optimized out>, pri=<value optimized out>)
at /usr/home/kmacy/devel/svn/10/sys/kern/subr_sleepqueue.c:538
#3 0xffffffff807e6ce3 in sleepq_wait (wchan=0xfffff8004ddf4a50, pri=0)
at /usr/home/kmacy/devel/svn/10/sys/kern/subr_sleepqueue.c:617
#4 0xffffffff80750d7a in _cv_wait (cvp=0xfffff8004ddf4a50, lock=0xfffff8004ddf4a30)
at /usr/home/kmacy/devel/svn/10/sys/kern/kern_condvar.c:139
#5 0xffffffff817d145b in zio_wait (zio=<value optimized out>)
at /usr/home/kmacy/devel/svn/10/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1442
#6 0xffffffff81779d3c in dsl_pool_sync (dp=0xfffff8004d364800, txg=11733518)
at /usr/home/kmacy/devel/svn/10/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c:531
#7 0xffffffff8179d800 in spa_sync (spa=0xfffffe000372f000, txg=11733518)
at /usr/home/kmacy/devel/svn/10/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:6604
#8 0xffffffff817a7e9d in txg_sync_thread (arg=0xfffff8004d364800)
at /usr/home/kmacy/devel/svn/10/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:518
#9 0xffffffff8076ed34 in fork_exit (callout=0xffffffff817a7c50 , arg=0xfffff8004d364800,
frame=0xfffffe012043fac0) at /usr/home/kmacy/devel/svn/10/sys/kern/kern_fork.c:996
#10 0xffffffff80b96b3e in fork_trampoline () at /usr/home/kmacy/devel/svn/10/sys/amd64/amd64/exception.S:606
#11 0x0000000000000000 in ?? ()
(kgdb) f 5
#5 0xffffffff817d145b in zio_wait (zio=<value optimized out>)
at /usr/home/kmacy/devel/svn/10/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1442
1442 cv_wait(&zio->io_cv, &zio->io_lock);

How frustrating! zio_wait is (excluding asserts) only 7 lines of code. How can this be? Well, let's look at the assembly for zio_wait:

(kgdb) disassemble zio_wait
Dump of assembler code for function zio_wait:
0xffffffff817d13c0 : push %rbp
0xffffffff817d13c1 : mov %rsp,%rbp
0xffffffff817d13c4 : push %r15
0xffffffff817d13c6 : push %r14
0xffffffff817d13c8 : push %r12
0xffffffff817d13ca : push %rbx
0xffffffff817d13cb : mov %rdi,%r14
0xffffffff817d13ce : cmpl $0x1,0x254(%r14)
0xffffffff817d13d6 : je 0xffffffff817d13f0
0xffffffff817d13d8 : mov $0xffffffff81883de9,%rdi
0xffffffff817d13df : mov $0xffffffff81883b20,%rsi
0xffffffff817d13e6 : mov $0x599,%edx
0xffffffff817d13eb : callq 0xffffffff81a19200
0xffffffff817d13f0 : cmpq $0x0,0x2f0(%r14)
0xffffffff817d13f8 : je 0xffffffff817d1412
0xffffffff817d13fa : mov $0xffffffff8188410c,%rdi
0xffffffff817d1401 : mov $0xffffffff81883b20,%rsi
0xffffffff817d1408 : mov $0x59a,%edx
0xffffffff817d140d : callq 0xffffffff81a19200
0xffffffff817d1412 : mov %gs:0x0,%rax
0xffffffff817d141b : mov %rax,0x2f8(%r14)
0xffffffff817d1422 : mov %r14,%rdi
0xffffffff817d1425 : callq 0xffffffff817d24c0
0xffffffff817d142a : lea 0x300(%r14),%r15
0xffffffff817d1431 : xor %esi,%esi
0xffffffff817d1433 : mov $0xffffffff81883b20,%rdx
0xffffffff817d143a : mov $0x5a0,%ecx
0xffffffff817d143f : mov %r15,%rdi
0xffffffff817d1442 : callq 0xffffffff807a8270 <_sx_xlock>
0xffffffff817d1447 : lea 0x320(%r14),%rbx
0xffffffff817d144e : jmp 0xffffffff817d145b
0xffffffff817d1450 : mov %rbx,%rdi
0xffffffff817d1453 : mov %r15,%rsi
0xffffffff817d1456 : callq 0xffffffff80750ba0 <_cv_wait>
0xffffffff817d145b : cmpq $0x0,0x2f0(%r14)
0xffffffff817d1463 : jne 0xffffffff817d1450
0xffffffff817d1465 : mov $0xffffffff81883b20,%rsi
0xffffffff817d146c : mov $0x5a3,%edx
0xffffffff817d1471 : mov %r15,%rdi
0xffffffff817d1474 : callq 0xffffffff807a8630 <_sx_xunlock>
0xffffffff817d1479 : mov 0x268(%r14),%r12d
0xffffffff817d1480 : lea 0xf0(%r14),%rdi
0xffffffff817d1487 : callq 0xffffffff8172d480
0xffffffff817d148c : lea 0x110(%r14),%rdi
0xffffffff817d1493 : callq 0xffffffff8172d480
0xffffffff817d1498 : mov %r15,%rdi
0xffffffff817d149b : callq 0xffffffff807a8010
0xffffffff817d14a0 : mov %rbx,%rdi
0xffffffff817d14a3 : callq 0xffffffff80750b50
0xffffffff817d14a8 : mov 0xffffffff818af6e0,%rdi
0xffffffff817d14b0 : mov %r14,%rsi
0xffffffff817d14b3 : callq 0xffffffff81a19400
0xffffffff817d14b8 : mov %r12d,%eax
0xffffffff817d14bb : pop %rbx
0xffffffff817d14bc : pop %r12
0xffffffff817d14be : pop %r14
0xffffffff817d14c0 : pop %r15
0xffffffff817d14c2 : pop %rbp
0xffffffff817d14c3 : retq
End of assembler dump.

To the uninitiated that looks complicated. However, we don't really need to understand the code to get what we're looking for. The calling convention for each platform is well documented; on amd64 it is part of the System V AMD64 ABI. The first integer or pointer argument to a function is passed in %rdi.
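The parts of that convention this walkthrough relies on fit in a small lookup table. The C below is a sketch that encodes the six integer argument registers and the callee-saved general-purpose registers as I understand them from the System V AMD64 ABI; the helper names sysv_arg_register and sysv_is_callee_saved are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NELEM(a) (sizeof(a) / sizeof((a)[0]))

/* Integer and pointer arguments, in order (System V AMD64 ABI). */
static const char *const sysv_arg_regs[] = {
    "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"
};

/* General-purpose registers a callee must preserve (%rsp aside). */
static const char *const sysv_callee_saved[] = {
    "%rbx", "%rbp", "%r12", "%r13", "%r14", "%r15"
};

/* Register holding integer argument n (0-based), or NULL when the
 * argument is passed on the stack instead. */
const char *sysv_arg_register(size_t n)
{
    return n < NELEM(sysv_arg_regs) ? sysv_arg_regs[n] : NULL;
}

/* True if a called function must hand reg back unchanged, so a value
 * parked there survives the calls made after the prologue. */
bool sysv_is_callee_saved(const char *reg)
{
    for (size_t i = 0; i < NELEM(sysv_callee_saved); i++)
        if (strcmp(reg, sysv_callee_saved[i]) == 0)
            return true;
    return false;
}
```

This is why a compiler that needs an argument after a call copies it out of %rdi into one of the callee-saved registers early on.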

Right after the prologue (saving the frame pointer on the stack, making the stack pointer the new frame pointer, and pushing any callee-saved registers the function intends to use), we see %rdi copied into %r14, a callee-saved register, so that its value survives the calls that follow:

0xffffffff817d13cb : mov %rdi,%r14
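Finding that copy and checking that nothing overwrites it before the call of interest is mechanical enough to sketch. The C below is a toy, not a disassembler: it assumes the instructions have already been split into a hypothetical struct insn (op, src, dst in AT&T order), finds the first mov out of %rdi, and then checks that nothing writes the destination register before a given call. Calls are assumed to clobber only the caller-saved registers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NELEM(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical pre-parsed instruction in AT&T order: "op src,dst".
 * src or dst is NULL when the instruction lacks that operand. */
struct insn {
    const char *op;
    const char *src;
    const char *dst;
};

/* Registers a call may clobber (caller-saved, System V AMD64 ABI). */
static bool clobbered_by_call(const char *reg)
{
    static const char *const caller_saved[] = {
        "%rax", "%rcx", "%rdx", "%rsi", "%rdi",
        "%r8", "%r9", "%r10", "%r11"
    };
    for (size_t i = 0; i < NELEM(caller_saved); i++)
        if (strcmp(reg, caller_saved[i]) == 0)
            return true;
    return false;
}

/* Does this instruction overwrite reg? cmp and test only read. */
static bool writes_reg(const struct insn *in, const char *reg)
{
    if (strcmp(in->op, "callq") == 0)
        return clobbered_by_call(reg);
    if (strncmp(in->op, "cmp", 3) == 0 || strcmp(in->op, "test") == 0)
        return false;
    return in->dst != NULL && strcmp(in->dst, reg) == 0;
}

/* Find the register the first argument (%rdi) is copied into, and
 * check that nothing overwrites it before index stop (the call the
 * thread is blocked in). Returns that register, or NULL if there is
 * no copy or the copy is clobbered. */
const char *find_arg_copy(const struct insn *code, size_t n, size_t stop)
{
    const char *home = NULL;
    size_t i;

    if (stop > n)
        stop = n;
    for (i = 0; i < stop; i++) {
        if (strcmp(code[i].op, "mov") == 0 && code[i].src != NULL &&
            strcmp(code[i].src, "%rdi") == 0) {
            home = code[i].dst;
            break;
        }
    }
    if (home == NULL)
        return NULL;
    for (i++; i < stop; i++)
        if (writes_reg(&code[i], home))
            return NULL;
    return home;
}
```

Fed the zio_wait instructions from the dump, up to the _cv_wait call, it reports %r14. It deliberately ignores branches, partial-register writes (e.g. to %r14d), and instructions such as xchg, so a NULL result means "look closer", not proof that the value is gone.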

We scan down for any further writes to %r14 right up to the call to _cv_wait. As it turns out, there are none. So there we have it: %r14 contains the address of zio. To cut to the chase in the debug session:

(kgdb) p ((zio_t *)$r14)->io_reexecute
$42 = 2 '\002'

That means ZIO_REEXECUTE_SUSPEND is set. To determine what zio_done will do with it, look at the flags:

(kgdb) p ((zio_t *)$r14)->io_flags
$43 = 0

With no flags set, zio_done will call zio_suspend, so the pool should be marked suspended:

(kgdb) p ((zio_t *)$r14)->io_spa->spa_suspended
$44 = 1 '\001'

And indeed it is suspended. If the SPA failure mode were set to panic (probably the right thing to do if INVARIANTS is on), we would have panicked with an "uncorrectable I/O failure":

if (spa_get_failmode(spa) == ZIO_FAILURE_MODE_PANIC)
    fm_panic("Pool '%s' has encountered an uncorrectable I/O "
        "failure and the failure mode property for this pool "
        "is set to panic.", spa_name(spa));

Instead, we just reported the failure, marked the pool suspended, and added the zio as a child of the suspend root:

zfs_ereport_post(FM_EREPORT_ZFS_IO_FAILURE, spa, NULL, NULL, 0, 0);

if (spa->spa_suspend_zio_root == NULL)
    spa->spa_suspend_zio_root = zio_root(spa, NULL, NULL,
        ZIO_FLAG_CANFAIL | ZIO_FLAG_SPECULATIVE |
        ZIO_FLAG_GODFATHER);

if (zio != NULL) {
    ...
    zio_add_child(spa->spa_suspend_zio_root, zio);
}
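To make that control flow concrete, here is a toy C model of the path this zio took: zio_done sees ZIO_REEXECUTE_SUSPEND and calls zio_suspend, which either panics (failmode=panic) or marks the pool suspended and parks the zio under a lazily created suspend root. The structs, enums, and toy_* functions are hypothetical stand-ins; the two ZIO_REEXECUTE_* values are as I recall them from zio.h (SUSPEND agrees with the 2 printed above), and the real zio_done makes further checks (parent zios, flags, locking) that are left out here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* io_reexecute bits; SUSPEND == 2 matches the value seen in kgdb. */
#define ZIO_REEXECUTE_NOW     0x01
#define ZIO_REEXECUTE_SUSPEND 0x02

#define TOY_MAX_CHILDREN 8

/* Hypothetical stand-ins for the failmode property, zio_t and spa_t. */
enum toy_failmode { TOY_FAILMODE_WAIT, TOY_FAILMODE_CONTINUE, TOY_FAILMODE_PANIC };
enum toy_outcome { TOY_COMPLETE, TOY_REEXECUTE, TOY_SUSPENDED, TOY_PANIC };

struct toy_zio {
    uint8_t io_reexecute;
    struct toy_zio *children[TOY_MAX_CHILDREN];
    size_t nchildren;
};

struct toy_spa {
    enum toy_failmode failmode;
    bool spa_suspended;
    struct toy_zio *spa_suspend_zio_root;
    struct toy_zio root_storage;   /* stands in for zio_root() */
};

/* Shape of zio_suspend(): panic when failmode=panic; otherwise post an
 * ereport (omitted), create the suspend root once, mark the pool
 * suspended, and add the zio as a child of the suspend root. */
static enum toy_outcome toy_zio_suspend(struct toy_spa *spa, struct toy_zio *zio)
{
    if (spa->failmode == TOY_FAILMODE_PANIC)
        return TOY_PANIC;          /* the real code calls fm_panic() */
    if (spa->spa_suspend_zio_root == NULL)
        spa->spa_suspend_zio_root = &spa->root_storage;
    spa->spa_suspended = true;
    struct toy_zio *root = spa->spa_suspend_zio_root;
    if (zio != NULL && root->nchildren < TOY_MAX_CHILDREN)
        root->children[root->nchildren++] = zio;
    return TOY_SUSPENDED;
}

/* Simplified zio_done() decision for a zio with no parent:
 * SUSPEND is checked first, so it wins even if NOW is also set. */
enum toy_outcome toy_zio_done(struct toy_spa *spa, struct toy_zio *zio)
{
    if (zio->io_reexecute == 0)
        return TOY_COMPLETE;       /* normal completion path */
    if (zio->io_reexecute & ZIO_REEXECUTE_SUSPEND)
        return toy_zio_suspend(spa, zio);
    return TOY_REEXECUTE;          /* NOW only: the I/O is re-issued */
}
```

In this model the suspended zio simply stays parked under the suspend root until something resumes the pool, which is consistent with txg_sync_thread sleeping in zio_wait above.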

Back to our debug session. Let's cross-check by finding the same zio in the calling frame, a larger function into which the actual caller has been inlined. zio_wait is called by dsl_pool_sync_mos, which has been inlined into dsl_pool_sync:

static void
dsl_pool_sync_mos(dsl_pool_t *dp, dmu_tx_t *tx)
{
    zio_t *zio = zio_root(dp->dp_spa, NULL, NULL, ZIO_FLAG_MUSTSUCCEED);
    dmu_objset_sync(dp->dp_meta_objset, zio, tx);
    VERIFY0(zio_wait(zio));
    dprintf_bp(&dp->dp_meta_rootbp, "meta objset rootbp is %s", "");
    spa_set_rootblkptr(dp->dp_spa, &dp->dp_meta_rootbp);
}

(kgdb) up
#6 0xffffffff81779d3c in dsl_pool_sync (dp=0xfffff8004d364800, txg=11733518)
at /usr/home/kmacy/devel/svn/10/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c:531
531 VERIFY0(zio_wait(zio));
(kgdb) p $rdi
$45 = 0

Hrrrm. %rdi is a caller-saved argument register, so its value at the time of the call isn't saved anywhere the debugger can recover it from the stack. Instead, we'll look at where it came from:

(kgdb) disassemble dsl_pool_sync
Dump of assembler code for function dsl_pool_sync:

<... snip>

We look for 0xffffffff81779d3c, the return address shown in the backtrace. The call instruction pushed it onto the stack, so it is the address of the instruction immediately following the call. From there we scan back to where the zio is obtained from zio_root. %rax holds the return value, which clang copies into %rbx. %rbx is then copied into %rdi right before the call to zio_wait, so it must be the zio being waited on. Since %rbx is callee-saved, dmu_objset_sync (and zio_wait itself) must have preserved it, which is why kgdb can still recover its value in this frame.

0xffffffff81779d1d : callq 0xffffffff817d0ed0
0xffffffff81779d22 : mov %rax,%rbx
0xffffffff81779d25 : mov 0x8(%r13),%rdi
0xffffffff81779d29 : mov %rbx,%rsi
0xffffffff81779d2c : mov %r12,%rdx
0xffffffff81779d2f : callq 0xffffffff8175b450
0xffffffff81779d34 : mov %rbx,%rdi
0xffffffff81779d37 : callq 0xffffffff817d13c0
0xffffffff81779d3c : test %eax,%eax

(kgdb) p /x $rbx
$48 = 0xfffff8004ddf4730
(kgdb) down
#5 0xffffffff817d145b in zio_wait (zio=<value optimized out>)
at /usr/home/kmacy/devel/svn/10/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1442
1442 cv_wait(&zio->io_cv, &zio->io_lock);
(kgdb) p /x $r14
$49 = 0xfffff8004ddf4730

Success: it is the zio in question.
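The backward half of this search can be sketched the same way. The C below is again a toy over a hypothetical struct insn, this time carrying addresses: it finds the call whose next instruction sits at the return address from the backtrace (a direct callq with a 32-bit displacement is 5 bytes, so 0xffffffff81779d37 + 5 = 0xffffffff81779d3c), walks back to the last write to %rdi, and reports the source register only if it is callee-saved, since those are the registers a debugger can usually reconstruct in an outer frame. find_first_arg_source is a made-up name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NELEM(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical pre-parsed instruction with its address (AT&T order). */
struct insn {
    uint64_t addr;
    const char *op;
    const char *src;
    const char *dst;
};

/* Callee-saved general-purpose registers (System V AMD64 ABI). */
static bool is_callee_saved(const char *reg)
{
    static const char *const saved[] = {
        "%rbx", "%rbp", "%r12", "%r13", "%r14", "%r15"
    };
    for (size_t i = 0; i < NELEM(saved); i++)
        if (strcmp(reg, saved[i]) == 0)
            return true;
    return false;
}

/* Locate the call that returns to ret_addr, then walk backwards to the
 * last instruction that loaded %rdi. Returns its source register if
 * that register is callee-saved, otherwise NULL. */
const char *find_first_arg_source(const struct insn *code, size_t n,
                                  uint64_t ret_addr)
{
    size_t call = n;

    for (size_t i = 0; i + 1 < n; i++) {
        if (code[i + 1].addr == ret_addr &&
            strcmp(code[i].op, "callq") == 0) {
            call = i;
            break;
        }
    }
    if (call == n)
        return NULL;
    for (size_t i = call; i-- > 0; ) {
        if (code[i].dst != NULL && strcmp(code[i].dst, "%rdi") == 0) {
            const char *src = code[i].src;
            return (src != NULL && is_callee_saved(src)) ? src : NULL;
        }
    }
    return NULL;
}
```

Given the dsl_pool_sync excerpt above and the return address 0xffffffff81779d3c, it returns %rbx. A linear backward walk assumes no branch lands between the load of %rdi and the call; when one does, you have to follow the control flow by hand.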

I hope that this quick note has convinced the reader that he (or she) doesn't need to be able to understand assembly to cope with the "value optimized out" problem.