A quick guide to debugging "optimized" code for the assembly-phobic developer and bug submitter
With clang-generated code one will frequently see that a variable has its "value optimized out". This happens with gcc as well, but less often. It is frustrating and can lead users to believe the code is "over-optimized". What I speculate is actually happening is that the register allocator fails to update the debug info when the value is moved to another register.
This recently came up for me when I added my two cents to a mailing list discussion about a zpool import hang. A committer had helpfully identified the thread backtraces that corresponded to the problem at hand: txg_sync_thread hanging forever on a cv_wait in zio_wait. I asked for the value of a number of fields in the zio that was being waited on, but was told that it had been "optimized out".
(kgdb) thread 459
[Switching to thread 459 (Thread 101524)]#0 sched_switch (td=0xfffff80063111000, newtd=
flags=
1945 cpuid = PCPU_GET(cpuid);
Current language: auto; currently minimal
(kgdb) bt
#0 sched_switch (td=0xfffff80063111000, newtd=
at /usr/home/kmacy/devel/svn/10/sys/kern/sched_ule.c:1945
#1 0xffffffff807aa199 in mi_switch (flags=260, newtd=0x0) at /usr/home/kmacy/devel/svn/10/sys/kern/kern_synch.c:494
#2 0xffffffff807e6e82 in sleepq_switch (wchan=
at /usr/home/kmacy/devel/svn/10/sys/kern/subr_sleepqueue.c:538
#3 0xffffffff807e6ce3 in sleepq_wait (wchan=0xfffff8004ddf4a50, pri=0)
at /usr/home/kmacy/devel/svn/10/sys/kern/subr_sleepqueue.c:617
#4 0xffffffff80750d7a in _cv_wait (cvp=0xfffff8004ddf4a50, lock=0xfffff8004ddf4a30)
at /usr/home/kmacy/devel/svn/10/sys/kern/kern_condvar.c:139
#5 0xffffffff817d145b in zio_wait (zio=<value optimized out>)
at /usr/home/kmacy/devel/svn/10/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1442
#6 0xffffffff81779d3c in dsl_pool_sync (dp=0xfffff8004d364800, txg=11733518)
at /usr/home/kmacy/devel/svn/10/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c:531
#7 0xffffffff8179d800 in spa_sync (spa=0xfffffe000372f000, txg=11733518)
at /usr/home/kmacy/devel/svn/10/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:6604
#8 0xffffffff817a7e9d in txg_sync_thread (arg=0xfffff8004d364800)
at /usr/home/kmacy/devel/svn/10/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:518
#9 0xffffffff8076ed34 in fork_exit (callout=0xffffffff817a7c50
frame=0xfffffe012043fac0) at /usr/home/kmacy/devel/svn/10/sys/kern/kern_fork.c:996
#10 0xffffffff80b96b3e in fork_trampoline () at /usr/home/kmacy/devel/svn/10/sys/amd64/amd64/exception.S:606
#11 0x0000000000000000 in ?? ()
(kgdb) f 5
#5 0xffffffff817d145b in zio_wait (zio=<value optimized out>)
at /usr/home/kmacy/devel/svn/10/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1442
1442 cv_wait(&zio->io_cv, &zio->io_lock);
How frustrating! zio_wait is (excluding asserts) only 7 lines of code. How can this be? Well, let's look at the assembly for zio_wait:
(kgdb) disassemble zio_wait
Dump of assembler code for function zio_wait:
0xffffffff817d13c0
0xffffffff817d13c1
0xffffffff817d13c4
0xffffffff817d13c6
0xffffffff817d13c8
0xffffffff817d13ca
0xffffffff817d13cb
0xffffffff817d13ce
0xffffffff817d13d6
0xffffffff817d13d8
0xffffffff817d13df
0xffffffff817d13e6
0xffffffff817d13eb
0xffffffff817d13f0
0xffffffff817d13f8
0xffffffff817d13fa
0xffffffff817d1401
0xffffffff817d1408
0xffffffff817d140d
0xffffffff817d1412
0xffffffff817d141b
0xffffffff817d1422
0xffffffff817d1425
0xffffffff817d142a
0xffffffff817d1431
0xffffffff817d1433
0xffffffff817d143a
0xffffffff817d143f
0xffffffff817d1442
0xffffffff817d1447
0xffffffff817d144e
0xffffffff817d1450
0xffffffff817d1453
0xffffffff817d1456
0xffffffff817d145b
0xffffffff817d1463
0xffffffff817d1465
0xffffffff817d146c
0xffffffff817d1471
0xffffffff817d1474
0xffffffff817d1479
0xffffffff817d1480
0xffffffff817d1487
0xffffffff817d148c
0xffffffff817d1493
0xffffffff817d1498
0xffffffff817d149b
0xffffffff817d14a0
0xffffffff817d14a3
0xffffffff817d14a8
0xffffffff817d14b0
0xffffffff817d14b3
0xffffffff817d14b8
0xffffffff817d14bb
0xffffffff817d14bc
0xffffffff817d14be
0xffffffff817d14c0
0xffffffff817d14c2
0xffffffff817d14c3
End of assembler dump.
To the uninitiated that looks complicated. However, we don't really need to understand the code to get what we're looking for. The calling convention for each platform is well documented. In this case it is part of the System V AMD64 ABI, under which the first integer or pointer argument to a function is passed in %rdi.
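For reference, the System V AMD64 ABI assigns the first six integer or pointer arguments to fixed registers. Here is a minimal sketch of that mapping as a lookup table. The helper name sysv_arg_register is made up for illustration and is not part of any library.

```c
#include <stddef.h>

/* Integer/pointer argument registers in the System V AMD64 ABI, in order. */
static const char *const sysv_int_arg_regs[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/*
 * Hypothetical helper: name the register that carries the n-th (0-based)
 * integer or pointer argument, or return NULL when that argument is
 * passed on the stack instead.
 */
const char *
sysv_arg_register(size_t n)
{
    size_t count = sizeof(sysv_int_arg_regs) / sizeof(sysv_int_arg_regs[0]);

    return (n < count ? sysv_int_arg_regs[n] : NULL);
}
```

Applied to frame 4 of the backtrace, _cv_wait(cvp, lock) would have received cvp in %rdi and lock in %rsi on entry.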
We see right after the prologue (saving the frame pointer to the stack, making the stack pointer the new frame pointer, and saving to the stack any callee-saved registers that the function intends to use) that %rdi is moved to %r14 to preserve its value across calls.
0xffffffff817d13cb <zio_wait+11>: mov %rdi,%r14
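The reason for that copy is that %rdi is caller-saved: any function we call is free to overwrite it. Here is a small sketch of the same shape, using made-up names (struct item, kick, wait_on) rather than the real ZFS code. The argument is still needed after a call, so the compiler has to park it somewhere that survives the call. The noinline attribute is a gcc/clang extension, used here only to keep the call from being optimized away.

```c
struct item {
    int waiting;
    int error;
};

/* A real, out-of-line call; the callee may clobber %rdi. */
__attribute__((noinline)) void
kick(struct item *it)
{
    it->waiting = 0;
}

/*
 * `it` arrives in %rdi but is used again after kick() returns, so the
 * compiler must keep a copy in a callee-saved register or a stack slot.
 */
int
wait_on(struct item *it)
{
    it->waiting = 1;
    kick(it);
    return (it->error);
}
```

Compiling this with cc -O2 -S and reading the output for wait_on should show a similar move near the top of the function, although the register chosen (or a stack slot instead) depends on the compiler and its version.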
We scan down for further modifications of %r14 right up to the call to cv_wait. As it turns out, there are none. So there we have it: %r14 contains the address of zio. To cut to the chase in the debug session:
(kgdb) p ((zio_t *)$r14)->io_reexecute
$42 = 2 '\002'
Which means ZIO_REEXECUTE_SUSPEND was set. To determine what zio_done will do with it:
(kgdb) p ((zio_t *)$r14)->io_flags
$43 = 0
This means it will call zio_suspend:
(kgdb) p ((zio_t *)$r14)->io_spa->spa_suspended
$44 = 1 '\001'
And yes, it is suspended. If the SPA failure mode were panic (probably the right thing to do if INVARIANTS is on), we would have panicked with an "uncorrectable I/O failure":
if (spa_get_failmode(spa) == ZIO_FAILURE_MODE_PANIC)
    fm_panic("Pool '%s' has encountered an uncorrectable I/O "
        "failure and the failure mode property for this pool "
        "is set to panic.", spa_name(spa));
Instead we just reported the failure, marked the pool suspended, and added ourselves to the suspend root:
zfs_ereport_post(FM_EREPORT_ZFS_IO_FAILURE, spa, NULL, NULL, 0, 0);
if (spa->spa_suspend_zio_root == NULL)
    spa->spa_suspend_zio_root = zio_root(spa, NULL, NULL,
        ZIO_FLAG_CANFAIL | ZIO_FLAG_SPECULATIVE |
        ZIO_FLAG_GODFATHER);
if (zio != NULL) {
    ...
    zio_add_child(spa->spa_suspend_zio_root, zio);
}
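The first step of this detour, reading io_reexecute as 2 and concluding that ZIO_REEXECUTE_SUSPEND was set, amounts to a one-line bit test. The sketch below assumes the field is a bitmask and uses a stand-in constant, EX_REEXECUTE_SUSPEND, set to the value kgdb printed. Both names are hypothetical, and the real definition lives in the ZFS headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for ZIO_REEXECUTE_SUSPEND; 0x02 is the value seen in kgdb. */
#define EX_REEXECUTE_SUSPEND 0x02

/* True when the suspend bit is set in a copy of zio->io_reexecute. */
bool
ex_reexecute_wants_suspend(uint8_t io_reexecute)
{
    return ((io_reexecute & EX_REEXECUTE_SUSPEND) != 0);
}
```

A bug submitter can do the same check by hand: print the field in kgdb and compare it against the constants in the header.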
Back to our debug session. Let's find the zio again from the caller's frame, a larger function. The caller of zio_wait is dsl_pool_sync_mos, which has been inlined into dsl_pool_sync:
static void
dsl_pool_sync_mos(dsl_pool_t *dp, dmu_tx_t *tx)
{
    zio_t *zio = zio_root(dp->dp_spa, NULL, NULL, ZIO_FLAG_MUSTSUCCEED);
    dmu_objset_sync(dp->dp_meta_objset, zio, tx);
    VERIFY0(zio_wait(zio));
    dprintf_bp(&dp->dp_meta_rootbp, "meta objset rootbp is %s", "");
    spa_set_rootblkptr(dp->dp_spa, &dp->dp_meta_rootbp);
}
(kgdb) up
#6 0xffffffff81779d3c in dsl_pool_sync (dp=0xfffff8004d364800, txg=11733518)
at /usr/home/kmacy/devel/svn/10/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c:531
531 VERIFY0(zio_wait(zio));
(kgdb) p $rdi
$45 = 0
Hrrrm. %rdi is a caller-saved register, so its value at the time of the call isn't saved in a way that can be recovered from the stack. Well, we'll look at where it comes from:
(kgdb) disassemble dsl_pool_sync
Dump of assembler code for function dsl_pool_sync:
<... snip>
We look for 0xffffffff81779d3c, which is actually the address of the instruction following the call; the call instruction itself pushed it on the stack as the return address. We scan back to where the zio is obtained from zio_root. %rax holds the return value, which clang saves in %rbx. We note that %rbx is the value stored into %rdi before the call to zio_wait, so it must be the zio that is being waited on. By extension, it must have been preserved by dmu_objset_sync, as %rbx is a callee-saved register.
0xffffffff81779d1d
0xffffffff81779d22
0xffffffff81779d25
0xffffffff81779d29
0xffffffff81779d2c
0xffffffff81779d2f
0xffffffff81779d34
0xffffffff81779d37
0xffffffff81779d3c
(kgdb) p /x $rbx
$48 = 0xfffff8004ddf4730
(kgdb) down
#5 0xffffffff817d145b in zio_wait (zio=<value optimized out>)
at /usr/home/kmacy/devel/svn/10/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1442
1442 cv_wait(&zio->io_cv, &zio->io_lock);
(kgdb) p /x $r14
$49 = 0xfffff8004ddf4730
Success, it is the zio in question.
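One detail from the walk through dsl_pool_sync deserves a small illustration: the address shown in a backtrace frame is the return address, the instruction after the call, not the call itself. The sketch below uses the gcc/clang builtin __builtin_return_address to let a callee report the address it will return to. The function names callee and caller are made up for the example.

```c
#include <stddef.h>

static void *last_return_address;

/* Record where this function will return to in its caller. */
__attribute__((noinline)) void
callee(void)
{
    last_return_address = __builtin_return_address(0);
}

/*
 * The address recorded by callee() points at the instruction in this
 * function that immediately follows the call instruction.
 */
__attribute__((noinline)) void *
caller(void)
{
    last_return_address = NULL;
    callee();
    return (last_return_address);
}
```

Printing the result of caller() and comparing it with a disassembly of caller should show the address landing just past the call, which is why we scanned backwards from 0xffffffff81779d3c rather than forwards.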
I hope that this quick note has convinced the reader that he (or she) doesn't need to be able to understand assembly to cope with the "value optimized out" problem.