Process hang in state "zilog->zl_writer_lock" on Unstable


#21

Sorry for the long delay; I have not had much time to work on this. To summarise the tests I have managed to carry out so far:

  1. corebird on FreeBSD 11.1 using ssh X forwarding - no problems
  2. add xorg,lumina,… to the FreeBSD 11.1 install - no problems
  3. boot FreeBSD 12 snapshot, kldload video, and chroot into FreeBSD 11.1 - no problems
  4. install FreeBSD 12 snapshot with lumina,xorg,corebird,dconf - locked up after 40 min

The first two tests ran for a few days, and the chroot test has 7 days of uptime.
It is interesting that the chroot has not failed.

The FreeBSD 12 snapshot install is able to get to the kernel debugger. Just before that, I tried to umount the memory stick that contained the initial dot files used to set up my home directory; it was mounted read-only as MSDOSFS, and the umount command locked up in zfs.
The backtraces of umount and corebird look like they are both stuck on some kind of lock:

_sx_xlock_hard
_sx_xlock
zil_commit_impl

dconf-service is in:

dnode_hold_impl
_sx_xlock
witness_lock

I will try to get some photos, as there is no serial port on that machine :frowning:. I will likely just post those logs to the FreeBSD bug tracker and post a link here.


#22

Ahh, witness locks. That definitely sounds like an upstream issue. Witness locks are debugging tools that help detect lock order issues; in “release” they should be turned off, but since this is 12-CURRENT they’re turned on (possibly adding a performance hit).
I glanced at some dconf code/documents a while back and I think they are basically optimized for read performance; I’m pretty sure they mmap the file. That is a perfectly legal operation, but it starts to involve more interaction between VM processing and the FS (ZFS) than normal.
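
For anyone following along, here is a rough sketch of what "mmap the file" means in practice. This is not dconf's code; the helper name `read_via_mmap`, the temporary file, and its contents are all hypothetical and only illustrate the general technique of reading a file through a read-only memory mapping.

```python
import mmap
import os
import tempfile


def read_via_mmap(path, offset, length):
    """Return `length` bytes starting at `offset`, read through a read-only mapping."""
    with open(path, "rb") as f:
        # A length of 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing the mapping copies the bytes out of the mapped pages.
            return m[offset:offset + length]


def demo():
    # Hypothetical stand-in for a settings database file.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"key=value\n")
        return read_via_mmap(path, 4, 5)
    finally:
        os.remove(path)
```

Reads through the mapping are satisfied by page faults instead of read() calls, which is where the VM and filesystem layers have to cooperate.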


#23

Maybe this is fixed now by FreeBSD-EN-18:18.zfs.
The notice for this fix says: "Processes may hang on the waitchan “zilog->zl_writer_lock”"