"read-only file system" error message from openrc during boot


#1

I wrecked a BE, probably by hard resetting it before.
It still “works” in general, but I get these strange and rather worrying error messages from openrc while booting the BE:

openrc: unlink `libexec/rc/init.d/failed/[some service]': Read-only file system

This is repeated for a bunch of services, e.g. […] bridge, network, newsyslog, nisdomain, cron, ipfw, rpcbind, routing, avahi-daemon, nfsclient, devfs, cups_browsed.

Is there anything I could do to repair/fix this BE, or should I just ditch it? I don’t really need it, but it would be nice to learn how to recover from such an error in case I really need to some time in the future. It would also strengthen my faith in zfs/trueos (I assume it’s a zfs issue).

Thanks!


#2

A BE is a clone of a snapshot of a previous BE.
First thing I would try:
zpool status
followed by a zpool scrub. The scrub may take a while to complete, so be patient; zpool status will tell you how far it’s gotten and how much longer it needs.
A scrub should fix any errors it can.
Otherwise, if you don’t need it, you can blow it away and start over.
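As a sketch, that check-and-scrub sequence might look like this (the pool name zroot is an assumption; substitute whatever zpool list shows on your system):

```shell
# Guarded so it only runs where the ZFS tools actually exist.
if command -v zpool >/dev/null 2>&1; then
    zpool status              # look for DEGRADED devices or checksum errors
    zpool scrub zroot         # start a scrub (pool name zroot is assumed)
    zpool status zroot        # run again later: shows scrub progress and ETA
else
    echo "zpool not found; commands shown for reference only"
fi
finished=yes
```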


#3

I’d remove the symlinks in “/libexec/rc/init.d/failed” (move them elsewhere before deleting), reboot immediately, and see what the result is.

Should be done in Single User Mode, preferably.


#4

So it looks like /libexec/rc/init.d is where OpenRC puts a bunch of stuff. I didn’t know that.
Instead of Single User mode, you could use the magic of BEs:
Let’s call the BE that shows this problem BE-Bad, and a working one BE-Good.
as root (su or sudo):
beadm activate BE-Good
reboot
when system is up, again as root:
beadm mount BE-Bad
That should mount the bad BE somewhere under /tmp. You can also give it a mountpoint like /mnt if you want.
Let’s say it mounted at /tmp/BE-Bad. All you need to do is “cd /tmp/BE-Bad/libexec/rc/init.d/failed” and you can rm the symlinks as @bsdtester suggests. Then cd && beadm unmount BE-Bad && beadm activate BE-Bad && reboot.
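The whole sequence might be sketched as one script (the BE names BE-Good/BE-Bad and the /mnt mountpoint are placeholders from this thread; in practice you run the pieces manually around the reboots):

```shell
# Sketch only: BE-Good/BE-Bad are placeholder names, not real BEs.
if command -v beadm >/dev/null 2>&1; then
    beadm activate BE-Good            # make the known-good BE the boot default
    # ... reboot here, then continue as root ...
    beadm mount BE-Bad /mnt           # mount the broken BE read/write at /mnt
    mkdir -p /root/failed-backup
    mv /mnt/libexec/rc/init.d/failed/* /root/failed-backup/   # stash, don't delete
    beadm unmount BE-Bad
    beadm activate BE-Bad             # boot back into the (hopefully) repaired BE
    # ... reboot again ...
else
    echo "beadm not found; commands shown for reference only"
fi
ok=yes
```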

beadm and mounting other BEs all ties into my comment that “BEs are clones of snapshots of previous BEs”. ZFS lets you mount snapshots and clones, so you can get at the data in them. Just be careful: clones you can change, snapshots you don’t want to, so treat clones as read/write and snapshots as read-only.
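That read-only/read-write distinction can be seen with the zfs tools directly; a hedged sketch (the dataset and snapshot names zroot/ROOT/default@pre-upgrade and zroot/ROOT/experiment are made up for illustration):

```shell
if command -v zfs >/dev/null 2>&1; then
    zfs list -t snapshot                 # the snapshots backing each BE
    ls /.zfs/snapshot/                   # snapshots are browsable read-only here
    # A clone is a writable dataset created from a snapshot:
    zfs clone zroot/ROOT/default@pre-upgrade zroot/ROOT/experiment
else
    echo "zfs not found; commands shown for reference only"
fi
seen=yes
```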


#5

Hi, mer. A question:

Do you think (like I do) that there is an error in the sequencing of the boot scripts?

Symptom:
openrc tries to unlink stale symlinks before the file system is remounted read/write.

Do you agree? Or am I missing something?


#6

What I think happened:
The OP states he did a hard reset, i.e. just switched off the power.
OpenRC uses the directories in /libexec/rc/init.d to keep track of services that need starting, are started, failed, etc (this is a guess on my part based on what I see on my machines).
During boot, OpenRC sees the symlinks in the failed directory, goes “hhmm, I haven’t started those yet” and wants to clean up. But since the rm needs the dataset to be read/write, it fails, simply because it’s trying too early.

That’s a lot of words to say that I agree with what you have under “Symptom”.
Not sure if it means an error in the sequencing of the boot scripts or not. It could be that OpenRC has to be the first thing to run (from init), so it’s logical that it tries to clean up very early. Now, I’d be surprised if OpenRC didn’t try to clean up when you successfully shut down the system in the BE. Of course, getting a list of the symlinks in that failed directory may lead to more interesting things.


#7

Thanks for the hints and suggestions!

I did as advised, worked as advertised!

For future reference for me and others:

  1. Boot into a sane BE.
  2. Mount the bad BE:

sudo beadm mount [bad BE]

  3. Remove the entries from

/tmp/[bad BE. …]/libexec/rc/init.d/failed

(I temporarily moved them to a newly created folder in /tmp/[bad BE. …]/ but deleted that folder later, after the operation turned out to have been successful.)

  4. Unmount the bad BE:

sudo beadm umount [bad BE]

  5. Boot into the bad BE and hopefully no longer see those nasty error messages during boot. The bad BE is now a good BE again, too.

Interestingly, all the entries in /libexec/rc/init.d/failed were missing (as folders) in /libexec/rc/init.d/daemons and /libexec/rc/init.d/options, although not all services have a folder in the latter even when the BE is sane. Once fixed (entries removed from /failed), the folders reappear after successfully booting the previously bad BE.
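To see that correspondence for yourself, a small sketch (the /tmp/BE-Bad path is the hypothetical mountpoint from post #4):

```shell
RC=/tmp/BE-Bad/libexec/rc/init.d    # hypothetical mountpoint of the bad BE
for d in failed daemons options; do
    echo "== $d =="
    ls "$RC/$d" 2>/dev/null || echo "(not present)"
done
```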

So, thank you very much, guys, for the insights and help! One more mystery solved and behaviour of bsd/TrueOS learned/“understood”.


#9

No problem. Just remember that when working with BEs you should use the beadm command (beadm list, beadm mount, beadm unmount, plus a few others). Using the mount command as you did was fine, but I think it’s better to be consistent and use beadm.

Boot Environments are one of the best and most unsung heroes of ZFS. Solaris has had them for a long time, and as you see, they make upgrading and recovery fairly painless. I’ve used beadm mount to mount BEs so I could compare things between a working and a nonworking one. It’s a quick, easy way to fix stuff or figure out why things are broken.
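A sketch of that compare-two-BEs trick (the BE names and mountpoints are placeholders, and /etc is chosen just as an example of what to diff):

```shell
if command -v beadm >/dev/null 2>&1; then
    mkdir -p /mnt/good /mnt/bad
    beadm mount BE-Good /mnt/good
    beadm mount BE-Bad  /mnt/bad
    diff -r /mnt/good/etc /mnt/bad/etc | head -n 50   # first 50 lines of differences
    beadm unmount BE-Good
    beadm unmount BE-Bad
else
    echo "beadm not found; commands shown for reference only"
fi
compared=yes
```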

/libexec/rc/init.d is pretty much owned and controlled by OpenRC processing. I have not really looked at what the OpenRC code is doing, but I guess it’s all part of the housekeeping: what’s running, what’s not, what’s failed, any special configuration.


#10

Thanks for the hint. I think I did use the beadm mount command, but I’m not sure (maybe I just used SysAdm?). I corrected my previous post accordingly and added the umount command as well.