I’m not sure where to start explaining how I ended up in this bad situation, and I do not want to make it worse.
In the beginning the system occasionally froze (kernel oops), which I blamed on the graphics card, because with journalctl I had found messages about a GPU lockup.
In the worst case the RAID would start re-syncing after a reboot, and all would be fine.
After a while the system would no longer boot up normally and ended up in emergency mode. With the help of journalctl -b
I found out that one of the disks (sdd) belonging to the array was causing errors. At first I believed it could be the SATA cable; fiddling with the cables made the whole disk disappear, so I replaced the cable with another one.
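For reference, disk errors like these can be pulled out of the journal with something along the lines of journalctl -b -k -p err. The sketch below shows the filtering step only; the sample line is illustrative, not my actual log output:

```shell
# Live form (reads the real journal, not run here):
#   journalctl -b -k -p err | grep -w sdd
# Below, an illustrative line stands in for the live journal output.
sample='kernel: I/O error, dev sdd, sector 123456'
printf '%s\n' "$sample" | grep -w 'sdd'
```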
During boot the missing disk was visible again and I noticed that the status of the RAID array changed from Degraded to Rebuild.
Unfortunately, after GRUB, Fedora still ends up in emergency mode after a while.
sda, sdc and sdd are the disks of the RAID array (md127). However, lsblk
shows that sdd is no longer part of the array.
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 931.5G 0 disk
└─md127 9:127 0 0B 0 md
sdb 8:16 0 3.6T 0 disk /data/bulk
sdc 8:32 0 931.5G 0 disk
└─md127 9:127 0 0B 0 md
sdd 8:48 0 931.5G 0 disk
sde 8:64 0 119.2G 0 disk
├─sde1 8:65 0 1G 0 part /boot
└─sde2 8:66 0 118.2G 0 part
├─fedora_fedaic-root 253:0 0 112.3G 0 lvm /
└─fedora_fedaic-swap 253:1 0 5.9G 0 lvm [SWAP]
sdf 8:80 1 14.5G 0 disk
├─sdf1 8:81 1 14.4G 0 part
└─sdf2 8:82 1 32M 0 part
sr0 11:0 1 1024M 0 rom
cat /proc/mdstat
shows that the array is inactive and that sdd is not part of it:
Personalities :
md127 : inactive sda[1](S) sdc[0](S)
5552 blocks super external:imsm
unused devices: <none>
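The inactive state can also be checked mechanically; a minimal sketch, using the mdstat text above as sample input instead of reading the live /proc/mdstat:

```shell
# On the live system this would read /proc/mdstat directly;
# here the pasted output above is used as sample input.
mdstat='md127 : inactive sda[1](S) sdc[0](S)
      5552 blocks super external:imsm'
if printf '%s\n' "$mdstat" | grep -q '^md127 : inactive'; then
  echo "md127 is inactive"
fi
```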
And with mdadm --examine /dev/sd[acd]
it is possible to see that sdd was a member of the RAID array.
/dev/sda:
Magic : Intel Raid ISM Cfg Sig.
Version : 1.2.02
Orig Family : d6ba148c
Family : d6ba148c
Generation : 002db736
Creation Time : Unknown
Attributes : All supported
UUID : fcaaa905:3813afd6:892f86ab:83424a41
Checksum : 89e55567 correct
MPB Sectors : 2
Disks : 3
RAID Devices : 1
Disk00 Serial : WD-WCC3F3525846
State : active
Id : 00000000
Usable Size : 1953519616 (931.51 GiB 1000.20 GB)
[RAID5_2TB]:
Subarray : 0
UUID : 5aca63f5:865b38a5:fec0f38e:a41e95cb
RAID Level : 5 <-- 5
Members : 3 <-- 3
Slots : [UUU] <-- [_UU]
Failed disk : 0
This Slot : 0 (out-of-sync)
Sector Size : 512
Array Size : 3907039232 (1863.02 GiB 2000.40 GB)
Per Dev Size : 1953519880 (931.51 GiB 1000.20 GB)
Sector Offset : 0
Num Stripes : 15261872
Chunk Size : 64 KiB <-- 64 KiB
Reserved : 0
Migrate State : rebuild
Map State : normal <-- degraded
Checkpoint : 0 (128)
Dirty State : clean
RWH Policy : off
Volume ID : 0
Disk01 Serial : WD-WMC1U6533085
State : active
Id : 00020000
Usable Size : 1953519616 (931.51 GiB 1000.20 GB)
Disk02 Serial : WD-WMC1U5023513
State : active
Id : 00030000
Usable Size : 1953519616 (931.51 GiB 1000.20 GB)
/dev/sdc:
Magic : Intel Raid ISM Cfg Sig.
Version : 1.2.02
Orig Family : d6ba148c
Family : d6ba148c
Generation : 002db736
Creation Time : Unknown
Attributes : All supported
UUID : fcaaa905:3813afd6:892f86ab:83424a41
Checksum : 89e55567 correct
MPB Sectors : 2
Disks : 3
RAID Devices : 1
Disk01 Serial : WD-WMC1U6533085
State : active
Id : 00020000
Usable Size : 1953519616 (931.51 GiB 1000.20 GB)
[RAID5_2TB]:
Subarray : 0
UUID : 5aca63f5:865b38a5:fec0f38e:a41e95cb
RAID Level : 5 <-- 5
Members : 3 <-- 3
Slots : [UUU] <-- [_UU]
Failed disk : 0
This Slot : 1
Sector Size : 512
Array Size : 3907039232 (1863.02 GiB 2000.40 GB)
Per Dev Size : 1953519880 (931.51 GiB 1000.20 GB)
Sector Offset : 0
Num Stripes : 15261872
Chunk Size : 64 KiB <-- 64 KiB
Reserved : 0
Migrate State : rebuild
Map State : normal <-- degraded
Checkpoint : 0 (128)
Dirty State : clean
RWH Policy : off
Volume ID : 0
Disk00 Serial : WD-WCC3F3525846
State : active
Id : 00000000
Usable Size : 1953519616 (931.51 GiB 1000.20 GB)
Disk02 Serial : WD-WMC1U5023513
State : active
Id : 00030000
Usable Size : 1953519616 (931.51 GiB 1000.20 GB)
/dev/sdd:
Magic : Intel Raid ISM Cfg Sig.
Version : 1.2.02
Orig Family : d6ba148c
Family : d6ba148c
Generation : 002db736
Creation Time : Unknown
Attributes : All supported
UUID : fcaaa905:3813afd6:892f86ab:83424a41
Checksum : 89e55567 correct
MPB Sectors : 2
Disks : 3
RAID Devices : 1
Disk02 Serial : WD-WMC1U5023513
State : active
Id : 00030000
Usable Size : 1953519616 (931.51 GiB 1000.20 GB)
[RAID5_2TB]:
Subarray : 0
UUID : 5aca63f5:865b38a5:fec0f38e:a41e95cb
RAID Level : 5 <-- 5
Members : 3 <-- 3
Slots : [UUU] <-- [_UU]
Failed disk : 0
This Slot : 2
Sector Size : 512
Array Size : 3907039232 (1863.02 GiB 2000.40 GB)
Per Dev Size : 1953519880 (931.51 GiB 1000.20 GB)
Sector Offset : 0
Num Stripes : 15261872
Chunk Size : 64 KiB <-- 64 KiB
Reserved : 0
Migrate State : rebuild
Map State : normal <-- degraded
Checkpoint : 0 (128)
Dirty State : clean
RWH Policy : off
Volume ID : 0
Disk00 Serial : WD-WCC3F3525846
State : active
Id : 00000000
Usable Size : 1953519616 (931.51 GiB 1000.20 GB)
Disk01 Serial : WD-WMC1U6533085
State : active
Id : 00020000
Usable Size : 1953519616 (931.51 GiB 1000.20 GB)
What I don’t understand: when I examine messages from earlier boots with e.g. journalctl -b -12 | egrep md[0-9],
I find lines with md126. What’s the difference between md126 and md127?
kernel: md/raid:md126: not clean -- starting background reconstruction
kernel: md/raid:md126: device sda operational as raid disk 0
kernel: md/raid:md126: device sdc operational as raid disk 1
kernel: md/raid:md126: device sdd operational as raid disk 2
[...]
systemd[1]: Started mdmon@md127.service - MD Metadata Monitor on /dev/md127
[...]
mdadm[1141]: RebuildFinished event detected on md device /dev/md126
If I do journalctl -b -11 | egrep md[0-9],
it shows nothing. If I examine the journal without grep, I find, for example, this.
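As an aside, the pattern in these commands should probably be quoted, otherwise the shell may glob-expand md[0-9] if a file matching it happens to exist in the current directory (and egrep is deprecated in favour of grep -E). Using one of the journal lines above as sample input:

```shell
# Quoting the pattern prevents shell globbing; grep -E replaces egrep.
line='kernel: md/raid:md126: not clean -- starting background reconstruction'
printf '%s\n' "$line" | grep -E 'md[0-9]+'
```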
I’ve found the Linux Raid Wiki, but reading it I do not understand the risks if I were to try something. As I said at the beginning, I don’t want to go from a bad situation to a worse one.
What is the best course of action to fix the RAID array without losing (too much) data?
I have an identical empty/spare disk if that helps, and if more info is needed please let me know.
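For reference, these are the next steps I am considering; nothing has been run yet, so the sketch below only prints the candidate commands for review. The invocations are standard mdadm/smartmontools ones, but whether they are the right ones for this situation is exactly my question:

```shell
# Candidate next steps -- echoed only, nothing is executed here.
# Device names match the lsblk output above.
for c in \
  'smartctl -a /dev/sdd' \
  'mdadm --stop /dev/md127' \
  'mdadm --assemble --scan --verbose'
do
  printf 'considering: %s\n' "$c"
done
```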