Why does one of my drives boot but not the other? (Grub efi)

Posted: 2024-11-09 20:11
by looplin
I am having some trouble getting the boot partition on one of my drives to work.

My drive setup is two NVMe SSDs, set up identically, each with:
- GPT
- 949MB boot partition
- 930GB data partition

Each data partition has an identical LUKS2 volume (same password).
Each LUKS2 volume contains a BTRFS filesystem, configured as RAID 1 across the two (decrypted) LUKS volumes.

Because I wanted to RAID1 these two disks but couldn't RAID1 the /boot partitions, I am manually ensuring the disk I rarely boot from has a working /boot partition and grub installed to it.

Currently, my first boot drive is severely broken, and I am relying on my backup grub to boot (lucky me for having the second grub!)
Secure Boot is enabled and fully functional on my second disk (using Debian's system CA in the motherboard).

So now I am trying to track down why one /boot partition and grub works but the other does not. I have most certainly broken something on the first drive, but I cannot find what is broken (even after multiple grub updates and reinstalls).

Just to note, nvme1n1 is the working drive that I currently boot from.

Code:

fdisk -l
Disk /dev/nvme0n1: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: WD_BLACK SN770M 1TB                     
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 65AD34DD-A930-4C85-9489-A02D288C270D

Device           Start        End    Sectors   Size Type
/dev/nvme0n1p1    2048    1945599    1943552   949M Linux filesystem
/dev/nvme0n1p2 1945600 1953523711 1951578112 930.6G Linux filesystem


Disk /dev/nvme1n1: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: WD_BLACK SN770 1TB                      
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 94237C4B-87F2-4795-9F3C-296FB8278E72

Device           Start        End    Sectors   Size Type
/dev/nvme1n1p1    2048    1945599    1943552   949M Linux filesystem
/dev/nvme1n1p2 1945600 1953523711 1951578112 930.6G Linux filesystem

lsblk
NAME            MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
nvme0n1         259:0    0 931.5G  0 disk  
├─nvme0n1p1     259:1    0   949M  0 part  /boot
└─nvme0n1p2     259:2    0 930.6G  0 part  
  └─crypt_nvme0 254:0    0 930.6G  0 crypt /
nvme1n1         259:3    0 931.5G  0 disk  
├─nvme1n1p1     259:4    0   949M  0 part  /backupboot/efi
└─nvme1n1p2     259:5    0 930.6G  0 part  
  └─crypt_nvme1 254:1    0 930.6G  0 crypt

blkid
/dev/nvme0n1p1: UUID="A490-28B5" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="primary" PARTUUID="5fb2891d-33f7-48a0-b021-efdc358653b3"
/dev/nvme0n1p2: UUID="63780057-f91e-426d-9be3-84383fd9b534" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="b6d69209-d214-4256-9192-a6f772e490c4"
/dev/nvme1n1p2: UUID="c434a425-6552-4036-ae34-4f8c1c728d9a" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="5e37cdf0-8828-49e4-9e16-4583be15b4c2"
/dev/nvme1n1p1: UUID="D2D5-D83C" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="primary" PARTUUID="13286ee6-2fd0-4b1f-b8b0-953f504fcbd9"
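
A side observation on the fdisk output: both first partitions are typed "Linux filesystem" rather than "EFI System". The UEFI specification identifies an EFI System Partition by the type GUID C12A7328-F81F-11D2-BA4B-00A0C93EC93B, and some firmware only looks for boot loaders on partitions that carry it. Since nvme1n1p1 boots with the same type, this may well be harmless on this board, so the following is only a minimal sketch for checking the type, not a diagnosis. The helper name `is_esp_guid` is made up for illustration, and the `lsblk` call is shown as a comment because it needs the real hardware.

```shell
#!/bin/sh
# Type GUID that the UEFI specification assigns to EFI System Partitions.
ESP_GUID="c12a7328-f81f-11d2-ba4b-00a0c93ec93b"

# is_esp_guid GUID  (hypothetical helper)
# Succeeds (exit 0) when GUID equals the ESP type GUID, ignoring case.
is_esp_guid() {
    guid=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    [ "$guid" = "$ESP_GUID" ]
}

# Possible use on the live system (not executed here):
#   is_esp_guid "$(lsblk -no PARTTYPE /dev/nvme0n1p1)" && echo "ESP" || echo "not an ESP"
```

For reference, the GUID behind fdisk's "Linux filesystem" label is 0FC63DAF-8483-4772-8E79-3D69D8477DE4, so both p1 partitions would presumably report "not an ESP" here.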

My attempt at a fresh install to the first drive's /boot (I had to specify the EFI directory, because otherwise grub refuses to continue since /boot/grub would be inside the LUKS volume - don't ask me how the second drive manages to work mounted in the efi subdir...):

Code:

rm -rf /boot/*
mkdir /boot/efi
mkdir /boot/grub
cat /etc/default/grub
GRUB_DEFAULT=0
GRUB_TIMEOUT=2
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX=""

update-grub
Generating grub configuration file ...
Warning: os-prober will not be executed to detect other bootable partitions.
Systems on them will not be added to the GRUB boot configuration.
Check GRUB_DISABLE_OS_PROBER documentation entry.
Adding boot menu entry for UEFI Firmware Settings ...
done

grub-install --efi-directory=/boot/efi --uefi-secure-boot /dev/nvme0
Installing for x86_64-efi platform.
Installation finished. No error reported.

apt install --reinstall linux-image-6.1.0-27-amd64

find /boot
/boot/
/boot/efi
/boot/efi/EFI
/boot/efi/EFI/debian
/boot/efi/EFI/debian/grubx64.efi
/boot/efi/EFI/debian/grub.cfg
/boot/grub
/boot/grub/x86_64-efi
....
/boot/grub/grub.cfg
/boot/grub/locale
....
/boot/grub/fonts
/boot/grub/fonts/unicode.pf2
/boot/grub/grubenv
/boot/grub/.background_cache.png
/boot/config-6.1.0-27-amd64
/boot/vmlinuz-6.1.0-27-amd64
/boot/System.map-6.1.0-27-amd64
/boot/initrd.img-6.1.0-27-amd64

The working /boot directory (/backupboot) looks like this:

Code:

find /backupboot/
/backupboot/
/backupboot/efi
/backupboot/efi/EFI
/backupboot/efi/EFI/grub2
/backupboot/efi/EFI/grub2/shimx64.efi
/backupboot/efi/EFI/grub2/grubx64.efi
/backupboot/efi/EFI/grub2/mmx64.efi
/backupboot/efi/EFI/grub2/fbx64.efi
/backupboot/efi/EFI/grub2/BOOTX64.CSV
/backupboot/efi/EFI/grub2/grub.cfg
/backupboot/efi/EFI/grub2b
/backupboot/efi/EFI/grub2b/shimx64.efi
/backupboot/efi/EFI/grub2b/grubx64.efi
/backupboot/efi/EFI/grub2b/mmx64.efi
/backupboot/efi/EFI/grub2b/fbx64.efi
/backupboot/efi/EFI/grub2b/BOOTX64.CSV
/backupboot/efi/EFI/grub2b/grub.cfg
/backupboot/efi/System.map-6.1.0-26-amd64
/backupboot/efi/config-6.1.0-26-amd64
/backupboot/efi/grub
/backupboot/efi/grub/x86_64-efi
....
/backupboot/efi/grub/locale
....
/backupboot/efi/grub/fonts
/backupboot/efi/grub/fonts/unicode.pf2
/backupboot/efi/grub/grubenv
/backupboot/efi/grub/grub.cfg
/backupboot/efi/grub/unicode.pf2
/backupboot/efi/initrd.img-6.1.0-26-amd64
/backupboot/efi/vmlinuz-6.1.0-26-amd64
/backupboot/grub
/backupboot/grub/grub.cfg
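
One difference between the two listings is that the working directories (EFI/grub2 and EFI/grub2b) contain shimx64.efi, mmx64.efi and fbx64.efi next to grubx64.efi, while /boot/efi/EFI/debian contains only grubx64.efi. With Secure Boot on Debian, the firmware normally starts the signed shim, which then loads grub, so a missing shim could matter; whether it is the cause here is a guess. A small sketch to list such differences automatically (both function names are invented for this example, and `-iname` assumes GNU find):

```shell
#!/bin/sh
# list_efi_binaries DIR
# Prints the sorted, de-duplicated file names of all *.efi files below DIR.
list_efi_binaries() {
    find "$1" -type f -iname '*.efi' | sed 's|.*/||' | sort -u
}

# missing_efi_binaries WORKING_DIR CANDIDATE_DIR
# Prints names found below WORKING_DIR that are absent below CANDIDATE_DIR.
missing_efi_binaries() {
    working=$(mktemp) && candidate=$(mktemp) || return 1
    list_efi_binaries "$1" > "$working"
    list_efi_binaries "$2" > "$candidate"
    comm -23 "$working" "$candidate"   # lines unique to the first file
    rm -f "$working" "$candidate"
}

# Possible use on the live system (not executed here):
#   missing_efi_binaries /backupboot/efi/EFI/grub2b /boot/efi/EFI/debian
```

Going by the listings above, that call would report fbx64.efi, mmx64.efi and shimx64.efi.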

The two grub.cfg files:

Code:

cat /backupboot/efi/EFI/grub2b/grub.cfg
search.fs_uuid D2D5-D83C root 
set prefix=($root)'/grub'
configfile $prefix/grub.cfg

cat /boot/efi/EFI/debian/grub.cfg
search.fs_uuid A490-28B5 root 
set prefix=($root)'/grub'
configfile $prefix/grub.cfg

After redoing the /boot install:

Code:

efibootmgr
BootCurrent: 0005
Timeout: 0 seconds
BootOrder: 0003,0000,0005,2001,2002,2003
Boot0000* grub2
Boot0001* Windows Boot Manager
Boot0002* EFI PXE 0 for IPv4 (0C-37-96-80-1C-81)
Boot0003* debian
Boot0004* Windows Boot Manager
Boot0005* grub2b
Boot2001* EFI USB Device
Boot2002* EFI DVD/CDROM
Boot2003* EFI Network

There are entries, but when I reboot, the UEFI does not offer an option for debian (or even grub2).
So I have to use my backup grub2b option to get back in.

Once I'm back up, I check efibootmgr again, and the options have disappeared.

Code:

after reboot
efibootmgr
BootCurrent: 0005
Timeout: 0 seconds
BootOrder: 0005,2001,2002,2003
Boot0001* Windows Boot Manager
Boot0002* EFI PXE 0 for IPv4 (0C-37-96-80-1C-81)
Boot0004* Windows Boot Manager
Boot0005* grub2b
Boot2001* EFI USB Device
Boot2002* EFI DVD/CDROM
Boot2003* EFI Network
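
Comparing the two efibootmgr outputs by eye shows that Boot0000 (grub2) and Boot0003 (debian) are gone after the reboot. Some firmware drops entries whose target it cannot find, so knowing exactly which ones vanish can help narrow things down. A minimal sketch for doing that comparison from saved outputs; the function names and the file names before.txt and after.txt are invented for this example:

```shell
#!/bin/sh
# boot_entries FILE
# Prints "BootNNNN label" for each entry line of saved efibootmgr output,
# skipping BootCurrent, BootOrder and Timeout lines.
boot_entries() {
    sed -n 's/^\(Boot[0-9A-Fa-f]\{4\}\)[* ] \(.*\)$/\1 \2/p' "$1" | sort
}

# vanished_entries BEFORE AFTER
# Prints entries present in BEFORE that no longer appear in AFTER.
vanished_entries() {
    before=$(mktemp) && after=$(mktemp) || return 1
    boot_entries "$1" > "$before"
    boot_entries "$2" > "$after"
    comm -23 "$before" "$after"   # lines unique to the first file
    rm -f "$before" "$after"
}

# Possible use on the live system (not executed here):
#   efibootmgr > before.txt    # right after grub-install
#   (reboot)
#   efibootmgr > after.txt
#   vanished_entries before.txt after.txt
```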

How can I fix the first drive's boot?
I realize my second boot drive might not be correct either, but until I can successfully boot from the first drive, I won't touch the second drive (it's my only way into my PC).

Re: Why does one of my drives boot but not the other? (Grub efi)

Posted: 2024-11-09 22:07
by Aki
Hello,
looplin wrote: 2024-11-09 20:11 Currently, my first boot drive is severely broken, and I am relying on my backup grub to boot (lucky me for having the second grub!) […] I am attempting to track down why one /boot partition and grub works, but the other does not.
You said it yourself: your drive is severely broken.

Re: Why does one of my drives boot but not the other? (Grub efi)

Posted: 2024-11-09 23:28
by looplin
Aki wrote: 2024-11-09 22:07 Hello,
looplin wrote: 2024-11-09 20:11 Currently, my first boot drive is severely broken, and I am relying on my backup grub to boot (lucky me for having the second grub!) […] I am attempting to track down why one /boot partition and grub works, but the other does not.
You said it yourself: your drive is severely broken.
True, however it is not a hardware issue. Putting aside the fact that it's a brand new drive, there was one combination of commands (I forget which specifically...) that did successfully get the UEFI to show the first drive's grub and try to load it.
But when it attempted to load grub, the UEFI just said "Grub2: failed to boot" and went to the second drive.
Rather than throwing darts in the dark, I hope someone more knowledgeable about this can assist.

Re: Why does one of my drives boot but not the other? (Grub efi)

Posted: 2024-11-10 08:16
by Aki
You can use boot-info-script (https://packages.debian.org/bookworm/boot-info-script) to collect more information.

Re: Why does one of my drives boot but not the other? (Grub efi)

Posted: 2024-11-10 17:31
by looplin
It certainly shows an issue exists, with all the blank sections.

Code:

                 Boot Info Script 0.78      [09 October 2019]


============================= Boot Info Summary: ===============================



============================ Drive/Partition Info: =============================

no valid partition table found
"blkid" output: ________________________________________________________________

Device           UUID                                   TYPE       LABEL

/dev/mapper/crypt_nvme0 15767954-1ec3-44aa-b1e3-b890ca937277   btrfs      
/dev/mapper/crypt_nvme1 15767954-1ec3-44aa-b1e3-b890ca937277   btrfs      
/dev/nvme0n1p1   A490-28B5                              vfat       
/dev/nvme0n1p2   63780057-f91e-426d-9be3-84383fd9b534   crypto_LUKS 
/dev/nvme1n1p1   D2D5-D83C                              vfat       
/dev/nvme1n1p2   c434a425-6552-4036-ae34-4f8c1c728d9a   crypto_LUKS 

========================= "ls -l /dev/disk/by-id" output: ======================

total 0
lrwxrwxrwx 1 root root 10 Nov 10 09:16 dm-name-crypt_nvme0 -> ../../dm-0
lrwxrwxrwx 1 root root 10 Nov 10 09:16 dm-name-crypt_nvme1 -> ../../dm-1
lrwxrwxrwx 1 root root 10 Nov 10 09:16 dm-uuid-CRYPT-LUKS2-63780057f91e426d9be384383fd9b534-crypt_nvme0 -> ../../dm-0
lrwxrwxrwx 1 root root 10 Nov 10 09:16 dm-uuid-CRYPT-LUKS2-c434a42565524036ae344f8c1c728d9a-crypt_nvme1 -> ../../dm-1
lrwxrwxrwx 1 root root 13 Nov 10 09:16 nvme-eui.e8238fa6bf530001001b448b4704c020 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Nov 10 09:16 nvme-eui.e8238fa6bf530001001b448b4704c020-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Nov 10 09:16 nvme-eui.e8238fa6bf530001001b448b4704c020-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 13 Nov 10 09:16 nvme-eui.e8238fa6bf530001001b448b47df8f8f -> ../../nvme1n1
lrwxrwxrwx 1 root root 15 Nov 10 09:16 nvme-eui.e8238fa6bf530001001b448b47df8f8f-part1 -> ../../nvme1n1p1
lrwxrwxrwx 1 root root 15 Nov 10 09:16 nvme-eui.e8238fa6bf530001001b448b47df8f8f-part2 -> ../../nvme1n1p2
lrwxrwxrwx 1 root root 13 Nov 10 09:16 nvme-WD_BLACK_SN770_1TB_241623806343 -> ../../nvme1n1
lrwxrwxrwx 1 root root 13 Nov 10 09:16 nvme-WD_BLACK_SN770_1TB_241623806343_1 -> ../../nvme1n1
lrwxrwxrwx 1 root root 15 Nov 10 09:16 nvme-WD_BLACK_SN770_1TB_241623806343_1-part1 -> ../../nvme1n1p1
lrwxrwxrwx 1 root root 15 Nov 10 09:16 nvme-WD_BLACK_SN770_1TB_241623806343_1-part2 -> ../../nvme1n1p2
lrwxrwxrwx 1 root root 15 Nov 10 09:16 nvme-WD_BLACK_SN770_1TB_241623806343-part1 -> ../../nvme1n1p1
lrwxrwxrwx 1 root root 15 Nov 10 09:16 nvme-WD_BLACK_SN770_1TB_241623806343-part2 -> ../../nvme1n1p2
lrwxrwxrwx 1 root root 13 Nov 10 09:16 nvme-WD_BLACK_SN770M_1TB_241357800325 -> ../../nvme0n1
lrwxrwxrwx 1 root root 13 Nov 10 09:16 nvme-WD_BLACK_SN770M_1TB_241357800325_1 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Nov 10 09:16 nvme-WD_BLACK_SN770M_1TB_241357800325_1-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Nov 10 09:16 nvme-WD_BLACK_SN770M_1TB_241357800325_1-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 15 Nov 10 09:16 nvme-WD_BLACK_SN770M_1TB_241357800325-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Nov 10 09:16 nvme-WD_BLACK_SN770M_1TB_241357800325-part2 -> ../../nvme0n1p2

========================= "ls -R /dev/mapper/" output: =========================

/dev/mapper:
control
crypt_nvme0
crypt_nvme1

================================ Mount points: =================================

Device           Mount_Point              Type       Options

/dev/mapper/crypt_nvme0 /archive                 btrfs      (rw,noatime,compress=zstd:3,ssd,space_cache=v2,commit=120,subvolid=258,subvol=/@archive)
/dev/mapper/crypt_nvme0 /home                    btrfs      (rw,noatime,compress=zstd:3,ssd,space_cache=v2,commit=120,subvolid=257,subvol=/@home)
/dev/mapper/crypt_nvme0 /                        btrfs      (rw,noatime,compress=zstd:3,ssd,space_cache=v2,commit=120,subvolid=256,subvol=/@)
/dev/nvme0n1p1   /boot                    vfat       (rw,nosuid,nodev,noexec,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro)
/dev/nvme1n1p1   /backupboot/efi          vfat       (rw,nosuid,nodev,noexec,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro)


=============================== StdErr Messages: ===============================

Interestingly, I'm not sure what the "no valid partition table found" error is about. If I list with parted, there clearly are partition tables, as the two disks are GPT.

Code:

Model: Linux device-mapper (crypt) (dm)
Disk /dev/mapper/crypt_nvme1: 999GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags: 

Number  Start  End    Size   File system  Flags
 1      0.00B  999GB  999GB  btrfs


Model: Linux device-mapper (crypt) (dm)
Disk /dev/mapper/crypt_nvme0: 999GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags: 

Number  Start  End    Size   File system  Flags
 1      0.00B  999GB  999GB  btrfs


Model: WD_BLACK SN770M 1TB (nvme)
Disk /dev/nvme0n1: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size   File system  Name     Flags
 1      1049kB  996MB   995MB  fat32        primary
 2      996MB   1000GB  999GB               primary


Model: WD_BLACK SN770 1TB (nvme)
Disk /dev/nvme1n1: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size   File system  Name     Flags
 1      1049kB  996MB   995MB  fat32        primary
 2      996MB   1000GB  999GB               primary


I thought that the bootloader disappearing from the efibootmgr output on reboot might be interfering with the script output, so I ran `grub-install --efi-directory=/boot/efi --uefi-secure-boot /dev/nvme0` again and reran bootinfoscript.
But the output did not change at all; the "Boot Info Summary" is still blank.