i’ve installed opensuse tumbleweed a bunch of times in the last few years, but i always used ext4 instead of btrfs because of bad experiences with btrfs nearly a decade ago. every time, with no exceptions, the partition would crap itself into an irrecoverable state

this time around i figured that, since so many years had passed since i last tried btrfs, the filesystem would be in a more reliable state, so i decided to try it again on a new opensuse installation. already, right after installation, os-prober failed to set up opensuse’s entry in grub, but maybe that’s on me, since my main system is debian (turns out the problem was due to btrfs snapshots)

anyway, after a little more than a week, the partition turned read-only in the middle of a large compilation and then, after i rebooted, the partition died and was irrecoverable. could be due to some bad block or read failure from the hdd (it is supposedly brand new, but i guess it could be busted), but shit like this never happens to me on extfs, even if the hdd is literally dying. also, i have an ext4 and a ufs partition in the same hdd without any issues.

even if we suppose this is the hardware’s fault and not btrfs’s, should a file system be a little bit more resilient than that? at this rate, i feel like a cosmic ray could set off a btrfs corruption. i hear people claim all the time how mature btrfs is and that it no longer makes sense to create new ext4 partitions, but either i’m extremely unlucky with btrfs or the system is in a fucking perpetual beta state and it will never change because it is just good enough for companies who can, in the case of a partition failure, just quickly swap the old hdd for a new one and copy the nightly backup over to it

in any case, i am never going to touch btrfs ever again and i’m always going to advise people to choose ext4 instead of btrfs

  • Atemu@lemmy.ml · ↑25 · 1 month ago

    > could be due to some bad block or read failure from the hdd (it is supposedly brand new, but i guess it could be busted)

    I’d suspect the controller or cable first.

    > shit like this never happens to me on extfs, even if the hdd is literally dying

    You say that as if it’s a good thing. If your HDD is “literally dying”, you want the filesystem to fail safe to make you (and applications) aware and not continue as if nothing happened. extfs doesn’t fail here because it cannot even detect that something is wrong.

    btrfs has its own share of bugs but, in theory, this is actually a feature.

    > i have an ext4 and a ufs partition in the same hdd without any issues.

    Not any issue that you know of. For all extfs (and, by extension, you) knows, the disk/cable/controller/whatever could have mangled your most precious files and it would be none the wiser; happily passing mangled data to applications.
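A minimal sketch of that difference, assuming a toy in-memory block store rather than any real filesystem code. `PlainStore`, `ChecksummedStore` and `flip_bit` are hypothetical names made up for illustration, and `zlib.crc32` only stands in for whatever checksum a real filesystem would use (btrfs defaults to CRC-32C, which is a different polynomial):

```python
import zlib


class PlainStore:
    """Toy block store without checksums: it returns whatever bytes it holds."""

    def __init__(self):
        self.blocks = {}

    def write(self, key, data: bytes):
        self.blocks[key] = bytearray(data)

    def read(self, key) -> bytes:
        return bytes(self.blocks[key])


class ChecksummedStore(PlainStore):
    """Same store, but every read is verified against a checksum taken at write time."""

    def __init__(self):
        super().__init__()
        self.sums = {}

    def write(self, key, data: bytes):
        super().write(key, data)
        self.sums[key] = zlib.crc32(data)

    def read(self, key) -> bytes:
        data = super().read(key)
        if zlib.crc32(data) != self.sums[key]:
            # Fail loudly instead of handing mangled data to the application.
            raise IOError(f"checksum mismatch in block {key!r}")
        return data


def flip_bit(store, key, byte_index=0, bit=0):
    """Simulate the disk/cable/controller silently mangling one stored bit."""
    store.blocks[key][byte_index] ^= 1 << bit
```

After `flip_bit`, `PlainStore.read` returns the altered bytes without complaint, while `ChecksummedStore.read` raises. Neither store can repair the block; the only difference is whether the caller finds out.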

    You have backups of course (right?), so you might say that’s not an issue. But if the filesystem can’t guarantee integrity, the corruption can propagate into your backups, because the backup tool reading those files is none the wiser either; it relies on the filesystem to return the correct data. If you don’t verify each and every file at a higher level (e.g. manual inspection or hashing) and you prune old backups, this has potential for actual data loss.

    If your hardware isn’t handling the storage of data as it should, you want to know.
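The “hashing” route could look roughly like this. It is only a sketch under assumptions: `sha256_of`, `build_manifest` and `changed_files` are hypothetical helper names, not part of any backup tool, and a changed hash cannot by itself tell corruption apart from a legitimate edit.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every regular file under root (by relative path) to its SHA-256."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Files present in both manifests whose content hash differs."""
    return sorted(name for name in old.keys() & new.keys() if old[name] != new[name])
```

One way to use it: build a manifest right after a backup run, store it alongside the backup, and compare it against a fresh manifest before pruning old backups. Any file listed by `changed_files` that you did not knowingly edit deserves a closer look.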

    > even if we suppose this is the hardware’s fault and not btrfs’s, should a file system be a little bit more resilient than that? at this rate, i feel like a cosmic ray could set off a btrfs corruption.

    While the behaviour upon encountering an issue is correct in theory, btrfs is quite fragile. Hardware issues shouldn’t happen, but when they do, you’re pretty much doomed because btrfs has no option to keep running once the integrity of part of it has been compromised.
    btrfs-restore disables btrfs’ integrity checks, emulating extfs’s failure mode, but it’s only for extracting files from the raw disks, not for continuing to use it as a filesystem.

    I don’t know enough about btrfs to know whether this is feasible but perhaps it could be made a bit more log-structured such that old data is overwritten first which would allow you to simply roll back the filesystem state to a wide range of previous generations, of which some are hopefully not corrupted. You’d then discard the newer generations which would allow you to keep using the filesystem.
    You’d risk losing data that was written since that generation, of course, but that’s often a much lesser evil. This wouldn’t help with every kind of corruption, because older generations can also become corrupted retroactively, but I suspect it would cover a good share of cases.

    • Pup Biru@aussie.zone · ↑3 · 1 month ago

      > I don’t know enough about btrfs to know whether this is feasible but perhaps it could be made a bit more log-structured such that old data is overwritten first which would allow you to simply roll back the filesystem state to a wide range of previous generations, of which some are hopefully not corrupted. You’d then discard the newer generations which would allow you to keep using the filesystem.

      i’m not sure i quite understand what you’re suggesting, but BTRFS is a copy-on-write filesystem

      so when you write a block, you’re not writing over the old data: you’re writing to empty space, and then BTRFS is marking the old space as unused - or in the case of snapshots, marking it to be kept as old data
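A toy illustration of that write path, assuming a simple in-memory model and not btrfs’s real data structures. `CowVolume` and its methods are hypothetical names:

```python
class CowVolume:
    """Toy copy-on-write volume: writes never overwrite existing blocks."""

    def __init__(self):
        self.blocks = []     # append-only "physical" storage
        self.live = {}       # file name -> index of its current block
        self.snapshots = {}  # snapshot label -> frozen copy of the mapping

    def write(self, name, data: bytes):
        # New data always lands in fresh space; the old block is left alone.
        self.blocks.append(data)
        self.live[name] = len(self.blocks) - 1

    def snapshot(self, label):
        # A snapshot only copies the mapping, not the data blocks.
        self.snapshots[label] = dict(self.live)

    def read(self, name, snapshot=None) -> bytes:
        mapping = self.live if snapshot is None else self.snapshots[snapshot]
        return self.blocks[mapping[name]]

    def reclaimable(self):
        """Block indices no longer referenced by the live tree or any snapshot."""
        referenced = set(self.live.values())
        for mapping in self.snapshots.values():
            referenced.update(mapping.values())
        return [i for i in range(len(self.blocks)) if i not in referenced]
```

In this model an old block only becomes reclaimable once neither the live mapping nor any snapshot points at it, which mirrors the “marked as unused, or kept as old data for snapshots” distinction above.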

      • Atemu@lemmy.ml · ↑3 · edited · 1 month ago

        I am well aware of how CoW works. What I wrote does not stand in conflict with it.

        Perhaps I wasn’t clear enough in what I said though:

        Each metadata operation (“commit” I think it’s called) has a generation number; it first builds this generation (efficiently in a non-damaging way via CoW) and then atomically switches to it. The next generation is built with an incremented generation number and atomically switched again.
        That’s my understanding of how btrfs generally operates.

        When things go awry, some sector that holds some of the newest generation may be corrupt but it might be that a relatively recent generation does not contain this data and is therefore unaffected.

        What I’m suggesting is that you should be able to roll back to such a generation at the cost of the changes which happened in between in order to restore a usable filesystem. For this to be feasible, btrfs would need to take greater care not to overwrite recent generation data though which is what I meant by making it “more log-structured”.
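To make the proposal concrete, here is a toy model of generations with rollback. It is explicitly not how btrfs is implemented: `GenerationalStore` is a hypothetical class, old generations are simply kept forever, and `zlib.crc32` stands in for a real block checksum.

```python
import zlib


class GenerationalStore:
    """Toy store where each commit creates a new generation via copy-on-write."""

    def __init__(self):
        self.blocks = []      # append-only data blocks (bytearrays)
        self.sums = []        # checksum recorded when each block was written
        self.roots = {0: {}}  # generation number -> {name: block index}
        self.current = 0      # plays the role of the superblock pointer

    def commit(self, changes: dict) -> int:
        # Build the next generation next to the old one, then switch to it.
        root = dict(self.roots[self.current])
        for name, data in changes.items():
            self.blocks.append(bytearray(data))
            self.sums.append(zlib.crc32(data))
            root[name] = len(self.blocks) - 1
        new_gen = self.current + 1
        self.roots[new_gen] = root
        self.current = new_gen  # the "atomic switch"
        return new_gen

    def is_intact(self, gen: int) -> bool:
        """True if every block referenced by this generation still verifies."""
        return all(
            zlib.crc32(bytes(self.blocks[i])) == self.sums[i]
            for i in self.roots[gen].values()
        )

    def read(self, name) -> bytes:
        i = self.roots[self.current][name]
        data = bytes(self.blocks[i])
        if zlib.crc32(data) != self.sums[i]:
            raise IOError(f"checksum mismatch reading {name!r}")
        return data

    def roll_back(self) -> int:
        """Switch to the newest intact generation, discarding newer ones."""
        # Generation 0 is empty, so the loop finds a candidate there at the latest.
        for gen in range(self.current, -1, -1):
            if self.is_intact(gen):
                for newer in range(gen + 1, self.current + 1):
                    del self.roots[newer]
                self.current = gen
                return gen
```

The sketch sidesteps the hard part mentioned above: it never reuses space, so older generations are always still on disk. A real filesystem would have to decide how many recent generations to protect from being overwritten, and the sketch says nothing about whether that is practical.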

        I don’t know whether any of this is realistically doable though; my knowledge of btrfs isn’t enough to ascertain this.

        • Pup Biru@aussie.zone · ↑1 · 1 month ago

          right! okay, i believe that’s theoretically possible, but the tools don’t exist - which is the constant problem with btrfs

          … and i could be completely wrong too - this is getting to the limits of my knowledge

    • beleza pura@lemmy.eco.br (OP) · ↑1 ↓11 · 1 month ago

      as i said, maybe that’s the ideal for industrial/business applications (e.g. servers, remote storage), where the cost of replacing disks due to failure is already accounted for, the company has a process ready, and pristine data integrity is of utmost importance. but for home use, reliability of the hardware you do have right now is more important than perfect data integrity, because i want to be as confident as possible that my system is going to boot up next time i turn it on. in my experience, i’ve never had any major data loss in ext4 due to hardware malfunction. also, most files on a filesystem are replaceable anyway (especially the system files), so it makes even less sense to install your system on a btrfs drive from that perspective.

      what you’re telling me is basically “btrfs should never be advised for home use”

      • Ephera@lemmy.ml · ↑16 · 1 month ago

        I mean, as someone who hasn’t encountered these same issues as you, I found btrfs really useful for home use. The snapshotting functionality is what gives me a safe feeling that I’ll be able to boot my system. On ext4, any OS update could break your system and you’d have to resort to backups or a reinstall to fix it.

        But yeah, it’s quite possible that my hard drives were never old/bad enough that I ran into major issues…

        • beleza pura@lemmy.eco.br (OP) · ↑2 ↓4 · 1 month ago

          honestly, i do get the appeal of btrfs, which is why i wanted to try it out one more time. but i feel i can’t trust it if it is really that fault intolerant. ext4 might not have as many features as btrfs, but it is more lenient and more predictable

          (also, recovering from update failures should be the job of the package system imo)

          • ReversalHatchery@beehaw.org · ↑3 · 1 month ago

            > (also, recovering from update failures should be the job of the package system imo)

            I don’t think that can be expected of the package manager, because it cannot revert database and config structure updates that were done automatically by the programs themselves. If you just restore the old versions of packages, some of them will refuse to start up, crash, or lose data