
Do you maintain digital file integrity?



I am keen to reach out to this audience to find out what (if anything) you do to maintain the integrity of your digital files. I have just completed the arduous process of converting all 32,000 of my digital files from ALAC to AIFF. (Please, I do not want this thread to become a discussion about the merits of particular file formats, i.e. ALAC vs. FLAC vs. WAV vs. [insert file format here], except where the comparison bears on maintaining data integrity.)

 

During this conversion (ALAC to AIFF using Apple's iTunes media player) I found many files that could not be converted (ERR: File Format not recognised) because somehow they had become corrupted over time. As my library has grown, I have moved it from one HDD to a larger HDD or to a NAS. By all accounts each transfer completed without incident, or so I thought. It would appear, though, that all this moving of files over the years has corrupted some of them: I would estimate that about 1% of my digital audio library is now affected. Another curious observation: when I converted the same corrupted files with the J River AIFF converter, it converted them with no message whatsoever. So a file that was corrupt before conversion came out of J River with the same corruption.

I should point out the symptom of this corruption: the file plays up to a certain point, then there is a burst of white static noise, the file is skipped and the media player moves on to the next song in my library. The position of the static is never the same from one corrupted file to the next. It could be 30 seconds into one song or 4 min 25 sec into another.
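Since the transfers reported success while still producing damaged files, one habit worth considering is to verify every copy rather than trusting the copy tool's exit status. Below is a minimal sketch, assuming Python 3.8+ and only the standard library; `copy_and_verify` and `sha256_of` are hypothetical helper names, not part of iTunes, J River or any NAS software. One caveat: the destination is read back immediately, so its bytes may come from the operating system's cache, which means the check confirms what the OS received rather than what finally landed on the disk. Re-checking later (after a reboot, or from another machine) is stronger.

```python
import hashlib
import shutil
from pathlib import Path

CHUNK = 1 << 20  # hash in 1 MiB pieces so large files never need to fit in RAM


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(src: Path, dst: Path) -> str:
    """Copy src to dst, re-read both, and fail loudly if they differ.

    Returns the digest so it can be stored for later fixity checks.
    """
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # copies the bytes plus timestamps
    src_hash, dst_hash = sha256_of(src), sha256_of(dst)
    if src_hash != dst_hash:
        raise OSError(f"copy verification failed: {src} -> {dst}")
    return dst_hash
```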

 

I started looking into how to maintain digital file integrity and began reading about terms like "file fixity", "data erosion", "bit rot" and "data decay". From what I have been able to glean so far, what I am experiencing is to be expected. So I was wondering whether anyone else who manages a reasonably large digital audio collection uses a specialised program that continuously scans audio files for data corruption, or follows any other data integrity management process. I would be very interested to hear what others do about this annoying problem.
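"File fixity" in practice usually means recording a checksum for every file while it is known to be good, then re-checking later. A minimal sketch, assuming Python 3.9+ and the standard library only (the function names are hypothetical): it writes a manifest in a `<hash>  <relative path>` layout similar to what the `sha256sum` utility prints, and on a later run reports each listed file as OK, FAILED or MISSING.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(root: Path, manifest: Path) -> int:
    """Record '<sha256>  <relative path>' for every file under root."""
    lines = []
    for p in sorted(root.rglob("*")):
        # skip directories and the manifest itself if it lives inside root
        if p.is_file() and p.resolve() != manifest.resolve():
            rel = p.relative_to(root).as_posix()
            lines.append(f"{file_digest(p)}  {rel}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def check_manifest(root: Path, manifest: Path) -> dict[str, str]:
    """Re-hash every listed file and classify it as OK, FAILED or MISSING."""
    results = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line:
            continue
        expected, rel = line.split("  ", 1)
        path = root / rel
        if not path.is_file():
            results[rel] = "MISSING"
        elif file_digest(path) != expected:
            results[rel] = "FAILED"
        else:
            results[rel] = "OK"
    return results
```

A FAILED entry can then be restored from whichever backup copy still matches the recorded hash. Keeping a copy of the manifest on a different drive means it is not exposed to the same failure as the library.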




I had the same problem with my files when I moved them from an external HD to a NAS. I reckon it affected 20% of my files, and I thought it was a network transfer issue, as I never had the problem with previous transfers over a USB cable. I did not realise it was a data integrity issue. It gave me an excuse to redo my music collection in WAV, which I am still in the process of doing. I am definitely interested in how to protect my new files.
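For a collection rebuilt as WAV, Python's standard `wave` module can at least confirm that every frame declared in a file's header is actually present. This is a hedged sketch (the function name is hypothetical; it assumes uncompressed PCM WAV, which is what `wave` reads). It detects truncated files and unreadable headers, but PCM WAV carries no built-in checksum, so a flipped bit inside the audio passes unnoticed. External checksums are still worth keeping alongside it.

```python
import wave


def wav_is_complete(path: str, frames_per_read: int = 65536) -> bool:
    """True if every audio frame declared in the WAV header can be read.

    Catches truncation and unreadable headers only: PCM WAV has no internal
    checksum, so flipped bits inside the audio data are NOT detected.
    """
    try:
        with wave.open(path, "rb") as w:
            declared = w.getnframes()
            frame_bytes = w.getsampwidth() * w.getnchannels()
            frames_read = 0
            while True:
                data = w.readframes(frames_per_read)
                if not data:  # end of the data chunk, or of the file if truncated
                    break
                frames_read += len(data) // frame_bytes
            return frames_read == declared
    except (wave.Error, EOFError):
        return False
```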


To clarify: is the noise on the original corrupted file? 

 

I'm not sure what you are asking here, but if you are asking whether the noise is on the actual file BEFORE I attempt to convert it, then yes, it is. iTunes throws up an error message when it attempts to convert such an ALAC file to AIFF, and this is how I first became aware of how pervasive the corruption in my library is: I was getting a lot of error messages during the conversion. I am lucky I decided to use the iTunes converter in the first place, because the J River converter gave no indication that there was a problem with these files. Had I started with J River, I could have converted my whole library without ever knowing (other than on playback) that there was an issue with the integrity of the audio files.

I only used J River to see whether it could convert a file that iTunes would not, and if so, whether it would actually "correct" the corruption. But as indicated in my original post, even though J River converted the corrupted file with no error message, the converted AIFF file still had the same corruption in the same place as the original ALAC file. J River just made a replica of the corrupted file, albeit in a different format. iTunes, on the other hand, must perform some additional checking during conversion, because it reported that the file format was unknown, even though the file was actually an ALAC file that I could play (well, part of it) in iTunes before reaching the white static noise. My guess is that when the iTunes conversion utility hit the damaged part of the file, it recognised the corruption and would not proceed, whereas J River did not recognise it and carried on.
Of course, the other (less likely) scenario is that the iTunes conversion utility is itself causing the corruption. I say less likely because I was already aware of corrupted files in my ALAC library: I had heard this white noise and the skip to the next track from time to time. So I doubt it is the Apple utility causing the corruption.

 

Now, if by "original" you mean whether the files on the CDs from which I ripped the ALAC originals were corrupt, then no, I don't think so. I know for a fact that I have listened to ripped albums in their entirety over the years, so the original CDs were OK and the original rips were also OK. And unless a CD is in very poor condition, I can't say I have come across many CDs that I could not play.




Guest myrantz

Do a RAM test on the PC you used for the conversion, and hope that's the cause...

 

I'm aware of only two causes of file corruption when using a NAS:

1/ an old version of Windows Home Server (the first version, I think)

2/ a NAS using ZFS

 

The first is pretty unlikely (the issue was fixed years ago); the second can be a problem (e.g. scrubbing + bad RAM = silent errors). And the errors can be pretty random. Old or failing hardware + ZFS = risk.

 

The only way to work around this is to keep regular, multiple backups and do regular data integrity checks (e.g. simple checksum comparisons) to make sure your backups match each other. I only do this with critical files, though (personal files and photos). For audio files I just keep 2 copies (one at home, the other at the office), but I don't check their integrity. If a file errors out I can just re-rip it, and if I don't even know about the error, it probably means I never listen to it anyway :P

 

If you're using ZFS and you have backups, you could risk running a scrub and hope it fixes the problem (or makes it worse :P). Not sure what else to do here, since I've yet to face this problem.
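The "simple checksum comparison" between backups described above could look like the following sketch, assuming Python 3.9+ and the standard library only (the function names are illustrative, not from any backup product). It hashes every file in two backup trees and lists files whose contents differ or that exist in only one copy. A mismatch only says the copies disagree, not which one is right; a third copy, or a checksum recorded when the file was known to be good, is needed to break the tie.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    out = {}
    for p in root.rglob("*"):
        if p.is_file():
            h = hashlib.sha256()
            with open(p, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            out[p.relative_to(root).as_posix()] = h.hexdigest()
    return out


def compare_backups(a: Path, b: Path) -> dict[str, list[str]]:
    """Report files whose contents differ, or that exist in only one copy."""
    da, db = tree_digests(a), tree_digests(b)
    return {
        "differ": sorted(k for k in da.keys() & db.keys() if da[k] != db[k]),
        "only_in_a": sorted(da.keys() - db.keys()),
        "only_in_b": sorted(db.keys() - da.keys()),
    }
```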


I normally rip with dBpoweramp.

Originally I ripped into 3 formats. Now I just rip to FLAC, then use dBpoweramp to batch convert into other formats.

I now have my files in at least AIFF and FLAC on my NAS. I also have a separate hard disk with a FLAC copy.

I reckon that if something goes wrong, I will have a copy in another place to restore from.




I haven't heard about this before. I need to pay some attention to this given the time investment I've made into ripping a large CD collection.

 

I know from general file integrity principles that there are some very well understood methods for validating and recovering files using checksums. However, they're probably only useful if the checksums are generated when the file is created (when its integrity can be assumed to be completely valid).

 

You may have heard of PAR files, which were originally developed for posting files to Usenet. They use parity data and checksums to provide a way to validate a file and also recover it if there are errors. This approach has since been extended into tools for general file integrity checking and recovery.

 

See: http://en.wikipedia.org/wiki/Parchive

 

There is a significant extra storage requirement if PAR recovery files are generated, but it's probably a small price to pay compared to losing irreplaceable data.

 

Thanks for the heads up...

 

Regards, David.
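The principle behind PAR files can be illustrated in a few lines: per-block checksums locate the damage, and redundant parity data rebuilds it. Real PAR2 uses Reed-Solomon coding, which can repair many damaged blocks; the sketch below swaps in the simplest possible scheme, a single XOR parity block (the same idea as RAID 5 parity), so it can repair at most one damaged block. It is a teaching sketch in Python 3.9+, not a PAR-compatible tool, and it assumes the damaged data still has the original length; all function names are hypothetical.

```python
import hashlib


def split_blocks(data: bytes, size: int) -> list[bytes]:
    """Cut data into fixed-size blocks, zero-padding the last one."""
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    if blocks:
        blocks[-1] = blocks[-1].ljust(size, b"\0")
    return blocks


def xor_parity(blocks: list[bytes], size: int) -> bytes:
    """XOR all blocks together, byte by byte (RAID-5-style parity)."""
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def make_recovery(data: bytes, size: int) -> tuple[list[str], bytes]:
    """Per-block checksums (to locate damage) plus one parity block (to repair it)."""
    blocks = split_blocks(data, size)
    return [hashlib.sha256(b).hexdigest() for b in blocks], xor_parity(blocks, size)


def repair(data: bytes, size: int, sums: list[str], parity: bytes) -> bytes:
    """Rebuild at most one damaged block; raise if more than one is bad."""
    blocks = split_blocks(data, size)
    bad = [i for i, b in enumerate(blocks) if hashlib.sha256(b).hexdigest() != sums[i]]
    if len(bad) > 1:
        raise ValueError("single XOR parity can repair only one block")
    if bad:
        others = [b for i, b in enumerate(blocks) if i != bad[0]]
        # XOR of all intact blocks and the parity block equals the damaged block
        blocks[bad[0]] = xor_parity(others + [parity], size)
    return b"".join(blocks)[:len(data)]
```

Here the recovery data costs one extra block plus the checksums, roughly 1/n overhead for n blocks. Adding more parity buys the ability to repair more blocks, which is the storage trade-off mentioned above.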


Regular backups are okay, as myrantz says, but what happens if the corrupted version overwrites the good backup? I have kept all my CDs, but lately I have been buying hi-res files, so it could be a painful loss.
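One way to stop a corrupted file from overwriting the good backup is never to overwrite at all: when the backup already holds different bytes, move the old version aside before copying the new one in. A minimal sketch, assuming Python 3.8+ and the standard library only; the `current/` and `previous/<timestamp>/` folder layout and the function name are hypothetical choices for illustration.

```python
import filecmp
import shutil
import time
from pathlib import Path
from typing import Optional


def backup_keep_previous(src: Path, backup_root: Path, rel: str,
                         stamp: Optional[str] = None) -> Optional[Path]:
    """Copy src to backup_root/current/rel without destroying the old copy.

    If the backup already holds different bytes, the old version is moved to
    backup_root/previous/<stamp>/rel first, so a corrupted source never
    silently replaces the last good copy. Returns the archived path, if any.
    """
    dest = backup_root / "current" / rel
    archived = None
    if dest.exists():
        if filecmp.cmp(src, dest, shallow=False):  # byte-by-byte comparison
            return None  # identical content: nothing to do
        stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
        archived = backup_root / "previous" / stamp / rel
        archived.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(dest), str(archived))
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    return archived
```

The cost is disk space, since every changed file leaves an older generation behind, so the `previous/` folders need pruning now and then. In exchange, a silently damaged source can no longer wipe out the last good copy.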


Guest myrantz

You can't prevent everything, I guess. That's where integrity checking helps. Multiple backups (2 copies + 1 in use) is the best I can do. I thought of storing some stuff in the "cloud", but I can't do much with my ADSL upload speed.

 

I would like to know what file system the OP uses on his NAS, though. If it's ZFS, corruption is pretty rare AFAIK, so the problem could well be something else. I have had only one CRC error since moving to FreeNAS (probably 2 years ago now?), and a scrub fixed it, with no data loss.

To answer the question about my NAS: I am using a Synology NAS with RAID1 across 2x 2TB drives, which means I only have 2TB of total capacity. I don't think this uses ZFS. ZFS is not a standard I am familiar with, and I don't believe Synology NASes support it. My laptop is an HP i7 quad-core with about 32GB of RAM, but I have changed laptops a number of times since I started building my music collection.




Check out the md5sum or MD5sums utilities.

FLAC itself can create a fingerprint file from the audio data. It doesn't fingerprint the metadata (tags), so it may not be ideal for checking file integrity.
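Besides separate fingerprint files, a FLAC file normally carries an MD5 of its decoded audio inside the mandatory STREAMINFO metadata block (34 bytes long, with the MD5 in the final 16 bytes), and the reference `flac -t` command decodes the file and compares the result against it. Because the hash covers audio only, retagging does not change it. The Python sketch below only reads the stored value, for example to record it in a catalogue; it does not decode audio, so on its own it cannot tell whether a file is damaged. It assumes the file starts directly with the `fLaC` marker (a file with a prepended ID3 tag would be rejected), and the function name is hypothetical.

```python
from typing import Optional


def flac_streaminfo_md5(path: str) -> Optional[str]:
    """Return the audio MD5 stored in a FLAC file's STREAMINFO block as hex.

    Returns None when the encoder left the field as all zeros (not computed).
    """
    with open(path, "rb") as f:
        if f.read(4) != b"fLaC":
            raise ValueError("file does not start with the fLaC marker")
        header = f.read(4)  # 1 bit last-block flag, 7 bits type, 24 bits length
        if len(header) != 4:
            raise ValueError("truncated metadata block header")
        block_type = header[0] & 0x7F
        length = int.from_bytes(header[1:4], "big")
        if block_type != 0 or length != 34:
            raise ValueError("first metadata block is not STREAMINFO")
        info = f.read(34)
    if len(info) != 34:
        raise ValueError("truncated STREAMINFO block")
    md5 = info[18:34]  # final 128 bits of STREAMINFO
    return None if md5 == bytes(16) else md5.hex()
```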


I learnt a long time ago, when printers started to have HDDs for memory storage and Fiery capabilities, that care must be taken when formatting and reimaging.

I have never trusted any RAID arrangement, because even if you buy identical HDDs, the controller or drive assembly can be very different and use entirely different firmware, despite the same brand, same model number, etc.

When transferring files that need integrity, close all irrelevant applications and don't use the computer for anything else until the transfer is complete.

When backing up, I try not to use the same type of HDD, in case they exhibit similar faults. So when I rip a CD to a main drive, I back that up to another drive. If I weren't lazy, I would also rip the CD directly to a backup drive.

I have experienced bad transfers even from an SD card, where a file somehow became corrupt for no apparent reason. Transferring it a second time was fine!


Flash memory does have random errors, or failures of bits. That's why the firmware on SSDs has been such a big deal to get right.

I use hardware RAID 5 (automatic parity) and back up to external HDDs.

I suppose backing up to the cloud is also an option now. I would recommend zipping your backup with password protection, or using some other encryption.


Guest thathifiguy

I had this happen with a number of files, in particular when I moved my music library to an external HDD. Very annoying.




On the cloud option: most corporations I visit are not on the cloud. In some places Wi-Fi is prohibited, and a sticker must be placed on all tablets and phones with cameras. Some organisations will never use Wi-Fi or the cloud, full stop...


Guest myrantz

Cloud backup isn't practical with the kind of internet most people have. :P I just realised why my Internet has been so bad every time I'm home over the past couple of weeks... other people are using Netflix!

 

ZFS is a file system (and logical volume manager in one). With a 2TB setup, I doubt it would be using ZFS unless you specifically set it up that way. And you're probably right: looking at the Synology web page, I don't think the 2-bay Synology NASes use ZFS, so the problem could be something else.

ZFS is apparently very good at fixing silent data corruption (which seems to be the problem you are having).


I rip using dBpoweramp (DBP) to 3 USB HDDs: FLAC lossless, ALAC lossless and LAME 320kbps CBR.

I then copy to 2 Seagate Central NASes, as well as another 3 HDDs and a PC's internal HDD. A bit anal, but I could not see myself doing it all again... and having to rely on Tidal or similar.

I also store a copy at my parents' place.

I use the original USB drives that I rip to as the "master" copies.

I copy any updates from the "master" USB drives to the other drives using Karenware Replicator.

I periodically use Audio Tester from http://www.vuplayer.com/other.php (just noticed a new release of it) to check the integrity (as far as it can) of my FLAC and LAME 320kbps files.

For ALAC I just tick the "after encoding verify the written audio" box when ripping in dBpoweramp... not sure of any way of testing ALAC?

It takes a long time, but it's good to get the "0 files failed" message at the end of 30-odd thousand files.

Over the last 5 or so years I have had a few files corrupted, and I just delete and re-rip them or get a copy from one of the backup drives that passes Audio Tester.

These issues "seem" to have coincided with the available space on the USB drive dropping below 10%.

Always interested in any better ideas, though...

Edited by ummagumma

If you want, you can store a backup at my place... seriously. :P

That's a serious backup strategy you have going there. Yes, I do have a backup of my library, but that is not really what I am after. I want something that proactively scans the library and tells me when a corrupt file is detected, rather than having to wait until I hear it on my system. I looked at Audio Tester from VUPlayer. Unfortunately it doesn't handle AIFF or ALAC, but a tool like it would suffice until someone makes a "proactive" audio checker. A constant file scanner built into the audio player itself would be great to have, similar to an antivirus checker that is always scanning your hard disk and every file that is opened or accessed.
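Until such a player feature exists, a scheduled script can approximate a "proactive" scanner. The sketch below (Python 3.9+, standard library only; the function names and the JSON baseline file are assumptions for illustration) re-hashes the library on each run and compares it with the previous run. A file whose bytes changed while its size and modification time stayed the same is reported as `suspect`, since a deliberate edit or re-encode normally updates the timestamp whereas bit rot does not. It is not a real-time monitor; it would be run periodically from cron or a task scheduler. Because it only looks at bytes, it works the same for ALAC, AIFF or any other format.

```python
import hashlib
import json
from pathlib import Path


def _digest(path: Path) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def scan(root: Path, baseline_file: Path) -> dict[str, list[str]]:
    """Compare the library against the last scan and update the baseline.

    'suspect' = bytes changed although size and modification time did not,
    the usual signature of silent corruption rather than a deliberate edit.
    """
    old = json.loads(baseline_file.read_text()) if baseline_file.exists() else {}
    new = {}
    report = {"new": [], "missing": [], "edited": [], "suspect": []}
    for p in sorted(root.rglob("*")):
        if not p.is_file() or p.resolve() == baseline_file.resolve():
            continue
        rel = p.relative_to(root).as_posix()
        st = p.stat()
        entry = {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "sha256": _digest(p)}
        prev = old.get(rel)
        if prev is None:
            report["new"].append(rel)
        elif prev["sha256"] != entry["sha256"]:
            if (prev["size"], prev["mtime_ns"]) == (entry["size"], entry["mtime_ns"]):
                report["suspect"].append(rel)
                entry = prev  # keep the known-good digest so the file stays flagged
            else:
                report["edited"].append(rel)
        new[rel] = entry
    report["missing"] = sorted(set(old) - set(new))
    baseline_file.write_text(json.dumps(new, indent=1))
    return report
```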
