On SQL Server 2017 (CU3), whenever I enable backup compression on one of my TDE databases, the backup process always corrupts a specific page in the database. If I run the backup without compression, it does not get corrupted. Here are the steps I’ve taken to verify and reproduce this issue:
- Run DBCC CheckDB on database “TDE_DB1”; all is good, no errors;
- Successfully back up database without compression; RESTORE VERIFYONLY says all is good;
- Successfully restore database as “TDE_DB2”; all is good, DBCC CheckDB shows no errors;
- Successfully back up “TDE_DB1” database WITH compression; RESTORE VERIFYONLY errors, saying “Damage to the backup set was detected”;
- Attempt to restore database as “TDE_DB2”; errors, saying “RESTORE detected an error on page (1:92454) in database”
- Repeat steps 1-3; all is good;
- DROP “TDE_DB1” and “TDE_DB2”; Restore “TDE_DB1” from backup; all is good;
- Repeat steps 1-5; get same results;
To summarize: The database and regular backups seem fine, running CHECKDB on the database and VERIFYONLY on the backups do not report any errors. Backing up the database with compression seems to cause the corruption.
Below are the code samples with errors. (Note: MAXTRANSFERSIZE is required for using compression with a TDE database)
-- Good, completes with no corruption; BACKUP DATABASE [TDE_DB1] TO DISK = N'E:\MSSQL\Backup\TDE_DB1a.bak' WITH CHECKSUM; RESTORE VERIFYONLY FROM DISK = N'E:\MSSQL\Backup\TDE_DB1a.bak' WITH CHECKSUM; RESTORE DATABASE [TDE_DB2] FROM DISK = 'E:\MSSQL\Backup\TDE_DB1a.bak' WITH MOVE 'DataFileName' to 'E:\MSSQL\Data\TDE_DB2.mdf' ,MOVE 'LogFileName' to 'F:\MSSQL\Log\TDE_DB2_log.ldf'; -- Bad, I haz corruption; BACKUP DATABASE [TDE_DB1] TO DISK = N'E:\MSSQL\Backup\TDE_DB1b.bak' WITH CHECKSUM, COMPRESSION, MAXTRANSFERSIZE = 131072; RESTORE VERIFYONLY FROM DISK = N'E:\MSSQL\Backup\TDE_DB1b.bak' WITH CHECKSUM; -- ERROR --Msg 3189, Level 16, State 1, Line 1 --Damage to the backup set was detected. --Msg 3013, Level 16, State 1, Line 1 --VERIFY DATABASE is terminating abnormally. RESTORE DATABASE [TDE_DB2] FROM DISK = 'E:\MSSQL\Backup\TDE_DB1b.bak' WITH MOVE 'DataFileName' to 'E:\MSSQL\Data\TDE_DB2.mdf' ,MOVE 'LogFileName' to 'F:\MSSQL\Log\TDE_DB2_log.ldf'; -- ERROR --Msg 3183, Level 16, State 1, Line 7 --RESTORE detected an error on page (1:92454) in database "TDE_DB2" as read from the backup set. --Msg 3013, Level 16, State 1, Line 7 --RESTORE DATABASE is terminating abnormally.
I then tried to check the page that is reported as having the error (It is always the same page.), but DBCC PAGE reports that the ObjectId is 0. According to this article by Paul Randal that means there was no metadata found, and one of the reasons could be that the page itself is corrupt and incorrect values were used to try to look up the metadata. His advice is to run CHECKDB, which I cannot do because the corrupted backup won’t restore.
I tried the suggestions from this SO Post (Adding INIT and FORMAT to the BACKUP command) to reset the metadata, but that did not seem to change anything, I still get corruption on the compressed backup.
This only happens with one of my TDE database. I have 4 other TDE databases on this same server, and they do not have this problem. That tells me that there may be an underlying problem with this specific database. I realize that the easy solution is to just not use compression, but I feel like this may actually be an early warning to a bigger problem coming down the road.
Has anyone ever seen this before, or have any idea why compression would corrupt that page? At this point, I’m kind of at a loss as to what to do next. I considered restoring the page from an earlier backup, but I don’t think that would matter since the page in the regular database seems fine.