I have a massive problem with 100% CPU spikes caused by a bad execution plan used by a specific query. I have spent weeks now trying to solve it on my own.
My Database
My sample DB contains 3 simplified tables.
[DataLogger]
CREATE TABLE [model].[DataLogger](
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [ProjectID] [bigint] NULL,
    CONSTRAINT [PK_DataLogger] PRIMARY KEY CLUSTERED ([ID] ASC)
        WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
[Inverter]
CREATE TABLE [model].[Inverter](
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [DataLoggerID] [bigint] NOT NULL,
    [SerialNumber] [nvarchar](50) NOT NULL,
    CONSTRAINT [PK_Inverter] PRIMARY KEY CLUSTERED ([ID] ASC)
        WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY],
    CONSTRAINT [UK_Inverter] UNIQUE NONCLUSTERED ([DataLoggerID] ASC, [SerialNumber] ASC)
        WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

ALTER TABLE [model].[Inverter] WITH CHECK ADD CONSTRAINT [FK_Inverter_DataLogger]
    FOREIGN KEY([DataLoggerID]) REFERENCES [model].[DataLogger] ([ID])
[InverterData]
CREATE TABLE [data].[InverterData](
    [InverterID] [bigint] NOT NULL,
    [Timestamp] [datetime] NOT NULL,
    [DayYield] [decimal](18, 2) NULL,
    CONSTRAINT [PK_InverterData] PRIMARY KEY CLUSTERED ([InverterID] ASC, [Timestamp] ASC)
        WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF)
)
Stats and Maintenance
The [InverterData] table has around 120,000,000 rows.
All indexes are defragmented and all statistics are rebuilt/reorganized as needed on a daily/weekly schedule.
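To double-check that the maintenance really ran, I look at index fragmentation and statistics age with something like this (only a sketch; querying in LIMITED mode and filtering on data.InverterData is just how I happen to run it):

-- Sketch: check index fragmentation and statistics age for InverterData.
SELECT i.name AS IndexName, ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'data.InverterData'), NULL, NULL, 'LIMITED') AS ips
INNER JOIN sys.indexes AS i
    ON i.[object_id] = ips.[object_id] AND i.index_id = ips.index_id;

SELECT s.name AS StatName, sp.last_updated, sp.[rows], sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.[object_id], s.stats_id) AS sp
WHERE s.[object_id] = OBJECT_ID(N'data.InverterData');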
My Query
The query is generated by Entity Framework and is quite simple. But it runs 1,000 times per minute, so performance is essential.
SELECT [Extent1].[InverterID] AS [InverterID], [Extent1].[DayYield] AS [DayYield]
FROM [data].[InverterData] AS [Extent1]
INNER JOIN [model].[Inverter] AS [Extent2] ON [Extent1].[InverterID] = [Extent2].[ID]
INNER JOIN [model].[DataLogger] AS [Extent3] ON [Extent2].[DataLoggerID] = [Extent3].[ID]
WHERE ([Extent3].[ProjectID] = @p__linq__0) AND ([Extent1].[Timestamp] = @p__linq__1)
OPTION (MAXDOP 1)
The MAXDOP 1 hint is there because of another problem with a slow parallel plan.
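For testing outside of Entity Framework I replay the statement with sp_executesql, so it is compiled in the same parameterized form (a sketch; the parameter values below are placeholders):

-- Sketch: replay the EF-generated statement manually; parameter values are placeholders.
EXEC sp_executesql
    N'SELECT [Extent1].[InverterID] AS [InverterID], [Extent1].[DayYield] AS [DayYield]
      FROM [data].[InverterData] AS [Extent1]
      INNER JOIN [model].[Inverter] AS [Extent2] ON [Extent1].[InverterID] = [Extent2].[ID]
      INNER JOIN [model].[DataLogger] AS [Extent3] ON [Extent2].[DataLoggerID] = [Extent3].[ID]
      WHERE ([Extent3].[ProjectID] = @p__linq__0) AND ([Extent1].[Timestamp] = @p__linq__1)
      OPTION (MAXDOP 1)',
    N'@p__linq__0 bigint, @p__linq__1 datetime',
    @p__linq__0 = 1,
    @p__linq__1 = '2016-01-01';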
The “good” plan
Over 90% of the time, the plan that is used is lightning fast and looks like this:
The problem
Over the course of the day, the good plan randomly changes to a bad, slow plan.
The “bad” plan is used for 10–60 minutes and then it switches back to the “good” plan. While the “bad” plan is in use, the CPU is pinned at 100%.
This is how it looks:
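To catch the moment when the plan flips, I check the plan cache with a query along these lines (a sketch; multiple rows with the same query_hash but different query_plan_hash mean different plans were compiled for the same statement):

-- Sketch: find cached plans for the statement and compare their creation times and CPU cost.
SELECT qs.query_hash,
       qs.query_plan_hash,
       qs.creation_time,
       qs.execution_count,
       qs.total_worker_time / qs.execution_count AS avg_cpu_us,
       qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE st.text LIKE N'%InverterData%'
ORDER BY qs.creation_time DESC;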
What I have tried so far
My first thought was that the Hash Match was the culprit, so I modified the query with a new hint.
...[Extent1].[Timestamp] = @p__linq__1) OPTION (MAXDOP 1, LOOP JOIN)
The LOOP JOIN hint should force the optimizer to use a Nested Loop instead of a Hash Match.
The result: the plan used 90% of the time looks like before, but it still randomly changes to a bad one.
The “bad” plan now looks like this (the table loop order has changed):
The CPU also peaks at 100% while the “new bad” plan is in use.
Solution?
It occurred to me to force the “good” plan, but I don’t know whether that is a good idea.
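If forcing is the way to go, the Query Store would be my first attempt (a sketch; it assumes SQL Server 2016+ with the Query Store enabled, and the query_id/plan_id values are placeholders that I would have to look up first):

-- Sketch: find the statement in the Query Store and force the known good plan.
-- The IDs passed to sp_query_store_force_plan below are placeholders.
SELECT q.query_id, p.plan_id, p.last_execution_time
FROM sys.query_store_query AS q
INNER JOIN sys.query_store_plan AS p
    ON p.query_id = q.query_id
INNER JOIN sys.query_store_query_text AS qt
    ON qt.query_text_id = q.query_text_id
WHERE qt.query_sql_text LIKE N'%InverterData%';

EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 1;

On older versions, a plan guide with a USE PLAN hint would be the alternative, as far as I know.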
Please help me!
Update 1 – related to @James' question
Here are both plans:
Good Plan
Bad Plan