The Curious Case of THREADPOOL Waits

A client reached out for help with a performance problem that had been going on for over a day. The application was performing so slowly that it was effectively an outage for their users.

I love working on production emergencies where there’s a severe performance problem. I can often solve them or provide a solution within a few minutes to an hour. Sometimes they take longer when they are complex. Sometimes they take longer because you keep going down the wrong path.

This particular incident had us going down the wrong path because we had not seen this issue before. Once we were on the right path, it was an easy fix!

These were the symptoms that we saw:

  • Abnormally high CPU
  • Significant THREADPOOL waits
  • High HADR_SYNC_COMMIT waits
  • Moderate LCK* waits
  • Low virtual memory condition events
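If you want to see how waits like these surface on your own server, here is a minimal sketch (not the exact query we ran) against sys.dm_os_wait_stats. Keep in mind these numbers are cumulative since the last restart or wait-stats clear:

-- Top waits relevant to this incident; values are cumulative
SELECT TOP (10)
       wait_type,
       waiting_tasks_count,
       wait_time_ms / 1000.0 AS wait_time_sec,
       signal_wait_time_ms / 1000.0 AS signal_wait_time_sec
FROM sys.dm_os_wait_stats
WHERE wait_type IN ('THREADPOOL', 'HADR_SYNC_COMMIT')
   OR wait_type LIKE 'LCK_M%'
ORDER BY wait_time_ms DESC;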

This is what it looked like in SQL Sentry the day before they contacted us:

[Screenshot: High CPU]
[Screenshot: SQL Server waits]
[Screenshot: “Other” waits]
[Screenshot: Low memory events]

We temporarily switched the Availability Group to asynchronous-commit which obviously eliminated the HADR_SYNC_COMMIT waits, but the users were still complaining about severe slowness.
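If you ever need to make the same temporary change, the syntax looks roughly like this. The AG and replica names below are placeholders, and it runs on the primary replica for each secondary you want to change:

-- Placeholder AG and replica names
ALTER AVAILABILITY GROUP [YourAG]
MODIFY REPLICA ON 'YourSecondaryReplica'
WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);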

[Screenshot: No more HADR_SYNC_COMMIT waits]
[Screenshot: “Other” waits after the async-commit change]
[Screenshot: “Blocking” waits after the async-commit change]
[Screenshot: THREADPOOL waits getting worse!]
[Screenshot: “Other” waits during a severe THREADPOOL waits spike]

I typically see THREADPOOL waits occurring due to one of the following reasons:

  • Severe blocking
  • Large number of databases in an Availability Group
  • Large number of active sessions
  • Bad parameter sniffing, though this is fairly rare

Blocking coincided with most of the THREADPOOL spikes, but not all of them. Most of the blocking that was captured came from IntelliSense queries and wasn’t what the users were experiencing.
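A quick, generic way to check for blocking happening right now is a sketch like this against sys.dm_exec_requests:

-- Sessions currently blocked and the query they are running
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       t.text AS query_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;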

Availability Groups can use a lot of worker threads, but AGs typically only cause THREADPOOL waits when there are hundreds or even thousands of databases. This server only had 33 databases in the AG. We looked into it anyway. By default this server can use up to 701 worker threads as it has 16 processors. We never saw it use more than 300 worker threads when the THREADPOOL waits were happening. This was a clue!
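You can compare the configured thread ceiling to what is actually in use with a couple of DMV queries, roughly like this:

-- Configured maximum worker threads
SELECT max_workers_count FROM sys.dm_os_sys_info;

-- Worker threads currently allocated across the schedulers
SELECT SUM(current_workers_count) AS current_workers
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';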

Nothing was making sense.

I said, “Let’s run sp_Blitz so that we can check the server config and see if there is anything weird in there.”

I ran this command and immediately saw concerning issues:

EXEC sp_Blitz @CheckServerInfo = 1, @CheckUserDatabaseObjects = 0
[Screenshot: sp_Blitz curated rows]

Wait, what?!

[Screenshot: Danger, Will Robinson]

sp_Blitz was reporting dangerously low memory at the NUMA node level. It was also reporting that “min server memory” and “max server memory” were set to the same value, which is not recommended. I am not sure why it was set like this.
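You can confirm those two settings yourself with a quick look at sys.configurations (a generic check, not what sp_Blitz runs internally):

SELECT name, value_in_use
FROM sys.configurations
WHERE name IN ('min server memory (MB)', 'max server memory (MB)');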

This server uses a RAM disk for tempdb – again, not sure why. A RAM disk reserves part of memory as a storage drive. This RAM disk was configured to use 175GB of memory. SQL Server was configured to use about 615GB of memory. 615GB + 175GB = 790GB, but there’s only 766GB of RAM on the server. The math doesn’t work!
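One way to sanity-check that math from inside SQL Server is sys.dm_os_sys_memory, which shows total and available physical memory along with the current memory state:

SELECT total_physical_memory_kb / 1024 AS total_physical_memory_mb,
       available_physical_memory_kb / 1024 AS available_physical_memory_mb,
       system_memory_state_desc
FROM sys.dm_os_sys_memory;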

The RAM disk’s size was increased recently due to tempdb running out of space, but this sent the server over the edge and caused the performance issue.

We lowered “min server memory” and “max server memory” so that the server would have enough memory for SQL Server, the RAM disk and Windows. Suddenly the users were happy! CPU utilization and the waits were back to normal.
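The change itself is just sp_configure. The values below are illustrative, not the client’s exact numbers, because the right setting depends on how much memory the RAM disk and Windows need:

EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'min server memory (MB)', 0;       -- illustrative value
EXEC sys.sp_configure 'max server memory (MB)', 524288;  -- illustrative value (512GB), leaving headroom
RECONFIGURE;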

[Screenshot: CPU back to the baseline]
[Screenshot: Waits back to the baseline]

We’d have figured out the problem eventually, but using sp_Blitz saved us and the client a lot of troubleshooting time.
