4

I'm trying to identify an issue I sometimes experience with a stored procedure.

I have to say that the stored procedure uses a UDF in the SELECT clause (perhaps that's the cause of the problem), but whether it is or not, here below is part of the execution plan:

[screenshot: partial execution plan]

where both the estimated output rows of the Hash Match and the estimate for the Index Seek on the non-clustered index IX_Job_OperationID are completely wrong:

[screenshots: Hash Match and Index Seek operator properties]

I've tried to update the statistics related to the index IX_Job_OperationID, even with the FULLSCAN option, but it didn't help:

UPDATE STATISTICS Job [IX_Job_OperationID] WITH FULLSCAN
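One way to check whether the refreshed statistics actually model the data is to inspect the histogram directly. This is a sketch using the documented DBCC SHOW_STATISTICS command; the schema name `dbo` is an assumption, since the question only gives the table name:

```sql
-- Show the header, density vector, and histogram for the index statistics.
-- If the histogram steps poorly model skew in the OperationID column,
-- estimates can still be wrong even after UPDATE STATISTICS ... WITH FULLSCAN.
DBCC SHOW_STATISTICS ('dbo.Job', IX_Job_OperationID);
```

Comparing RANGE_ROWS and EQ_ROWS in the histogram against the real row counts for a few sample OperationID values can reveal whether the histogram itself is the problem.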

I'm also using the RECOMPILE option in the stored procedure because, given the parameters provided, the dataset involved may change a lot.

Can someone point me in the right direction and help me understand why, even though the statistics have been updated, the estimated values are so far from the actual ones?

Here is a link to the actual execution plan.

This execution plan causes a total of 8,610,665 logical reads, while other, better plans that are sometimes chosen produce around 122,692.

gipinani
  • This might help. Possible duplicate. http://dba.stackexchange.com/questions/8067/estimated-vs-actual-rows-and-multi-column-statistics?rq=1 – SS_DBA Jan 05 '17 at 15:41
  • @WEI_DBA thank you for your comment, but I don't have an index on 2 columns.. the answer didn't help. Thanks again! – gipinani Jan 05 '17 at 15:54
  • 1
    What version of SQL Server are you running? I ask because of Cardinality enhancements to the latter versions. – bwilliamson Jan 05 '17 at 16:02
  • 2014 sp1 std. Tnx – gipinani Jan 05 '17 at 16:20
  • Does the column covered by IX_Job_OperationID happen to contain unique values? – Duffy Jan 05 '17 at 16:59
  • 1
    The actual query will be helpful here. Despite Microsoft's best efforts, there are generated query plans that are simply not optimal for the data. At the very least, we need to know the actual JOINs being used between the tables; the index structure for the indexes being used; how the tables actually relate to each other; any WHERE clauses; and the flow of the query. We probably don't need individual data items returned, unless returned data elements are (1) in an index being used and (2) not actually shown in the query. – Laughing Vergil Jan 05 '17 at 17:20
  • @Duffy The indexed column is not unique and is also nullable – gipinani Jan 05 '17 at 17:50
  • @gipinani I should have been more specific, does the current data happen to be unique despite the column settings? – Duffy Jan 05 '17 at 18:10
  • @Duffy what do you mean with column settings? – gipinani Jan 05 '17 at 18:52
  • @gipinani The column may be defined (sorry about previous poor word choice of settings) as not unique and nullable, but the current data in the column may in fact be unique and the statistics will reflect that which can have effects on your query plans. – Duffy Jan 05 '17 at 19:13
  • @Duffy The involved column is a foreign key involved in a 1-N relation, thus the column could have many rows with the same value. And could possibly be null. Bye! – gipinani Jan 05 '17 at 23:09
  • Share your execution plans using Paste The Plan @ brentozar.com – SqlZim Jan 07 '17 at 14:00

2 Answers

7

Based on your description of the problem it is unlikely that the UDF is the cause of the cardinality estimation errors. You can try commenting out the UDF and checking if the join estimate changes if you want to test that.

It is possible to experience cardinality estimation errors even when statistics have been updated with FULLSCAN. The histograms created by SQL Server may not model your data in a way that lets the optimizer create good enough plans; SQL Server may not have histograms that exactly match the joined data (you could be filtering on the inner and/or outer table); you may be joining on multiple columns, which complicates estimation; and so on. For example, Microsoft changed some of the join cardinality estimate calculations with the new CE released in SQL Server 2014. Some tables will have data that better matches the assumptions of the legacy CE and some tables will have data that better matches the assumptions of the new CE.
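Since you mentioned SQL Server 2014, one low-risk test of whether the new CE is behind the bad join estimate is to run the query once under the legacy estimator and compare plans. Trace flag 9481 is documented for exactly this; the query body below is a placeholder for your own, not the actual procedure code:

```sql
-- Force the legacy (pre-2014) cardinality estimator for this statement only.
-- Replace this SELECT with the actual query from the stored procedure.
-- Note: OPTION (QUERYTRACEON ...) requires elevated (sysadmin) permissions.
SELECT j.*
FROM dbo.Job AS j
WHERE j.OperationID = @OperationID
OPTION (QUERYTRACEON 9481);
```

If the estimates improve under the legacy CE, that narrows the cause to the 2014 CE's join assumptions rather than to stale or inadequate statistics.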

For the first estimation issue (38,999 estimated rows versus 701,680 actual rows), based on what you've provided I can only recommend reading about join cardinality calculations. Microsoft released a white paper for the new cardinality estimator in SQL Server 2014. I also know of a blog post that dives into the internals of join cardinality quite a bit. As the question is written there are too many possible causes to give a good answer. Please post an actual execution plan if you'd like further guidance.

For the second estimation issue (1 estimated row versus 1,546,894 actual rows), note that the estimated number of rows is for a single iteration of the inner loop of the join. The actual number of rows is the total returned from the table over all iterations of the join, so the two will often not match for nested loop joins. What you are seeing is common and not necessarily a sign of a problem.

Joe Obbish
0

The Cardinality Estimator saw improvements in many versions of SQL Server. If your database has been around a long time, check to see if an old version of the logic is being used. Check database settings:

COMPATIBILITY_LEVEL
LEGACY_CARDINALITY_ESTIMATION

See https://learn.microsoft.com/en-us/sql/relational-databases/performance/cardinality-estimation-sql-server?view=sql-server-ver16
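A quick way to check and change these settings (the ALTER statement requires SQL Server 2016 or later; on 2014 the legacy CE is controlled via the compatibility level or trace flags):

```sql
-- Check the compatibility level of the current database
-- (110 = SQL 2012 and older uses the legacy CE; 120 = SQL 2014's new CE).
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

-- On SQL Server 2016+, the legacy CE can also be toggled per database:
ALTER DATABASE SCOPED CONFIGURATION
SET LEGACY_CARDINALITY_ESTIMATION = ON;
```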

Iain