What’s Faster, COUNT(*) or COUNT(*) With LIMIT in SQL? Let’s Check

In a previous blog post, we’ve advertised the use of SQL EXISTS rather than COUNT(*) to check for existence of a value in SQL.

I.e. to check if, in the Sakila database, actors called Wahlberg have played in any films, instead of:

SELECT count(*)
FROM actor a
JOIN film_actor fa USING (actor_id)
WHERE a.last_name = 'WAHLBERG'

Do this:

SELECT EXISTS (
  SELECT 1 FROM actor a
  JOIN film_actor fa USING (actor_id)
  WHERE a.last_name = 'WAHLBERG'
)
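As a quick sanity check of the two forms above, here is a minimal sketch using Python's built-in sqlite3 module. The two tables and their few rows are invented stand-ins for the Sakila tables (an assumption for illustration, not the real data), and SQLite returns 1/0 rather than a BOOLEAN. The third query shows the CASE wrapper that dialects without a BOOLEAN type may need.

```python
import sqlite3

# Tiny stand-in for the Sakila schema; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER,
                             PRIMARY KEY (actor_id, film_id));
    INSERT INTO actor VALUES (1, 'WAHLBERG'), (2, 'WAHLBERG'), (3, 'GUINESS');
    INSERT INTO film_actor VALUES (1, 10), (1, 11), (2, 12), (3, 13);
""")

COUNT_SQL = """
    SELECT count(*)
    FROM actor a
    JOIN film_actor fa USING (actor_id)
    WHERE a.last_name = ?
"""

EXISTS_SQL = """
    SELECT EXISTS (
      SELECT 1 FROM actor a
      JOIN film_actor fa USING (actor_id)
      WHERE a.last_name = ?
    )
"""

# Variant for dialects without a BOOLEAN type: wrap EXISTS in a CASE expression.
CASE_SQL = """
    SELECT CASE WHEN EXISTS (
      SELECT 1 FROM actor a
      JOIN film_actor fa USING (actor_id)
      WHERE a.last_name = ?
    ) THEN 1 ELSE 0 END
"""

def scalar(sql, name):
    """Run a single-value query and return that value."""
    return conn.execute(sql, (name,)).fetchone()[0]

wahlberg_count = scalar(COUNT_SQL, "WAHLBERG")    # has to count every match
wahlberg_exists = scalar(EXISTS_SQL, "WAHLBERG")  # may stop at the first match
nobody_exists = scalar(CASE_SQL, "NOBODY")
print(wahlberg_count, wahlberg_exists, nobody_exists)
```

Both forms answer the same yes/no question; only the COUNT(*) form obliges the database to visit every matching row.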

(Depending on your dialect, you may require a FROM DUAL clause, or a CASE expression if BOOLEAN types aren’t supported.)

Check for Multiple Rows

But what if you want to check if there are at least 2 (or N) rows? In that case, you cannot use EXISTS, but have to revert to using COUNT(*). However, instead of just counting all matches, why not add a LIMIT clause as well? So, if you want to check if actors called Wahlberg have played in at least 2 films, instead of this:

SELECT (
  SELECT count(*)
  FROM actor a
  JOIN film_actor fa USING (actor_id)
  WHERE a.last_name = 'WAHLBERG'
) >= 2

Write this:

SELECT (
  SELECT count(*)
  FROM (
    SELECT *
    FROM actor a
    JOIN film_actor fa USING (actor_id)
    WHERE a.last_name = 'WAHLBERG'
    LIMIT 2
  ) t
) >= 2

In other words:

  1. Run the join query with a LIMIT 2 in a derived table
  2. Then COUNT(*) the rows (at most 2) from that derived table
  3. Finally, check if the count is high enough
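The three steps generalise from 2 to any N. The following is a minimal sketch, again with Python's sqlite3 module and invented stand-in data (two Wahlbergs with 28 films each, loosely mirroring the row counts seen later in the plans). The helper name `has_at_least` is hypothetical, and the derived table projects `1` instead of `*` because only the row count matters.

```python
import sqlite3

# Invented stand-in data: actors 1 and 2 are WAHLBERGs with 28 films each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER,
                             PRIMARY KEY (actor_id, film_id));
""")
conn.executemany(
    "INSERT INTO actor VALUES (?, ?)",
    [(1, "WAHLBERG"), (2, "WAHLBERG"), (3, "GUINESS")],
)
conn.executemany(
    "INSERT INTO film_actor VALUES (?, ?)",
    [(actor_id, film_id) for actor_id in (1, 2, 3) for film_id in range(1, 29)],
)

AT_LEAST_N_SQL = """
    SELECT (
      SELECT count(*)               -- step 2: count at most :n rows
      FROM (
        SELECT 1
        FROM actor a
        JOIN film_actor fa USING (actor_id)
        WHERE a.last_name = :name
        LIMIT :n                    -- step 1: stop the join after :n rows
      ) t
    ) >= :n                         -- step 3: is the count high enough?
"""

COUNT_ALL_SQL = """
    SELECT count(*)
    FROM actor a
    JOIN film_actor fa USING (actor_id)
    WHERE a.last_name = :name
"""

def has_at_least(conn, last_name, n):
    """True if actors with this last name played in at least n films."""
    row = conn.execute(AT_LEAST_N_SQL, {"name": last_name, "n": n}).fetchone()
    return bool(row[0])

def has_at_least_naive(conn, last_name, n):
    """Same answer, but counts every matching row first."""
    row = conn.execute(COUNT_ALL_SQL, {"name": last_name}).fetchone()
    return row[0] >= n

print(has_at_least(conn, "WAHLBERG", 2), has_at_least(conn, "WAHLBERG", 57))
```

Both helpers return the same answer for every N; the difference the rest of this post measures is how much work the database does to get there.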

Does it matter?

In principle, the optimiser could have figured this out itself, especially because we used a constant to compare the COUNT(*) value with. But did it really apply the transformation?

Let’s check execution plans and benchmark the query on various RDBMS.

PostgreSQL 15

No LIMIT:

Result  (cost=14.70..14.71 rows=1 width=1) (actual time=0.039..0.039 rows=1 loops=1)
  InitPlan 1 (returns $1)
    ->  Aggregate  (cost=14.69..14.70 rows=1 width=8) (actual time=0.037..0.037 rows=1 loops=1)
          ->  Nested Loop  (cost=0.28..14.55 rows=55 width=0) (actual time=0.009..0.032 rows=56 loops=1)
                ->  Seq Scan on actor a  (cost=0.00..4.50 rows=2 width=4) (actual time=0.006..0.018 rows=2 loops=1)
                      Filter: ((last_name)::text = 'WAHLBERG'::text)
                      Rows Removed by Filter: 198
                ->  Index Only Scan using film_actor_pkey on film_actor fa  (cost=0.28..4.75 rows=27 width=4) (actual time=0.003..0.005 rows=28 loops=2)
                      Index Cond: (actor_id = a.actor_id)
                      Heap Fetches: 0

With LIMIT:

Result  (cost=0.84..0.85 rows=1 width=1) (actual time=0.023..0.024 rows=1 loops=1)
  InitPlan 1 (returns $1)
    ->  Aggregate  (cost=0.83..0.84 rows=1 width=8) (actual time=0.021..0.022 rows=1 loops=1)
          ->  Limit  (cost=0.28..0.80 rows=2 width=240) (actual time=0.016..0.018 rows=2 loops=1)
                ->  Nested Loop  (cost=0.28..14.55 rows=55 width=240) (actual time=0.015..0.016 rows=2 loops=1)
                      ->  Seq Scan on actor a  (cost=0.00..4.50 rows=2 width=4) (actual time=0.008..0.008 rows=1 loops=1)
                            Filter: ((last_name)::text = 'WAHLBERG'::text)
                            Rows Removed by Filter: 1
                      ->  Index Only Scan using film_actor_pkey on film_actor fa  (cost=0.28..4.75 rows=27 width=4) (actual time=0.005..0.005 rows=2 loops=1)
                            Index Cond: (actor_id = a.actor_id)
                            Heap Fetches: 0

To understand the difference, focus on these rows:

Before:

Nested Loop  (cost=0.28..14.55 rows=55 width=0) (actual time=0.009..0.032 rows=56 loops=1)

After:

Nested Loop  (cost=0.28..14.55 rows=55 width=240) (actual time=0.015..0.016 rows=2 loops=1)

In both cases, the estimated number of rows produced by the join is 55 (i.e. all Wahlbergs are expected to have played in a total of 55 films according to statistics). But in the second execution, the actual rows value is much lower, because we only needed 2 rows before we could stop execution of the operation, because of the LIMIT above.
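To see why a LIMIT above a join lowers the actual row counts of the operators below it, here is a toy model in Python. It is only an illustration of pull-based execution using generators, not PostgreSQL's implementation, and the data (200 actors, two Wahlbergs with 28 films each) is invented to resemble the numbers in the plans.

```python
from itertools import islice

# Invented data: 200 actors, two of them WAHLBERGs, 28 films per actor.
ACTORS = [(i, "WAHLBERG" if i in (2, 97) else f"NAME{i}") for i in range(1, 201)]
FILMS_BY_ACTOR = {i: list(range(28)) for i in range(1, 201)}

def run(limit=None):
    """Execute the toy plan and report how many rows each operator touched."""
    stats = {"actors_scanned": 0, "join_rows": 0}

    def seq_scan():
        # Scans actors one by one, only as far as the consumer keeps asking.
        for actor_id, last_name in ACTORS:
            stats["actors_scanned"] += 1
            if last_name == "WAHLBERG":
                yield actor_id

    def nested_loop():
        # For each matching actor, look up the films (the inner side of the join).
        for actor_id in seq_scan():
            for film_id in FILMS_BY_ACTOR[actor_id]:
                stats["join_rows"] += 1
                yield actor_id, film_id

    rows = nested_loop()
    if limit is not None:
        rows = islice(rows, limit)  # stops pulling once `limit` rows have arrived
    count = sum(1 for _ in rows)    # the aggregate on top of the plan
    return count, stats

print(run())         # full count: every actor scanned, every join row produced
print(run(limit=2))  # limited: the scan and the join stop after 2 join rows
```

Because each operator only produces a row when the one above asks for it, the limit never has to "cancel" anything: it simply stops asking.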

Benchmark results:

Using our recommended SQL benchmarking technique, which compares two queries by running each of them many times (5 runs x 2000 executions in this case) using procedural languages inside the database (to avoid network latency, etc.), we get these results:

RUN 1, Statement 1: 2.61927
RUN 1, Statement 2: 1.01506
RUN 2, Statement 1: 2.47193
RUN 2, Statement 2: 1.00614
RUN 3, Statement 1: 2.63533
RUN 3, Statement 2: 1.14282
RUN 4, Statement 1: 2.55228
RUN 4, Statement 2: 1.00000 -- Fastest run is 1
RUN 5, Statement 1: 2.53801
RUN 5, Statement 2: 1.02363

The fastest run is 1 unit of time; slower runs are expressed as multiples of that time. The complete COUNT(*) query is consistently and significantly slower than the LIMIT query.
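The shape of such a benchmark (several runs, many executions per statement, results normalised to the fastest run) can be sketched as follows. This is not the in-database procedural benchmark used for the numbers in this post: it is a client-side approximation against an in-memory SQLite database with invented data, so its ratios are not comparable to the ones shown. The function name `benchmark` is hypothetical.

```python
import sqlite3
import time

def benchmark(conn, statements, runs=5, executions=2000):
    """Time each statement `executions` times per run; return ratios to the fastest."""
    timings = []
    for run in range(1, runs + 1):
        for number, sql in enumerate(statements, start=1):
            start = time.perf_counter()
            for _ in range(executions):
                conn.execute(sql).fetchall()
            timings.append((run, number, time.perf_counter() - start))
    fastest = min(elapsed for _, _, elapsed in timings)
    # Express every timing as a multiple of the fastest one, which becomes 1.0.
    return [(run, number, elapsed / fastest) for run, number, elapsed in timings]

# Invented stand-in data: 200 actors with 28 films each, two of them WAHLBERGs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER,
                             PRIMARY KEY (actor_id, film_id));
""")
conn.executemany(
    "INSERT INTO actor VALUES (?, ?)",
    [(i, "WAHLBERG" if i <= 2 else f"NAME{i}") for i in range(1, 201)],
)
conn.executemany(
    "INSERT INTO film_actor VALUES (?, ?)",
    [(a, f) for a in range(1, 201) for f in range(1, 29)],
)

JOIN = "FROM actor a JOIN film_actor fa USING (actor_id) WHERE a.last_name = 'WAHLBERG'"
COUNT_ALL = f"SELECT (SELECT count(*) {JOIN}) >= 2"
COUNT_LIMIT = f"SELECT (SELECT count(*) FROM (SELECT 1 {JOIN} LIMIT 2) t) >= 2"

if __name__ == "__main__":
    for run, number, ratio in benchmark(conn, [COUNT_ALL, COUNT_LIMIT]):
        print(f"Run {run}, Statement {number}: {ratio:.5f}")
```

Interleaving the statements within each run, rather than timing one statement completely before the other, helps spread warm-up and caching effects evenly across both.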

Both the plans and benchmark results speak for themselves.

Oracle 23c

With Oracle 23c, we can finally use BOOLEAN types and omit DUAL. Yay!

No FETCH FIRST:

SQL_ID  40yy0tskvs1zw, child number 0
-------------------------------------
SELECT /*+GATHER_PLAN_STATISTICS*/ ( SELECT count(*)
FROM actor a JOIN film_actor fa USING (actor_id)
WHERE a.last_name = 'WAHLBERG' ) >= 2

Plan hash value: 2539243977

---------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 0 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 6 |
| 2 | NESTED LOOPS | | 1 | 55 | 56 |00:00:00.01 | 6 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| ACTOR | 1 | 2 | 2 |00:00:00.01 | 2 |
|* 4 | INDEX RANGE SCAN | IDX_ACTOR_LAST_NAME | 1 | 2 | 2 |00:00:00.01 | 1 |
|* 5 | INDEX RANGE SCAN | IDX_FK_FILM_ACTOR_ACTOR | 2 | 27 | 56 |00:00:00.01 | 4 |
| 6 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("A"."LAST_NAME"='WAHLBERG')
5 - access("A"."ACTOR_ID"="FA"."ACTOR_ID")

With FETCH FIRST:

SQL_ID  f88t1r0avnr7b, child number 0
-------------------------------------
SELECT /*+GATHER_PLAN_STATISTICS*/( SELECT count(*)
from ( select * FROM actor a JOIN
film_actor fa USING (actor_id) WHERE a.last_name =
'WAHLBERG' FETCH FIRST 2 ROWS ONLY ) t )
>= 2

Plan hash value: 4019277616

------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 0 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 6 | | | |
|* 2 | VIEW | | 1 | 2 | 2 |00:00:00.01 | 6 | | | |
|* 3 | WINDOW BUFFER PUSHED RANK | | 1 | 55 | 2 |00:00:00.01 | 6 | 2048 | 2048 | 2048 (0)|
| 4 | NESTED LOOPS | | 1 | 55 | 56 |00:00:00.01 | 6 | | | |
| 5 | TABLE ACCESS BY INDEX ROWID| ACTOR | 1 | 2 | 2 |00:00:00.01 | 2 | | | |
|* 6 | INDEX RANGE SCAN | IDX_ACTOR_LAST_NAME | 1 | 2 | 2 |00:00:00.01 | 1 | | | |
|* 7 | INDEX RANGE SCAN | IDX_FK_FILM_ACTOR_ACTOR | 2 | 27 | 56 |00:00:00.01 | 4 | | | |
| 8 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 | | | |
------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("from$_subquery$_005"."rowlimit_$$_rownumber"<=2)
3 - filter(ROW_NUMBER() OVER ( ORDER BY NULL )<=2)
6 - access("A"."LAST_NAME"='WAHLBERG')
7 - access("A"."ACTOR_ID"="FA"."ACTOR_ID")

Uh oh, this doesn’t look better. The NESTED LOOPS operation does not seem to have gotten the memo from the WINDOW BUFFER PUSHED RANK operation about the query being aborted. The E-Rows (estimated) and A-Rows (actual) values still match, so the JOIN seems to be executed completely.

For good measure, let’s also try ROWNUM. I had hoped that this ancient syntax belonged only to distant memory, but let’s see what happens with this alternative:

SELECT (
  SELECT count(*)
  FROM (
    SELECT *
    FROM actor a
    JOIN film_actor fa USING (actor_id)
    WHERE a.last_name = 'WAHLBERG'
    AND ROWNUM <= 2 -- Yuck, but it works
  ) t
) >= 2

The plan is now:

SQL_ID  6r7w9d0425j6c, child number 0
-------------------------------------
SELECT /*+GATHER_PLAN_STATISTICS*/( SELECT count(*)
from ( select * FROM actor a JOIN
film_actor fa USING (actor_id) WHERE a.last_name =
'WAHLBERG' AND ROWNUM <= 2 ) t ) >= 2

Plan hash value: 1271700124

-----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 0 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 4 |
| 2 | VIEW | | 1 | 2 | 2 |00:00:00.01 | 4 |
|* 3 | COUNT STOPKEY | | 1 | | 2 |00:00:00.01 | 4 |
| 4 | NESTED LOOPS | | 1 | 55 | 2 |00:00:00.01 | 4 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| ACTOR | 1 | 2 | 1 |00:00:00.01 | 2 |
|* 6 | INDEX RANGE SCAN | IDX_ACTOR_LAST_NAME | 1 | 2 | 1 |00:00:00.01 | 1 |
|* 7 | INDEX RANGE SCAN | IDX_FK_FILM_ACTOR_ACTOR | 1 | 27 | 2 |00:00:00.01 | 2 |
| 8 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 |
-----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter(ROWNUM<=2)
6 - access("A"."LAST_NAME"='WAHLBERG')
7 - access("A"."ACTOR_ID"="FA"."ACTOR_ID")

Now, that’s what I’m talking about. The NESTED LOOPS operation has an A-Rows value of 2, as it should have. The COUNT STOPKEY operation knows how to tell its successors to behave.
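One way to picture the difference between the two Oracle plans is to contrast a filter on a computed row number with an operator that stops reading its input. The sketch below is a toy model in Python and makes no claim about how Oracle implements WINDOW BUFFER PUSHED RANK or COUNT STOPKEY; it only reproduces the observed A-Rows pattern (56 vs. 2 rows produced by the join). All names are hypothetical.

```python
from itertools import islice

def source(total, stats):
    """Stands in for the join: counts every row it is asked to produce."""
    for i in range(total):
        stats["produced"] += 1
        yield i

def row_number_filter(rows, n):
    """Keeps rows numbered 1..n, but still reads its whole input."""
    return (row for number, row in enumerate(rows, start=1) if number <= n)

def stop_key(rows, n):
    """Stops reading its input as soon as n rows have been returned."""
    return islice(rows, n)

def run(limiter, total=56, n=2):
    stats = {"produced": 0}
    out = list(limiter(source(total, stats), n))
    return len(out), stats["produced"]

print(run(row_number_filter))  # 2 rows out, but all 56 join rows were produced
print(run(stop_key))           # 2 rows out, only 2 join rows were produced
```

Both limiters return the same two rows. Only the second one spares the join from producing the other 54.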

Benchmark results (statement 1 counts all rows, statement 2 uses FETCH FIRST 2 ROWS ONLY, statement 3 uses ROWNUM):

Run 1, Statement 1 : 1.9564
Run 1, Statement 2 : 2.98499
Run 1, Statement 3 : 1.07291
Run 2, Statement 1 : 1.69192
Run 2, Statement 2 : 2.66905
Run 2, Statement 3 : 1.01144
Run 3, Statement 1 : 1.71051
Run 3, Statement 2 : 2.63831
Run 3, Statement 3 : 1 -- Fastest run is 1
Run 4, Statement 1 : 1.61544
Run 4, Statement 2 : 2.67334
Run 4, Statement 3 : 1.00786
Run 5, Statement 1 : 1.72981
Run 5, Statement 2 : 2.77913
Run 5, Statement 3 : 1.02716

Whoopsies. Indeed, it appears that the FETCH FIRST 2 ROWS ONLY clause is bad in this case. It even made performance worse than if we omit it and count the complete result. However, the ROWNUM filter helped greatly, just like before with PostgreSQL’s LIMIT.

I would consider this an optimiser bug in Oracle. FETCH FIRST should be an operation that can be pushed down to various other operations.

MySQL

No LIMIT:

-> Rows fetched before execution  (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
-> Select #2 (subquery in projection; run only once)
    -> Aggregate: count(0)  (cost=1.35 rows=1) (actual time=0.479..0.479 rows=1 loops=1)
        -> Nested loop inner join  (cost=1.15 rows=2) (actual time=0.077..0.110 rows=56 loops=1)
            -> Covering index lookup on a using idx_actor_last_name (last_name='WAHLBERG')  (cost=0.45 rows=2) (actual time=0.059..0.061 rows=2 loops=1)
            -> Covering index lookup on fa using PRIMARY (actor_id=a.actor_id)  (cost=0.30 rows=1) (actual time=0.011..0.021 rows=28 loops=2)

With LIMIT:

-> Rows fetched before execution  (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
-> Select #2 (subquery in projection; run only once)
    -> Aggregate: count(0)  (cost=4.08..4.08 rows=1) (actual time=0.399..0.400 rows=1 loops=1)
        -> Table scan on t  (cost=2.62..3.88 rows=2) (actual time=0.394..0.394 rows=2 loops=1)
            -> Materialize  (cost=1.35..1.35 rows=2) (actual time=0.033..0.033 rows=2 loops=1)
                -> Limit: 2 row(s)  (cost=1.15 rows=2) (actual time=0.024..0.025 rows=2 loops=1)
                    -> Nested loop inner join  (cost=1.15 rows=2) (actual time=0.024..0.024 rows=2 loops=1)
                        -> Covering index lookup on a using idx_actor_last_name (last_name='WAHLBERG')  (cost=0.45 rows=2) (actual time=0.014..0.014 rows=1 loops=1)
                        -> Covering index lookup on fa using PRIMARY (actor_id=a.actor_id)  (cost=0.30 rows=1) (actual time=0.008..0.008 rows=2 loops=1)

We again get the Nested loop inner join row with the wanted difference:

Before:

Nested loop inner join  (cost=1.15 rows=2) (actual time=0.077..0.110 rows=56 loops=1)

After:

Nested loop inner join  (cost=1.15 rows=2) (actual time=0.024..0.024 rows=2 loops=1)

Benchmark results:

Run  Statement  Ratio
0    1          1.2933
0    2          1.0089
1    1          1.2489
1    2          1.0000 -- Fastest run is 1
2    1          1.2444
2    2          1.0933
3    1          1.2133
3    2          1.0178
4    1          1.2267
4    2          1.0178

Again, the LIMIT is helpful, though the difference is less impressive.

SQL Server

No LIMIT (spelled TOP in SQL Server):

  |--Compute Scalar(DEFINE:([Expr1006]=CASE WHEN [Expr1004]>=(2) THEN (1) ELSE (0) END))
       |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1010],0)))
            |--Stream Aggregate(DEFINE:([Expr1010]=Count(*)))
                 |--Nested Loops(Inner Join, OUTER REFERENCES:([a].[actor_id]))
                      |--Table Scan(OBJECT:([sakila].[dbo].[actor] AS [a]), WHERE:([sakila].[dbo].[actor].[last_name] as [a].[last_name]='WAHLBERG'))
                      |--Index Seek(OBJECT:([sakila].[dbo].[film_actor].[PK__film_act__086D31FF6BE587FC] AS [fa]), SEEK:([fa].[actor_id]=[sakila].[dbo].[actor].[actor_id] as [a].[actor_id]) ORDERED FORWARD)

With TOP:

  |--Compute Scalar(DEFINE:([Expr1007]=CASE WHEN [Expr1005]>=(2) THEN (1) ELSE (0) END))
       |--Compute Scalar(DEFINE:([Expr1005]=CONVERT_IMPLICIT(int,[Expr1011],0)))
            |--Stream Aggregate(DEFINE:([Expr1011]=Count(*)))
                 |--Top(TOP EXPRESSION:((2)))
                      |--Nested Loops(Inner Join, OUTER REFERENCES:([a].[actor_id]))
                           |--Table Scan(OBJECT:([sakila].[dbo].[actor] AS [a]), WHERE:([sakila].[dbo].[actor].[last_name] as [a].[last_name]='WAHLBERG'))
                           |--Index Seek(OBJECT:([sakila].[dbo].[film_actor].[PK__film_act__086D31FF6BE587FC] AS [fa]), SEEK:([fa].[actor_id]=[sakila].[dbo].[actor].[actor_id] as [a].[actor_id]) ORDERED FORWARD)

The text version does not indicate actual rows, even with SHOWPLAN_ALL, so let’s just look at what happens in the benchmark.

Benchmark results:

Run 1, Statement 1: 1.92118
Run 1, Statement 2: 1.00000 -- Fastest run is 1
Run 2, Statement 1: 1.95567
Run 2, Statement 2: 1.01724
Run 3, Statement 1: 1.91379
Run 3, Statement 2: 1.01724
Run 4, Statement 1: 1.93842
Run 4, Statement 2: 1.04926
Run 5, Statement 1: 1.95567
Run 5, Statement 2: 1.03448

And again, an impressive 2x improvement for this particular query.

Conclusion

Just as with our previous blog post about COUNT(*) vs EXISTS, the seemingly obvious is true again in this case, where we want to check if N or more rows exist in a query. If we blindly count all the rows, then we’ve seen much worse performance than if we helped the optimiser with a LIMIT or TOP clause, or ROWNUM in Oracle.

Technically, an optimiser could have detected this optimisation itself, but as our previous article about optimisations that don’t depend on the cost model has shown, optimisers don’t always do everything they can.

Unfortunately, in Oracle’s case, the standard SQL syntax made things slower (in this benchmark). This doesn’t mean it’s generally slower for all cases, but it’s something worth looking out for. There are still cases where ancient ROWNUM clauses are optimised better.
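To summarise the per-dialect spellings in one place, here is a small, hypothetical Python helper that renders the "at least N rows" check as SQL text. The function name, its parameters, and the dialect labels are assumptions made for this sketch. It projects `1 AS x` instead of `*`, expects trusted SQL fragments only (no user input), and assumes Oracle 23c for the bare BOOLEAN result without DUAL.

```python
def at_least_n_sql(from_clause, where_clause, n, dialect):
    """Return SQL text checking whether the query yields at least n rows.

    from_clause and where_clause must be trusted SQL fragments.
    """
    n = int(n)
    base = f"FROM {from_clause} WHERE ({where_clause})"
    if dialect in ("postgresql", "mysql"):
        inner = f"SELECT 1 AS x {base} LIMIT {n}"
    elif dialect == "sqlserver":
        inner = f"SELECT TOP ({n}) 1 AS x {base}"
    elif dialect == "oracle":
        # ROWNUM rather than FETCH FIRST, following the benchmark in this post.
        inner = f"SELECT 1 AS x {base} AND ROWNUM <= {n}"
    else:
        raise ValueError(f"unsupported dialect: {dialect}")
    check = f"(SELECT count(*) FROM ({inner}) t) >= {n}"
    if dialect == "sqlserver":
        # SQL Server has no BOOLEAN type, so wrap the comparison in CASE.
        return f"SELECT CASE WHEN {check} THEN 1 ELSE 0 END"
    return f"SELECT {check}"

print(at_least_n_sql(
    "actor a JOIN film_actor fa USING (actor_id)",
    "a.last_name = 'WAHLBERG'",
    2,
    "postgresql",
))
```

Note that SQL Server does not support JOIN .. USING, so the FROM fragment passed for that dialect would need an ON clause instead.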


