COUNT(*) or COUNT(*) with LIMIT in SQL? Let’s check
In a previous blog post, we’ve advertised the use of SQL EXISTS rather than COUNT(*) to check for existence of a value in SQL.
I.e. to check if, in the Sakila database, actors called Wahlberg have played in any films, instead of:
SELECT count(*)
FROM actor a
JOIN film_actor fa USING (actor_id)
WHERE a.last_name = 'WAHLBERG'
Do this:
SELECT EXISTS (
  SELECT 1 FROM actor a
  JOIN film_actor fa USING (actor_id)
  WHERE a.last_name = 'WAHLBERG'
)
(Depending on your dialect you may require a FROM DUAL clause, or a CASE expression if BOOLEAN types aren’t supported).
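For dialects without a BOOLEAN type, the CASE variant mentioned above could look like the following minimal sketch. It runs against an in-memory SQLite database through Python's standard `sqlite3` module, which is our assumption for illustration only (the article itself benchmarks PostgreSQL, Oracle, MySQL and SQL Server). The tiny two-table schema, the sample rows and the helper name `any_film` are hypothetical stand-ins for the Sakila database.

```python
import sqlite3

# Hypothetical miniature of the Sakila tables used in the article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER,
                             PRIMARY KEY (actor_id, film_id));
    INSERT INTO actor VALUES (1, 'WAHLBERG'), (2, 'NOFILMS');
    INSERT INTO film_actor VALUES (1, 10), (1, 11);
""")

def any_film(conn, last_name):
    """Return True if any actor with this last name played in any film."""
    # CASE turns the EXISTS predicate into 1/0 for dialects without BOOLEAN.
    sql = """
        SELECT CASE WHEN EXISTS (
          SELECT 1 FROM actor a
          JOIN film_actor fa USING (actor_id)
          WHERE a.last_name = ?
        ) THEN 1 ELSE 0 END
    """
    return conn.execute(sql, (last_name,)).fetchone()[0] == 1

print(any_film(conn, "WAHLBERG"))
print(any_film(conn, "NOFILMS"))
```

The bind variable is only there to keep the helper reusable; the article's queries use a constant.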
Check for multiple rows
But what if you want to check if there are at least 2 (or N) rows? In that case, you cannot use EXISTS, but have to revert to using COUNT(*). However, instead of just counting all matches, why not add a LIMIT clause as well? So, if you want to check if actors called Wahlberg have played in at least 2 films, instead of this:
SELECT (
  SELECT count(*)
  FROM actor a
  JOIN film_actor fa USING (actor_id)
  WHERE a.last_name = 'WAHLBERG'
) >= 2
Write this:
SELECT (
  SELECT count(*)
  FROM (
    SELECT *
    FROM actor a
    JOIN film_actor fa USING (actor_id)
    WHERE a.last_name = 'WAHLBERG'
    LIMIT 2
  ) t
) >= 2
In other words:
- Run the join query with a LIMIT 2 in a derived table
- Then COUNT(*) the rows (at most 2) from that derived table
- Finally, check if the count is high enough
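The three steps above can be sketched for an arbitrary N as follows. This is a minimal illustration on an in-memory SQLite database via Python's standard `sqlite3` module, not one of the RDBMS benchmarked in this article; the helper name `has_at_least`, the reduced schema and the sample rows are hypothetical.

```python
import sqlite3

def has_at_least(conn, n, last_name):
    """Check whether actors with this last name have at least n film_actor rows."""
    sql = """
        SELECT (
          SELECT count(*)            -- step 2: count at most n rows
          FROM (
            SELECT 1
            FROM actor a
            JOIN film_actor fa USING (actor_id)
            WHERE a.last_name = ?
            LIMIT ?                  -- step 1: stop the join after n rows
          ) t
        ) >= ?                       -- step 3: is the count high enough?
    """
    (result,) = conn.execute(sql, (last_name, n, n)).fetchone()
    return bool(result)

# Hypothetical sample data standing in for the Sakila database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER,
                             PRIMARY KEY (actor_id, film_id));
    INSERT INTO actor VALUES (1, 'WAHLBERG'), (2, 'WAHLBERG'), (3, 'GUINESS');
    INSERT INTO film_actor VALUES (1, 10), (1, 11), (2, 12), (3, 13);
""")

print(has_at_least(conn, 2, "WAHLBERG"))
print(has_at_least(conn, 2, "GUINESS"))
```

Passing the same value for the LIMIT and for the comparison keeps the two in sync, which is the whole point of the technique: the derived table never needs to produce more rows than the comparison can distinguish.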
Does it matter?
In principle, the optimiser could have figured this out itself, especially because we used a constant to compare the COUNT(*) value with. But did it really apply the transformation?
Let’s check execution plans and benchmark the query on various RDBMS.
PostgreSQL 15
No LIMIT:
Result  (cost=14.70..14.71 rows=1 width=1) (actual time=0.039..0.039 rows=1 loops=1)
  InitPlan 1 (returns $1)
    ->  Aggregate  (cost=14.69..14.70 rows=1 width=8) (actual time=0.037..0.037 rows=1 loops=1)
          ->  Nested Loop  (cost=0.28..14.55 rows=55 width=0) (actual time=0.009..0.032 rows=56 loops=1)
                ->  Seq Scan on actor a  (cost=0.00..4.50 rows=2 width=4) (actual time=0.006..0.018 rows=2 loops=1)
                      Filter: ((last_name)::text = 'WAHLBERG'::text)
                      Rows Removed by Filter: 198
                ->  Index Only Scan using film_actor_pkey on film_actor fa  (cost=0.28..4.75 rows=27 width=4) (actual time=0.003..0.005 rows=28 loops=2)
                      Index Cond: (actor_id = a.actor_id)
                      Heap Fetches: 0
With LIMIT:
Result  (cost=0.84..0.85 rows=1 width=1) (actual time=0.023..0.024 rows=1 loops=1)
  InitPlan 1 (returns $1)
    ->  Aggregate  (cost=0.83..0.84 rows=1 width=8) (actual time=0.021..0.022 rows=1 loops=1)
          ->  Limit  (cost=0.28..0.80 rows=2 width=240) (actual time=0.016..0.018 rows=2 loops=1)
                ->  Nested Loop  (cost=0.28..14.55 rows=55 width=240) (actual time=0.015..0.016 rows=2 loops=1)
                      ->  Seq Scan on actor a  (cost=0.00..4.50 rows=2 width=4) (actual time=0.008..0.008 rows=1 loops=1)
                            Filter: ((last_name)::text = 'WAHLBERG'::text)
                            Rows Removed by Filter: 1
                      ->  Index Only Scan using film_actor_pkey on film_actor fa  (cost=0.28..4.75 rows=27 width=4) (actual time=0.005..0.005 rows=2 loops=1)
                            Index Cond: (actor_id = a.actor_id)
                            Heap Fetches: 0
To understand the difference, focus on these rows:
Before:
Nested Loop (cost=0.28..14.55 rows=55 width=0) (actual time=0.009..0.032 rows=56 loops=1)
After:
Nested Loop (cost=0.28..14.55 rows=55 width=240) (actual time=0.015..0.016 rows=2 loops=1)
In both cases, the estimated number of rows produced by the join is 55 (i.e. all Wahlbergs are expected to have played in a total of 55 films according to statistics). But in the second execution, the actual rows value is much lower, because we only needed 2 rows before we could stop execution of the operation, because of the LIMIT above it in the plan.
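Why a LIMIT lets the join stop early can be illustrated with a toy pull-based pipeline. The sketch below is not how any of the RDBMS in this article is implemented; it only mimics the idea with Python generators, where `itertools.islice` plays the role of the Limit operation. All names and the data (2 actors with 28 films each, so 56 join rows in total) are hypothetical.

```python
from itertools import islice

def nested_loop_join(actors, film_actor_index, stats):
    """Lazily yield (actor_id, film_id) pairs, counting every row produced."""
    for actor_id in actors:
        for film_id in film_actor_index.get(actor_id, ()):
            stats["rows_produced"] += 1
            yield actor_id, film_id

def count_rows(rows):
    """Aggregate operation: consume the input and count it."""
    return sum(1 for _ in rows)

# Hypothetical data: 2 matching actors, 28 films each.
actors = [1, 2]
index = {1: list(range(28)), 2: list(range(100, 128))}

# No LIMIT: the aggregate pulls every row out of the join.
full = {"rows_produced": 0}
full_count = count_rows(nested_loop_join(actors, index, full))

# With LIMIT 2: islice stops pulling after 2 rows, so the join stops too.
limited = {"rows_produced": 0}
limited_count = count_rows(islice(nested_loop_join(actors, index, limited), 2))

print(full_count, full["rows_produced"])
print(limited_count, limited["rows_produced"])
```

The join code is identical in both cases. Only the consumer differs, which is the same effect the actual rows value shows in the plans.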
Benchmark results:
Using our recommended SQL benchmarking technique that compares running two queries many times (5 runs x 2000 executions in this case) from within the RDBMS using procedural languages (to avoid network latency, etc.), we get these results:
RUN 1, Statement 1: 2.61927
RUN 1, Statement 2: 1.01506
RUN 2, Statement 1: 2.47193
RUN 2, Statement 2: 1.00614
RUN 3, Statement 1: 2.63533
RUN 3, Statement 2: 1.14282
RUN 4, Statement 1: 2.55228
RUN 4, Statement 2: 1.00000 -- Fastest run is 1
RUN 5, Statement 1: 2.53801
RUN 5, Statement 2: 1.02363
The fastest run is 1 unit of time; slower runs are expressed as multiples of that time. The complete COUNT(*) query (statement 1) is consistently and significantly slower than the LIMIT query (statement 2).
Both the plans and benchmark results speak for themselves.
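The way such relative numbers are produced can be sketched as follows. Note the assumptions: the article's benchmark runs inside each RDBMS using its procedural language, whereas this sketch runs client-side in Python against an in-memory SQLite database, so it only illustrates the "N runs x M executions, normalised by the fastest run" idea and will not reproduce the figures above. The function names, the generated data and the reduced execution count are hypothetical.

```python
import sqlite3
import time

def benchmark(conn, statements, runs=5, executions=2000):
    """Time each statement `executions` times per run; return (run, stmt, seconds)."""
    timings = []
    for run in range(1, runs + 1):
        for number, sql in enumerate(statements, start=1):
            start = time.perf_counter()
            for _ in range(executions):
                conn.execute(sql).fetchall()
            timings.append((run, number, time.perf_counter() - start))
    return timings

def normalise(timings):
    """Express every timing as a multiple of the fastest one."""
    fastest = min(t for _, _, t in timings)
    return [(run, number, t / fastest) for run, number, t in timings]

# Hypothetical data: 200 actors, 2 of them called WAHLBERG, 28 films each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER,
                             PRIMARY KEY (actor_id, film_id));
""")
conn.executemany("INSERT INTO actor VALUES (?, ?)",
                 [(i, "WAHLBERG" if i <= 2 else f"NAME{i}") for i in range(1, 201)])
conn.executemany("INSERT INTO film_actor VALUES (?, ?)",
                 [(a, f) for a in range(1, 201) for f in range(1, 29)])

count_all = """
    SELECT (SELECT count(*) FROM actor a JOIN film_actor fa USING (actor_id)
            WHERE a.last_name = 'WAHLBERG') >= 2
"""
count_limited = """
    SELECT (SELECT count(*) FROM (
              SELECT 1 FROM actor a JOIN film_actor fa USING (actor_id)
              WHERE a.last_name = 'WAHLBERG' LIMIT 2) t) >= 2
"""

results = normalise(benchmark(conn, [count_all, count_limited], runs=5, executions=200))
for run, number, ratio in results:
    print(f"Run {run}, Statement {number}: {ratio:.5f}")
```

Dividing by the fastest timing hides absolute durations, which makes runs comparable within one environment without suggesting the numbers mean anything across different machines or products.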
Oracle 23c
With Oracle 23c, we can finally use BOOLEAN types and omit DUAL. Yay!
No FETCH FIRST:
SQL_ID 40yy0tskvs1zw, child number 0
-------------------------------------
SELECT /*+GATHER_PLAN_STATISTICS*/ ( SELECT count(*)
FROM actor a JOIN film_actor fa USING (actor_id)
WHERE a.last_name = 'WAHLBERG' ) >= 2
Plan hash value: 2539243977
---------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 0 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 6 |
| 2 | NESTED LOOPS | | 1 | 55 | 56 |00:00:00.01 | 6 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| ACTOR | 1 | 2 | 2 |00:00:00.01 | 2 |
|* 4 | INDEX RANGE SCAN | IDX_ACTOR_LAST_NAME | 1 | 2 | 2 |00:00:00.01 | 1 |
|* 5 | INDEX RANGE SCAN | IDX_FK_FILM_ACTOR_ACTOR | 2 | 27 | 56 |00:00:00.01 | 4 |
| 6 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 |
---------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("A"."LAST_NAME"='WAHLBERG')
5 - access("A"."ACTOR_ID"="FA"."ACTOR_ID")
With FETCH FIRST:
SQL_ID f88t1r0avnr7b, child number 0
-------------------------------------
SELECT /*+GATHER_PLAN_STATISTICS*/( SELECT count(*)
from ( select * FROM actor a JOIN
film_actor fa USING (actor_id) WHERE a.last_name =
'WAHLBERG' FETCH FIRST 2 ROWS ONLY ) t )
>= 2
Plan hash value: 4019277616
------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 0 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 6 | | | |
|* 2 | VIEW | | 1 | 2 | 2 |00:00:00.01 | 6 | | | |
|* 3 | WINDOW BUFFER PUSHED RANK | | 1 | 55 | 2 |00:00:00.01 | 6 | 2048 | 2048 | 2048 (0)|
| 4 | NESTED LOOPS | | 1 | 55 | 56 |00:00:00.01 | 6 | | | |
| 5 | TABLE ACCESS BY INDEX ROWID| ACTOR | 1 | 2 | 2 |00:00:00.01 | 2 | | | |
|* 6 | INDEX RANGE SCAN | IDX_ACTOR_LAST_NAME | 1 | 2 | 2 |00:00:00.01 | 1 | | | |
|* 7 | INDEX RANGE SCAN | IDX_FK_FILM_ACTOR_ACTOR | 2 | 27 | 56 |00:00:00.01 | 4 | | | |
| 8 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 | | | |
------------------------------------------------------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------2 - filter("from$_subquery$_005"."rowlimit_$$_rownumber"<=2)
3 - filter(ROW_NUMBER() OVER ( ORDER BY NULL )<=2)
6 - access("A"."LAST_NAME"='WAHLBERG')
7 - access("A"."ACTOR_ID"="FA"."ACTOR_ID")
Uh oh, this doesn’t look better. The NESTED LOOPS operation does not seem to have gotten the memo from the WINDOW BUFFER PUSHED RANK operation about the query being aborted. The E-Rows (estimated) and A-Rows (actual) values still match, so the JOIN seems to be executed completely.
For good measure, let’s also try:
With ROWNUM:
I had hoped that this syntax belonged only to distant memory now that we have the FETCH syntax, but let’s see what happens with this alternative:
SELECT (
  SELECT count(*)
  FROM (
    SELECT *
    FROM actor a
    JOIN film_actor fa USING (actor_id)
    WHERE a.last_name = 'WAHLBERG'
    AND ROWNUM <= 2 -- Yuck, but it works
  ) t
) >= 2
The plan is now:
SQL_ID 6r7w9d0425j6c, child number 0
-------------------------------------
SELECT /*+GATHER_PLAN_STATISTICS*/( SELECT count(*)
from ( select * FROM actor a JOIN
film_actor fa USING (actor_id) WHERE a.last_name =
'WAHLBERG' AND ROWNUM <= 2 ) t ) >= 2
Plan hash value: 1271700124
-----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 0 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 4 |
| 2 | VIEW | | 1 | 2 | 2 |00:00:00.01 | 4 |
|* 3 | COUNT STOPKEY | | 1 | | 2 |00:00:00.01 | 4 |
| 4 | NESTED LOOPS | | 1 | 55 | 2 |00:00:00.01 | 4 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| ACTOR | 1 | 2 | 1 |00:00:00.01 | 2 |
|* 6 | INDEX RANGE SCAN | IDX_ACTOR_LAST_NAME | 1 | 2 | 1 |00:00:00.01 | 1 |
|* 7 | INDEX RANGE SCAN | IDX_FK_FILM_ACTOR_ACTOR | 1 | 27 | 2 |00:00:00.01 | 2 |
| 8 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 |
-----------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(ROWNUM<=2)
6 - access("A"."LAST_NAME"='WAHLBERG')
7 - access("A"."ACTOR_ID"="FA"."ACTOR_ID")
Now, that’s what I’m talking about. The NESTED LOOPS operation has an A-Rows value of 2, as it should have. The COUNT STOPKEY operation knows how to tell its successors to behave.
Benchmark results (statement 1 counts everything, statement 2 uses FETCH FIRST, statement 3 uses ROWNUM):
Run 1, Statement 1 : 1.9564
Run 1, Statement 2 : 2.98499
Run 1, Statement 3 : 1.07291
Run 2, Statement 1 : 1.69192
Run 2, Statement 2 : 2.66905
Run 2, Statement 3 : 1.01144
Run 3, Statement 1 : 1.71051
Run 3, Statement 2 : 2.63831
Run 3, Statement 3 : 1 -- Fastest run is 1
Run 4, Statement 1 : 1.61544
Run 4, Statement 2 : 2.67334
Run 4, Statement 3 : 1.00786
Run 5, Statement 1 : 1.72981
Run 5, Statement 2 : 2.77913
Run 5, Statement 3 : 1.02716
Whoopsies. Indeed, it appears that the FETCH FIRST 2 ROWS ONLY clause is bad in this case. It even made performance worse than if we omit it and count the complete result. However, the ROWNUM filter helped greatly, just like before with PostgreSQL’s LIMIT.
I would consider this an optimiser bug in Oracle. FETCH FIRST should be an operation that can be pushed down to various other operations.
MySQL
No LIMIT:
-> Rows fetched before execution  (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
-> Select #2 (subquery in projection; run only once)
    -> Aggregate: count(0)  (cost=1.35 rows=1) (actual time=0.479..0.479 rows=1 loops=1)
        -> Nested loop inner join  (cost=1.15 rows=2) (actual time=0.077..0.110 rows=56 loops=1)
            -> Covering index lookup on a using idx_actor_last_name (last_name='WAHLBERG')  (cost=0.45 rows=2) (actual time=0.059..0.061 rows=2 loops=1)
            -> Covering index lookup on fa using PRIMARY (actor_id=a.actor_id)  (cost=0.30 rows=1) (actual time=0.011..0.021 rows=28 loops=2)
With LIMIT:
-> Rows fetched before execution  (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
-> Select #2 (subquery in projection; run only once)
    -> Aggregate: count(0)  (cost=4.08..4.08 rows=1) (actual time=0.399..0.400 rows=1 loops=1)
        -> Table scan on t  (cost=2.62..3.88 rows=2) (actual time=0.394..0.394 rows=2 loops=1)
            -> Materialize  (cost=1.35..1.35 rows=2) (actual time=0.033..0.033 rows=2 loops=1)
                -> Limit: 2 row(s)  (cost=1.15 rows=2) (actual time=0.024..0.025 rows=2 loops=1)
                    -> Nested loop inner join  (cost=1.15 rows=2) (actual time=0.024..0.024 rows=2 loops=1)
                        -> Covering index lookup on a using idx_actor_last_name (last_name='WAHLBERG')  (cost=0.45 rows=2) (actual time=0.014..0.014 rows=1 loops=1)
                        -> Covering index lookup on fa using PRIMARY (actor_id=a.actor_id)  (cost=0.30 rows=1) (actual time=0.008..0.008 rows=2 loops=1)
We again get the Nested loop inner join row with the wanted difference:
Before:
Nested loop inner join (cost=1.15 rows=2) (actual time=0.077..0.110 rows=56 loops=1)
After:
Nested loop inner join (cost=1.15 rows=2) (actual time=0.024..0.024 rows=2 loops=1)
Benchmark results:
Again, the LIMIT is helpful, though the difference is less impressive:
0 1 1.2933
0 2 1.0089
1 1 1.2489
1 2 1.0000 -- Fastest run is 1
2 1 1.2444
2 2 1.0933
3 1 1.2133
3 2 1.0178
4 1 1.2267
4 2 1.0178
SQL Server
No LIMIT:
  |--Compute Scalar(DEFINE:([Expr1006]=CASE WHEN [Expr1004]>=(2) THEN (1) ELSE (0) END))
       |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1010],0)))
            |--Stream Aggregate(DEFINE:([Expr1010]=Count(*)))
                 |--Nested Loops(Inner Join, OUTER REFERENCES:([a].[actor_id]))
                      |--Table Scan(OBJECT:([sakila].[dbo].[actor] AS [a]), WHERE:([sakila].[dbo].[actor].[last_name] as [a].[last_name]='WAHLBERG'))
                      |--Index Seek(OBJECT:([sakila].[dbo].[film_actor].[PK__film_act__086D31FF6BE587FC] AS [fa]), SEEK:([fa].[actor_id]=[sakila].[dbo].[actor].[actor_id] as [a].[actor_id]) ORDERED FORWARD)
With LIMIT:
  |--Compute Scalar(DEFINE:([Expr1007]=CASE WHEN [Expr1005]>=(2) THEN (1) ELSE (0) END))
       |--Compute Scalar(DEFINE:([Expr1005]=CONVERT_IMPLICIT(int,[Expr1011],0)))
            |--Stream Aggregate(DEFINE:([Expr1011]=Count(*)))
                 |--Top(TOP EXPRESSION:((2)))
                      |--Nested Loops(Inner Join, OUTER REFERENCES:([a].[actor_id]))
                           |--Table Scan(OBJECT:([sakila].[dbo].[actor] AS [a]), WHERE:([sakila].[dbo].[actor].[last_name] as [a].[last_name]='WAHLBERG'))
                           |--Index Seek(OBJECT:([sakila].[dbo].[film_actor].[PK__film_act__086D31FF6BE587FC] AS [fa]), SEEK:([fa].[actor_id]=[sakila].[dbo].[actor].[actor_id] as [a].[actor_id]) ORDERED FORWARD)
The text version does not indicate actual rows, even with SHOWPLAN_ALL, so let’s just look at what happens in the benchmark:
Benchmark results:
Run 1, Statement 1: 1.92118
Run 1, Statement 2: 1.00000 -- Fastest run is 1
Run 2, Statement 1: 1.95567
Run 2, Statement 2: 1.01724
Run 3, Statement 1: 1.91379
Run 3, Statement 2: 1.01724
Run 4, Statement 1: 1.93842
Run 4, Statement 2: 1.04926
Run 5, Statement 1: 1.95567
Run 5, Statement 2: 1.03448
And again, an impressive 2x improvement for this particular query.
Conclusion
Just as with our previous blog post about COUNT(*) vs EXISTS, the seemingly obvious is true again in this case where we want to check if N or more rows exist in a query. If we blindly count all the rows, then we’ve seen much worse performance than if we helped the optimiser with a LIMIT or TOP clause, or ROWNUM in Oracle.
Technically, an optimiser could have detected this optimisation itself, but as our previous article about optimisations that don’t depend on the cost model has shown, optimisers don’t always do everything they can.
Unfortunately, in Oracle’s case, the standard SQL syntax made things slower (in this benchmark). This doesn’t mean it’s generally slower for all cases, but it’s something worth looking out for. There are still cases where ancient ROWNUM clauses are better optimised.