one thing that I see fairly often and
occasionally am guilty of doing myself
is using count distinct and distinct
interchangeably to see the unique values
in a column of data the question is
though are they really interchangeable
so let's start with this table of data
you'll notice that there are some
duplicate values in column one this is
going to be really important for seeing
how the following queries work so let's
say we have count distinct from column
one from our test table and you'll
notice the query returns a count of two
unique values that's great if we want to
see what those values are maybe we just
get rid of the count and say select
distinct from column one and that should
return all the unique values in that
column the interesting thing here is
though instead of getting just two rows
of data back we actually get three we
have a B and a null and while this
discrepancy between two versus three
values is clearly explained in the
sequel server documentation there may be
times where you would want this logic to
match up let's take a look at how we can
accomplish that if you want to get the
distinct values without the nulls it's
pretty easy right we could just say
where column one is not null in a where
clause and that'll basically just return
the values of a and B with no nulls
trying to go in the opposite direction
by getting a distinct count of all the
different values which includes our null
values is a little bit more challenging
and we have a couple different options
our first option which is probably the
worst is to do a count distinct with and
is null and we could do this with a case
when as if we wanted to as well
the idea here is that we're gonna
replace a null value with some unique
value that we know isn't gonna exist in
our data which will then allow the count
distinct to count it the reason this is
a bad way to do it is because that is
null is gonna for sequel server to
perform that check on every single row
in our table or our index which will
just cause poor performance our second
option is to just do a count distinct
which will give us a count with out the
null values and then add an account
distinct when null values exist so that
value will always either be as
or a1 depending on if nulls exist taking
the sum of those two values will then
give us the correct count which includes
bills for our column we are gonna have
to do these aggregations twice and still
do a case statement based on those nulls
the third option is to use a derive
table to do a select distinct which will
include our null values and then do a
count star on top of that derived table
see when we say count with a column
named sequel server will give us the
unique values and not include nulls and
so if you have logic where you do want
to include nulls in your counts or you
don't want to include those nulls the
best thing to do is just test your
queries check the data and make sure
that those nulls are either being
counted or ignored based on what you
want so thanks for joining me again this
week if you're not already subscribed be
sure to hit that subscribe button so you
never miss a weekly episode and I'll see
you next time
thanks
[Music]
[Music]
you
[Music]