hello everybody welcome to this event by
pragmatic works this is going to be our
class on data modeling so welcome to
another event
in this event we're going to be taking a
look specifically at data modeling for
power bi but it's important to point out
that the data modeling that we're
talking about here has been around for
many many years and really it is
agnostic to the technology meaning that
when you're building out a data model
for reporting purposes it does work
across different technologies i was
actually spending my week this week
three days with a company where we were
building out a data model for them for
their enterprise teaching them about
this concept and then how to load those
tables using ssis so completely
different technology but that's what we
were doing this week so hopefully
everybody can hear me i'm watching the
chat just to make sure i see people from
literally all over the world joining us
today so
welcome everybody um it's exciting to
have you guys here so let's jump right
in and just do a real quick introduction
to myself
i am a consultant and trainer here at
pragmatic works i've been with pragmatic
works for 10 years
i spent the beginning of that actually
doing consulting specifically building
out data warehouses building out
dimensional models for customers and
helping them load that data
for reporting purposes into their
systems the last four or five years i've
been doing specifically training helping
companies learn how to do these things
how to implement their data model how to
work with power bi in azure and if
you've followed us for any length of
time you know that
i've had an opportunity to author a few
different books over the years
i blog at mitchellpearson.com not as
much as i do youtube of course youtube
is a little bit easier for us since we
record all the time
and then i have a wife and three kids
here in florida so florida united states
that's where most of our consultants and
trainers are located at
and i also enjoy playing tabletop games
now
you already know we have a youtube
channel pragmatic works make sure to
take a moment to subscribe to that
channel so you don't miss any events
like this in the future
and there we go if you want to reach out
to me you can reach out to me elm
pearson
pragmaticworks.com so we have a lot of
content today that we're going to get
into so i say it's time to let's jump
right in and take a look at some of the
logistics and the agenda
if you've been to any of these big
events that we do every three months we
generally have some kind of sale either
for boot camps or on-demand learning
we have classes and boot camps that are
structured of course around data
modeling because data modeling is very
important
and so my co-worker matt is monitoring
the chat today he's going to drop right
there in the chat window a link directly
to where you can get
our on-demand learning for a year for 50 percent
off where we have classes on data
modeling i actually did a class a couple
years ago just on the dimensional model
regardless of the technology and then we
also have our dax boot camp that we do
about once a month
i think the one coming up is actually
full though so you might not be able to
sign up for the one in august but in
that class we actually talk about
dimensional modeling and stuff like that
at the beginning because it's important
to everything that we do in power bi
so a little bit about the logistics here
if you want the class files feel free to
go down into the description and you'll
find the class files there there's a
quick little form and then you can
download the files there's also a course
completion certificate people love that
take the course completion certificate
fill it out and post it on linkedin
so more people can come and watch
this and be more educated about power bi
in general so definitely go ahead and do
that
um
the timing today is going to be from 11
to 2 hopefully we'll get done a little
bit before two o'clock eastern standard
time and then we'll take a break
somewhere in that 12 o'clock eastern
hour probably around 12 20 12 25
something like that a quick 15 minute
break so i can recover from all the
talking that i'm going to be doing here
and good morning bud nice to see you all
right
so the agenda the agenda is going to be
primarily really focused on at the
beginning here laying a foundation for
what are facts and dimensions a
dimensional model a star schema you've
probably heard this mentioned before
but what is it why is it important so
that's where we are going to start then
we're going to get into how you create
the data model in power bi because you
can do that in a couple different places
this week i was with the customer we
were doing that in management studio we
were creating tables and relationships
and foreign keys right that's a
different technology but the terminology
that we're talking about today applies
right so we're going to talk about how
to create that data model in power bi
we're also going to cover what if i have
multiple fact tables how does that
impact things
aggregate tables calculation groups and
then different storage modes joule free
i saw that i appreciate it thank you sir
all right so
with that being said before we jump
right in i want to recommend
a couple of different books here first
of all the data warehouse toolkit was
the first book that i ever read
over 10 years ago
that's what the company gave me and my
good buddy dustin ryan who's over at
microsoft now kind of walked through
that with me and mentored me the data
warehouse toolkit is a phenomenal
awesome book you also have the star
schema the star schema i actually
recommend and i like a little bit more i
think it's more
consumable i think it's easier to digest
it's a little bit more straightforward i
would recommend if you want to dive
deeper into this that you get
one or both of those books because
they're both great
data modeling in and of itself let me
just be very clear is something that you
literally can spend a career
doing and learning so you're never
really i would say an expert in data
modeling because it's more of an art
than it is a science you have to model
your data based on your requirements and
what you're trying to accomplish right
but the good thing is that for power bi
because power bi is a lot of times a
smaller subset of tables you don't need
to know all of those different things
and you know have years of experience to
build a pretty solid data model in fact
we do that for customers in our
hackathons all the time where we'll do a
two-day five-day hackathon where they
bring their data we teach them on their
data how to build a model and how to
have something consumable at the end so
you can do a lot with the basic star
schema that we're going to be covering
in this class today
and peter says that you should go with
the data warehouse toolkit
i agree with you peter it's a good book
all right
so let's jump right in and talk a little
bit about data modeling now as we're
diving into this one of the things that
we should do
is we should talk about things that you
need to consider as you start to build
out your model you're looking at your
data you start to build out your model
what are the things we need to be
thinking about one what are you
measuring
what are you trying to accomplish right
the customer i was
working with this week was
looking at invoices for accounts payable
that they were purchasing and so very
specifically they were trying to see
what the invoices were and what was paid
and what was left to be paid right and
so they knew very specifically what they
were trying to accomplish so you want to
know what you're measuring because
that's going to impact the tables that
we're going to be creating in your model
also what types of business problems are
you trying to solve generally you
already know the answers to this because
you can go in and look at existing
reports and ask the end users hey what's
not in here or what would you like to
have in this report that's not already
there right and so they will give you
that information and say look here's the
problem we're trying to solve so then we
can build out a model that will help us
to solve that problem
how much data are you working with this
becomes extremely important when we're
working with power bi specifically right
because
unlike maybe azure sql database or a
database power bi is going to put
everything ideally in memory so we can
get optimal performance but it also
means that generally speaking there are
some exceptions we're going to be very
limited on how much data we can get in
so how much data are you expecting to be
working with today and then how much
work or data do you expect to have in
the future and if you know that you're
going to have significantly more data in
the future then a solution that you
build today might not scale it might not
work so we got to go with a different
method and then of course what are your
data sources and where's the data coming
from because that will also impact the
data that we're bringing in and where
we're getting the data from all right so
there's many more things to consider
than what's on the slide right here and
i'm going to talk about this today is
going to be very conversational we're
going to get into a practical demo that
will take up a lot of our time but it's
going to be very conversational i'm
watching the chat window and if i see
some things that really drive home some
points i'll make sure to mention that
and bring those things out
so attributes of a good data model very
important one it should be easily
understood and consumed now one of the
things we do and i almost don't want to
mention this because we do so much of it
already i'm not necessarily trying to
get a bunch of people to sign up for
this but we do virtual mentoring
one-on-one mentoring with customers to
help them out to understand the concepts
of azure power apps power bi things like
that better right
and in those sessions as they pertain to
dax and power bi
90 percent of the time the vm calls that i'm on
one on one comes down to the data model
the data model could have been designed
in a way that was more optimal so that's
a very diplomatic answer the data model
could have been designed in a way that
was more optimal than what it currently
is and many of you probably have power
bi models
power bi data models and if you open
them up you realize that it's going to
be very difficult to explain to people
the different tables that are in that
model that means it's probably not
easily understood and easily consumed so
that's definitely going to be the first
point here attribute of a good data
model
also we want to make sure that large
data changes are scalable what that
means is that as you continue to add
more data to your data model it
continues to perform well and so we can
do things like make sure that it
continues to perform well by building it
the way we're going to talk about today
in a way that is going to perform well
over time
also it should provide predictable
performance meaning that every time you
refresh your dashboard you refresh your
reports you change your slicers
those things can be
predictable you should expect if it
takes five seconds to run one time you
run it again it should still be around
that five second time frame it shouldn't
take 57 seconds to run just because you
change the slicer on one of your
dimensions
and then of course it needs to be
flexible and adaptable this is one of
those things that is very very important
so i see a couple of questions in the
chat will the slides be available i will
add them to the
the download after the class so the
current class files do not have that i
will add them and then yes all of our
sessions are always recorded if you go
back to our youtube channel you'll find
many of these three hour events that are
there and ready for you to consume
now when i say flexible and adaptable
what i'm talking about is that we build
a model that supports what we're trying
to do today but then in the future we
can add additional tables and it still
supports that and this really comes down
to and i'm going to go back to my buddy
dustin ryan over at microsoft one of the
things he told me i'll never forget
probably the best piece of advice he
ever gave me
is when you build a data model it never
gets simpler it never gets easier it
always grows and becomes more complex so
if you build a data model from the very
beginning with five or six tables and
you're already building that data model
in a way that is complex then when you
start to add tables in the future and
additional columns and additional
requirements
it's going to eventually just get out of
hand and out of control right and so you
want to make sure from the very
beginning
we keep it as close to what you see on
the screen here this star schema i'm
going to talk about in a minute as you
possibly can okay so flexible and
adaptable very important but not at the
expense of being understood
and things like that
so
another reason that we talk about data
models is because there are certain
things that are going to be a lot easier
if you have a good data model and that's
going to be things like managing storage
constraints right so if you put
everything in one big flat table it's
not going to be good for your storage
requirements it's going to take up more
space in your data model
also if you want to performance tune
your data model having everything in a
lot of different tables and not
consolidating those tables the way we're
going to talk about today can cause a
lot of problems for performance and make
it harder to performance tune mainly
because you might have to write some
crazy dax to get things to work and many
many times you guys know i've done a
three hour session on dax i have a dax
boot camp coming up i've been doing dax
for over 10 years
actually about nine years i think it is
but
a lot of the problems and those vm calls
and helping customers out when i'm doing
dax the really complicated scenario
of writing dax like complicated dax
usually is a result of a bad data model
if you have a really good data model
writing dax becomes a lot easier we also
have row level security row level security is
pretty easy to implement even dynamic
row level security inside of power bi if
you have a solid data model now future
analytics says hey can we consolidate
two fact tables the answer is yes you
can have consolidated fact tables but i
generally recommend against that so if i
have a sales table and i have a returns
table i would generally create those as
two separate fact tables and i drill
across those fact tables mr future
analytics
i drill across those fact tables by date
by product by customer whatever those
related dimensions are so generally
speaking you would keep them as separate
fact tables because consolidating them
actually does cause certain challenges
in the future as well all right i won't
be able to answer all the questions
they're zooming by over there but if i
see a couple i can answer just so you
know i'm watching i'll do that
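To make the sales and returns answer a little more concrete, here is a minimal sketch of drilling across two separate fact tables by a shared dimension key. It is plain Python, not Power BI, and every table name, column name, and value is made up for illustration. The idea is that each fact table is totaled on its own by the shared key, and only then are the results lined up side by side.

```python
# Hypothetical fact tables that share a product_key dimension key.
fact_sales = [
    {"product_key": 1, "amount": 100.0},
    {"product_key": 2, "amount": 40.0},
    {"product_key": 1, "amount": 60.0},
]
fact_returns = [
    {"product_key": 1, "amount": 20.0},
]


def total_by(rows, dim_key, measure):
    """Sum one measure of one fact table, grouped by a dimension key."""
    totals = {}
    for row in rows:
        totals[row[dim_key]] = totals.get(row[dim_key], 0.0) + row[measure]
    return totals


# Aggregate each fact table separately, then combine on the shared key.
sales = total_by(fact_sales, "product_key", "amount")
returns = total_by(fact_returns, "product_key", "amount")
drill_across = {
    key: {"sales": sales.get(key, 0.0), "returns": returns.get(key, 0.0)}
    for key in sorted(set(sales) | set(returns))
}
print(drill_across)
```

Because the two fact tables are never merged row by row, a product with no returns still shows its full sales, which is one of the things that gets awkward in a consolidated table.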
um all right so
good data model hopefully we're on the
same page here it's important
everything is going to be easier and
better if you have a good data model
that is true now star schema you've
heard of it
it's been whispered of
you've crossed its path at some
point in the past what is it what is the
star schema the star schema is a way of
building a data model that is designed
for reporting purposes there are many
different ways to model your data there
are many different ways for different
purposes if you connect to some kind of
crm system dynamics 365
salesforce right if you connect to some
kind of
system like that there's going to be
lots and lots of tables and it looks
crazy but it's designed to support a
very specific purpose right
when we're building a data model for
reporting and analytics we're designing
the model to make it easy to report from
to improve query performance so there's
some trade-offs there
the reason it's called a star schema is
because when you surround your main
table
with your descriptive tables
it looks like a star right all of your
tables are essentially one join away
from the fact table so from a
performance perspective it's going to
work really really well it's going to be
scalable
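As a rough illustration of "one join away", here is a small sketch using Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_customer), columns, and rows are all assumptions invented for this example, and a real Power BI model would use relationships rather than SQL joins. The shape is the point: one fact table in the middle, and each descriptive table reached with a single join.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimensions.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT, country TEXT);
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    sales_amount REAL
);
INSERT INTO dim_product  VALUES (1, 'Road Bike', 'Bikes'), (2, 'Helmet', 'Accessories');
INSERT INTO dim_customer VALUES (1, 'Ann', 'US'), (2, 'Bo', 'CA');
INSERT INTO fact_sales   VALUES (1, 1, 1000.0), (2, 1, 50.0), (1, 2, 900.0);
""")

# Slicing by a product attribute takes exactly one join from the fact table.
sales_by_category = dict(con.execute("""
    SELECT p.category, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
"""))

# Slicing by a customer attribute is also exactly one join.
sales_by_country = dict(con.execute("""
    SELECT c.country, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    GROUP BY c.country
"""))

print(sales_by_category)
print(sales_by_country)
```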
there are a lot of questions here about
bridge tables what are your thoughts on
building bridge tables a lot of times
bridge tables are things like
dimension tables
a fact table is a bridge table it's a
many to many bridge table so bridge
tables are absolutely necessary
but you got to be careful with certain
bridge tables if it's not necessary so
it's one of those it depends scenarios
we won't get into that today we do cover
it in our different classes
so this is going to be an example of
star schema now
a snowflake which i see some people
mentioning here and peter is referencing
a snowflake is when you start to
normalize the data so you start to have
more and more tables so the geography
table filters through the customer table
filters down to the fact table so now
you have to go through multiple tables
to get to your fact table
generally speaking we want to try to
avoid this not that it's inherently bad
in and of itself but it again starts to
complicate the data model right so we're
trying to avoid
getting in there and adding these
additional tables if we can model it a
different way right so with geography
here i can potentially model geography
so that i build the relationship
directly to the fact so now i don't have
to go through my customer table or
through my screening table i can go
directly to my fact and it's one join
away
and so i'll talk more about that when we
get to our actual practical example
today in our learn with the nerds event
all right so that is a snowflake and
snowflakes start to look
hard to consume hard to understand
because it's a lot of sometimes
unnecessary tables that could have been
consolidated all right so that's going
to be a snowflake schema
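One way to picture the geography remodel is the sketch below. It is plain Python with made-up tables and values, and copying the geography key onto the fact rows is only one possible way to do it, not necessarily how the demo later handles it. The snowflaked version needs two hops (fact to customer to geography); after the change geography is one hop from the fact.

```python
# Hypothetical snowflake: fact -> customer -> geography (two hops).
geography = {10: {"country": "US"}, 20: {"country": "CA"}}
customer = {
    1: {"name": "Ann", "geography_key": 10},
    2: {"name": "Bo", "geography_key": 20},
}
fact_sales = [
    {"customer_key": 1, "sales_amount": 100.0},
    {"customer_key": 2, "sales_amount": 40.0},
    {"customer_key": 1, "sales_amount": 60.0},
]


def to_star(fact_rows, customer_dim):
    """Copy geography_key onto each fact row so geography relates directly to the fact."""
    return [
        dict(row, geography_key=customer_dim[row["customer_key"]]["geography_key"])
        for row in fact_rows
    ]


def sales_by_country(fact_rows, geography_dim):
    """Total sales per country using a single lookup from fact to geography."""
    totals = {}
    for row in fact_rows:
        country = geography_dim[row["geography_key"]]["country"]
        totals[country] = totals.get(country, 0.0) + row["sales_amount"]
    return totals


star_fact = to_star(fact_sales, customer)
print(sales_by_country(star_fact, geography))
```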
there's a couple of different model
types that you can work through as you
start to build your model what i do when
i'm working in a hackathon with
customers or back in the day when i was
doing consulting and we would go to
different companies and
help them build out their data models is
generally i would start with a
conceptual model very high level just a
visualization of what we're thinking so
we could see all the different tables
that were going to be involved in that
model that we built right
then you get down into more specific
details like the logical model so i have
my sales table what are going to
actually be the columns that are in that
sales table
transaction number product key you know
sales date sales amount what are going
to be the actual columns in my product
table so that's more of a logical model
that makes a lot of sense
then we have our physical model the
physical model specifically like in a
database is where you start putting
things in there like the data types if
it's nullable foreign key and primary
key constraints and things like that
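For a feel of what the physical model adds on top of the logical column list, here is a hedged sketch using Python's built-in sqlite3 module. The table and column names follow the sales example above but are otherwise assumptions, and SQLite stands in for whatever database is actually in use. It shows data types, nullability, and primary and foreign key constraints, and that the database then rejects rows that break them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL
);
CREATE TABLE fact_sales (
    transaction_number INTEGER NOT NULL,
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    sales_date   TEXT NOT NULL,
    sales_amount REAL NOT NULL
);
INSERT INTO dim_product VALUES (1, 'Road Bike');
INSERT INTO fact_sales VALUES (1001, 1, '2022-07-01', 1000.0);
""")


def try_insert(row):
    """Return True if the fact row is accepted, False if a constraint rejects it."""
    try:
        con.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert((1002, 1, "2022-07-02", 50.0)))   # valid row
print(try_insert((1003, 99, "2022-07-02", 50.0)))  # product_key 99 is not in dim_product
print(try_insert((1004, 1, "2022-07-02", None)))   # sales_amount is not nullable
```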
today we're going to be focused
primarily on the conceptual and the
logical model and then the physical
model is built in power bi but it's not
like it is in a
database kimberly says does dim equal
dimension yes very astute observation
kimberly you have won yourself 50
mitchell bucks for the presentation
today good job dim does equal dimension
and fact equals fact for fact tables all
right
so the conceptual model is going to look
something like this a lot of times i
will use a conceptual model and i'll
build a conceptual model in excel right
this week monday i was doing a hackathon
with a customer i opened up excel i said
let's do it and we started modeling out
what the tables were that we thought
were going to be necessary for the
reports that they wanted right and then
we went to their system that of course
has thousands of tables across multiple
erps and we started pulling out the
tables that we needed to really
identify the columns that would be part
of those tables now in power bi
usually building the data model is way
less complex than that right it's an
excel file or it's some view
somebody's already created for you you
just need to pull it in and then
turn it into a data model
so
on the next slide here let's dive a
little bit more into dimensional model
concepts so again we're going to talk
through some of these points before we
dive a little bit more into it
so the dimensional model itself is again
it is a very specific type of data
modeling that is designed for reporting
purposes making it easy to retrieve data
and making it easy to understand the
data that we're trying to access right
this has been around for a very long
time going back to the data warehouse
toolkit that we talked about with ralph
kimball and then of course the star
schema pragmatic works this is all we did
for our first you know
we've been around for 15 years as a
company for the first 10 years we just
did consulting and going to companies
helping them build data models and
helping to report on their data a fact
table is an event that may or may not
include measures now what i mean by that
is there are different types of fact
tables
generally speaking when we think of a
fact table we think of something very
standard like a sale a transaction a
line item on a transaction right so
mitchell went to this store on this date
bought this product for this amount and
this quantity that is a line item that
we enter into our fact table for the
purpose of recording that fact or that
event
we also have
factorless fact tables fact tables that
don't have maybe measurable items like
sales amount a student taking a class an
interaction with a customer we do a lot
of work with healthcare companies and
education and a lot of times their fact
tables don't actually have numerical
data in it they're doing a lot of counts
and distinct counts and things like that
but it's tracking the event that
occurred when they had an interaction
with a member whether it was a phone
call or an in-office visit or whatever
it might be so there's different types
of fact tables um that are just relevant
to different industries and different
businesses a dimension table also known
as a descriptive table is a table that
describes your data so if i come to you
and say hey we have five million dollars
in sales you're going to say that is
awesome and that is great however
by what year by what product by what
geography by what customer by what store
and as you start saying by what that
starts to identify
our dimensions that we need to put into
our model because in those dimension
tables we're going to have a group of
related columns that will describe our
data for us and give us more information
right so those are going to be dimension
tables an attribute is just terminology
that references a column
depending on what system you're in
but an attribute is a column in your
dimension table that describes the data
diving in a little bit more to fact
tables a fact table can contain measures
sometimes it can be items that are
already pre-aggregated if you build like
an aggregated fact table but they're
generally items we're going to
aggregate in some way whether it's a
count an average a sum a max whatever it
might be
and it's
defining a business process right we're
recording a business process that we
want to measure
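The two kinds of fact table described so far can be sketched side by side. This is plain Python with invented rows, purely for illustration. The first table has numeric measures that get summed or averaged; the second is a factless fact table where the "measure" is just a count or distinct count of the events themselves.

```python
# Hypothetical fact table with measures: one row per line item sold.
fact_sales = [
    {"product": "Helmet", "sales_amount": 50.0, "quantity": 1},
    {"product": "Road Bike", "sales_amount": 1000.0, "quantity": 1},
    {"product": "Helmet", "sales_amount": 100.0, "quantity": 2},
]

# Hypothetical factless fact table: one row per member interaction, no numeric columns.
fact_member_interaction = [
    {"member": "M1", "channel": "phone"},
    {"member": "M1", "channel": "office"},
    {"member": "M2", "channel": "phone"},
]

# Measures are aggregated: sum, average, max, and so on.
total_sales = sum(row["sales_amount"] for row in fact_sales)
max_sale = max(row["sales_amount"] for row in fact_sales)

# A factless fact table is still useful: count the events, or count distinct members.
interaction_count = len(fact_member_interaction)
distinct_members = len({row["member"] for row in fact_member_interaction})

print(total_sales, max_sale, interaction_count, distinct_members)
```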
examples might be things like
claim amount for an insurance company
screenings which might have no
measurable items but it's more of event
based total claims cost sales any of
those kind of things are examples of
fact tables and just for the sake of
everybody else here if you have an
example of a fact table put it in the
chat window we would love to see what
your example of a fact table might be so
everybody can get ideas of what fact
tables would be measures are generally
going to be sliceable right by year by
country by state by whatever and then
examples might be by month by
member by customer by geography whatever
it might be so that's going to be your
fact
fact table as we've talked about before
the fact is an event that may or may not
include measures
one of the things that's very
interesting about fact tables in general
is that when we define our fact table
historically speaking and you'll find
this in the ralph kimball book you'll
find this in the star schema we
generally recommend that you store the
data at the lowest level of information
or what is known as the granularity
is very atomic very detailed the reason
for that is because you can always roll
up the data you can always roll it up to
the month level you can always roll it
up to the category you can always roll
it up to the store but if you store it
at the store level you can't roll down
you lose descriptive capability you lose
what we call dimensionality and i've
been to customers i've been consulting
for a long time i've been to customers
who went through the process of building
an enterprise data warehouse and they
rolled it up to the day level because
they didn't think they would ever need
the individual transaction
and they realized after they released it
to the end users that the end user
started asking questions about the data
that their model couldn't
answer and in those enterprise data
models you might spend six months or a
year building out a very large data
model and i've been to customers where
we've had to rebuild their entire data
model because they did not plan that out
ahead of time right
and so
you want to store it at the very most
granular level of detail however this is
a little bit of a challenge in power bi
sometimes because mitchell what do i do
if i have hundreds of millions of rows
billions of rows of data and i know i
can't import all that into power bi now
i have to start looking at different
options for optimizing performance but
getting all the data that i need
can we do that yes yes we can there are
some awesome options that are available
to us to handle those types of scenarios
but that's why it's important to
understand at the very beginning how
much data do you have what are your
business requirements what are you
trying to accomplish we talked about
that right all right so that is
granularity
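The roll up versus roll down point can be shown with a few rows. This is a plain Python sketch with invented transactions, not anything from the class files. Starting from transaction-level rows you can total by day, by month, or by product; once only the daily totals are kept, the product column is gone and there is nothing left to slice by.

```python
from collections import defaultdict

# Hypothetical fact rows stored at the most granular level: one row per transaction line.
transactions = [
    {"date": "2022-07-01", "product": "Helmet", "amount": 50.0},
    {"date": "2022-07-01", "product": "Road Bike", "amount": 1000.0},
    {"date": "2022-07-02", "product": "Helmet", "amount": 50.0},
]


def roll_up(rows, key):
    """Sum the amount column, grouped by whatever the key function returns."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row["amount"]
    return dict(totals)


# From the atomic rows, any roll up is possible.
by_day = roll_up(transactions, lambda r: r["date"])
by_month = roll_up(transactions, lambda r: r["date"][:7])
by_product = roll_up(transactions, lambda r: r["product"])

# If only day-level totals had been stored, the product detail cannot be recovered.
daily_rows = [{"date": day, "amount": amount} for day, amount in by_day.items()]
can_slice_by_product = all("product" in row for row in daily_rows)

print(by_day, by_month, by_product, can_slice_by_product)
```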
a dimension table is descriptive as we
talked about before it's descriptive and
it contains those descriptive attributes
that define how the fact should roll up
so i'm going to roll it up to the
product roll it up to the category roll
it up to the country roll it up to the
year whatever those descriptive
attributes are
examples of this might be by month by
customer by geography and i threw out a
bunch of other ones as we've gone
through this as
well laura said been there done that
hopefully you're not talking about
having to rebuild an entire data model
laura uh that's rough that's a rough
rough time
relationships are very important in
power bi if you're not familiar with
relationships go take a look at my three
hour dax presentation because dax is all
about relationships right every dax
calculation that we write is about the
active relationships that are defined in
the data model and it is very very
important
generally speaking a relationship an
ideal relationship between a dimension
to a fact should be a one to many the
one side of the relationship is the
dimension the many side is the fact
table and what that means is that the
one side whatever column you build a
relationship on date key customer key
product key geography key whatever it is
in the dimension table
that column the values should be unique
you never find a duplicate in the fact
table it could show up multiple times
because every time a customer buys a
product they get recorded in the fact
table so that's the many side of the
relationship so that's a one to many
that's ideal i don't want to see in my
data model a relationship where i have
the date multiple times over here and
the date multiple times here and i build
it on a many to many relationship
directly that can cause problems should
you have a date table in your data model
i see that conversation happening over
here in the chat window the answer is
absolutely yes that is going to make a
big big difference do not use the date
directly from your sales table create a
date table that has all of those
relevant things and we'll kind of cover
that quickly and briefly later on in the
presentation all right
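The one to many rule described above is easy to check before the data ever reaches the model. The sketch below is plain Python with hypothetical helper names (duplicate_keys, orphan_keys) and invented rows. It looks for two things: key values that repeat on the dimension side, which would turn the relationship into many to many, and fact keys that have no matching dimension row.

```python
from collections import Counter


def duplicate_keys(dim_rows, key):
    """Key values that appear more than once on what should be the 'one' side."""
    counts = Counter(row[key] for row in dim_rows)
    return sorted(value for value, n in counts.items() if n > 1)


def orphan_keys(fact_rows, dim_rows, key):
    """Fact key values with no matching row in the dimension."""
    known = {row[key] for row in dim_rows}
    return sorted({row[key] for row in fact_rows} - known)


# Hypothetical tables: the fact repeats customer_key freely, the dimension must not.
dim_customer = [
    {"customer_key": 1, "name": "Ann"},
    {"customer_key": 2, "name": "Bo"},
]
fact_sales = [
    {"customer_key": 1, "amount": 10.0},
    {"customer_key": 1, "amount": 25.0},
    {"customer_key": 2, "amount": 5.0},
]

print(duplicate_keys(dim_customer, "customer_key"))           # empty means the one side is unique
print(orphan_keys(fact_sales, dim_customer, "customer_key"))  # empty means every fact row has a match
```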
so
when we're talking about the dimensional
model it's going to look very very
similar to what you see on the left hand
side here where you have your fact table
and then you have your different
dimension tables
a lot of people that are working inside
of power bi today come from an analyst
background or come from working with
excel and so what we see instead is
what's called
a highly denormalized database that
looks like this all of the columns all
of the data is in a very
simple single table and at its core that
actually can work really really well it
can perform really well but
later down the road you can run into
storage problems you can run into
problems where it's not adaptable and
flexible as you want to add more stuff
to it
so you want to build it out as a star
schema from the very beginning also it
can make writing dax very complex as
well so we want to build out a star
schema lots and lots of benefits there
including making it easy to understand
when we're talking about a dimensional
model and i'm going to keep using these
terms so you get familiar with them i've
seen a lot of other people write blogs
and youtube videos over the years and
they change the names they try to make
them easier i don't think it's hard to
pick up
when we're talking about a dimensional
model there's two types of tables that's
it there's facts and there are
dimensions now there are variations of
these there are aggregated fact tables
there are snapshot fact tables there are
accumulated fact tables there are many
different types of dimensions no doubt
but again in power bi generally speaking
you're not working with all of those
different types of fact tables and
dimensions because you're working with a
much simpler data model than at the
enterprise level many many times
so now if we go down here
to the next slide the dimensions
themselves define the who the when the
what the where the why and the how
context surrounding the business process
that we are measuring whether that's a
sale or a claim or an interaction with a
customer that's what dimensions do i
think we've already figured that out we
talked about it let's go ahead and slide
on to the next slide here
one of the things to know about
dimensions when you're building
dimensions is that
dimensions generally are going to be
very very wide meaning that they have a
lot of columns because again we're going
to take a lot of tables that have
related attributes or related columns
and we're going to join them together
into a single table to get it more into
that star schema that we're talking
about so they're generally going to be
very wide
also as we talked about a moment ago
with relationships generally our
dimension table is going to have what's
known as a unique identifier or a
surrogate key
we don't do this as much in power bi as
we do in an enterprise data warehouse
but it's a unique identifier on that
table
that doesn't necessarily relate back to
the source system so you might pull in
from your product
data wherever that is and it already has
a key in there that's already unique
well for the purpose of
data warehousing a lot of times we'll
add in our analytical warehouse data
warehouse the new tables that we're
building we'll add another key that's a
hundred percent unique
very interesting there
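As a small illustration of adding a warehouse key on top of the source system's own key, here is a plain Python sketch. The function name, the product_key column, and the sample codes are all assumptions for this example; in an enterprise warehouse this would more likely be an identity column populated during the load. Each distinct source key gets the next integer, and the mapping is kept so fact rows can be stamped with the same key.

```python
def assign_surrogate_keys(source_rows, source_key):
    """Give each distinct source key a new integer key and build the dimension rows."""
    key_map = {}
    dim_rows = []
    for row in source_rows:
        value = row[source_key]
        if value not in key_map:
            key_map[value] = len(key_map) + 1  # next unused integer
            dim_rows.append({"product_key": key_map[value], **row})
    return dim_rows, key_map


# Hypothetical source rows; the product code is the key from the source system.
source_products = [
    {"product_code": "AB123", "product_name": "Road Bike"},
    {"product_code": "CD456", "product_name": "Helmet"},
    {"product_code": "AB123", "product_name": "Road Bike"},  # repeated source row
]

dim_product, product_key_map = assign_surrogate_keys(source_products, "product_code")
print(dim_product)
print(product_key_map)
```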
there's the natural key the natural key
is that unique key that came from the
original source system the best
attributes in a dimension table should
be descriptive you shouldn't have things
in there like product code right product
code that says you know ab123 people
don't know what that is and i'm never
going to use that in a report so it
doesn't necessarily need to be inside of
my model right however
descriptive attributes are going to be
awesome
start date and end date you'll see a lot
of times because that can be relevant
for some more advanced scenarios and
then flags and i saw a lot of
conversation here on date tables date
tables always have a lot of flags on
them is working day is holiday is
weekend those kind of things and
those flag columns are very very
important on your dimension tables
because they actually help when we get
into writing dax later on when we're
trying to look at very specific
scenarios
having those flags there makes it a lot
easier to filter your reports makes it a
lot easier to write your dax makes it a
lot easier to do slicers and things like
that
peter as always i see you answering all
these questions you're doing awesome
thank you peter for your help all right
so jump into the next slide here
oh that was it demo time wow that was
quick came upon me faster than i thought
so what we're going to do is we're going
to jump right in now and we're going to
start looking at the data that we're
going to be working with today and we're
going to go through a practical example
of how do we
take this data and start to model it out
into a star schema pull it into power bi
and then start cleaning that
up and let's see if we have a couple
questions here can we have more than one
star schema in the same model that came
from fahad the answer is absolutely yes
right because you can have multiple
facts and those multiple facts would
have their kind of be their own star
schema within a model so yes good job
are flags boolean from paul yes they are
generally they are going to be a one or
a zero one equals true zero equals false
some people do yes and no and that's
okay as well if that makes a little bit
more sense for you i'm in accounting
what is a date table a date table logan
is a table that has a list of all your
dates from the very first day that's in
your data to the very last day and then
you can have a lot of other things in
there like the calendar year the fiscal
year the fiscal month things like that
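To make the date table idea concrete, here is a minimal Python sketch (outside of Power BI) that builds one row per day between a first and a last date and adds a few descriptive columns plus a weekend flag stored as one or zero. The function name, the column names, and the 2011 range are assumptions for illustration, not the table used in this class.

```python
from datetime import date, timedelta


def build_date_table(first_day, last_day):
    """Return one row (a dict) per calendar day, first_day through last_day inclusive."""
    rows = []
    current = first_day
    while current <= last_day:
        rows.append({
            "date": current,
            "calendar_year": current.year,
            "month_number": current.month,
            # weekday() is 0 for monday through 6 for sunday
            "is_weekend": 1 if current.weekday() >= 5 else 0,
        })
        current += timedelta(days=1)
    return rows


# hypothetical range: every day of 2011
date_table = build_date_table(date(2011, 1, 1), date(2011, 12, 31))
print(len(date_table))   # 365 rows, one per day
print(date_table[0])     # january 1 2011 was a saturday, so is_weekend is 1
```

A flag column like is_weekend is what later makes a filter or a slicer a one-click choice instead of a calculation.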
all right
tons of questions here we'll dive into a
bunch more later on all right so let me
go ahead and hit that escape button
right there and then in your class files
i'm going to go ahead and open up the
student files and bring that over and
what i want to talk about as we do this
is
as i always tell you when i'm teaching
and i'm doing these three hour events i
try to pack these things full with a ton
of information for you so i go a little
bit faster through the demo so if you're
trying to follow along it's probably not
going to be the best experience in the
world and that's why we record it and
make it available to you and encourage
you to go sign up for our boot camps and
on-demand learning so i am going to go
through it a little bit faster i would
encourage you not to follow along if i
look in the chat window and you're like
too fast i already told you i've warned
you i gave you a heads up so i'm going
to go a little bit faster but let's look
at the data that we're going to be
working with today so i have in the
files that you downloaded this excel
file right here and this excel file is a
very flat table that has all of our data
in it so we'll give that a moment to
load and then it's loading over here
it's thinking about it this is actually
got about 700 000 rows of data so it's
quite a bit
not a ton but it's quite a bit of data
and i'm going to bring it over now this
is probably the most common scenario
that i see
the users that we work with usually have
these really big excel files that have
hundreds of columns
and that's what they try to bring into
power bi and it doesn't really work
causes all kind of problems because
power bi is much different than excel so
if i start with a model that looks like
this i want some help in the chat window
and it is about 15 or 20 seconds behind
but start telling me what are dimensions
that you see
inside of this table right here so what
are dimensions that you guys see in this
table i'm going to open up another excel
file
so that we can build out a logical model
here
so we'll give it just a second i'd love
to get some interaction from the group
here
all right so first while we're waiting
for the delay and the lag in the
video to catch up the first thing that i
notice is that
each record in this table currently
represents a transaction that occurred
right so we know that the granularity of
our fact table just by looking at this
data is that one row in our fact table
is going to represent
a transaction that occurred
by
what and when we get oh there we go
product customer
segment awesome date geography man you
guys are crushing it great job yes all
of those are
dimensions 100 percent correct right so that's
exactly right and so let's kind of build
out i'll show you how i build out a
conceptual model here
inside of excel i generally go over to
insert
you guys are still crushing it there
customer
segment manufacturer i didn't see that
one earlier good job
so what i'm going to do is go into
insert
and under insert i'm going to tell it
that i want to do
smartart let me go ahead and make it a
little bit bigger right here
and under smartart i'm just going to
bring in something that i can work with
like this all right
now inside of this we can start
identifying our conceptual model
remember this doesn't have columns or
anything like that but the first thing i
have here is going to be my fact sales
right we can see when i first look at
this excel file that
each record pretty much represents a
sale or a transaction that occurred and
so we get that
the next thing is we start identifying
our dimensions right and so dimensions
you guys have already started saying
them like date date is a very important
dimension that pretty much every data
model should have so we're going to have
our dim date let me see if i can make
this a little bit bigger here
another dimension that we identified is
going to be our customer you guys
brought that out right dim customer
geography is an interesting one let's
talk about that in just a moment
um we're also going to have product
because we're selling a product so we
want to be able to break it down by
product product product category so
we'll have dim product here
and then another one that we're going to
have
is
y'all mentioned campaign i'm actually
going to skip campaign for the sake of
the presentation but that is a valid
dimension for sure normally i would add
that in i'm going to skip manufacturer
also we have category and segment that
goes with product
we have geography geography is an
interesting one we'll come back we also
have customer i forgot about customer
here so we have customer let's add that
in
so this is going to handle oh i did do
customer already so let's see we got dim
customer product date
i think that is the majority of it from
the beginning we also have
geography i'm going to skip manufacturer
i'm going to skip campaign that's going
to be the majority of it now geography
can go a couple of different ways
and you can argue this point a lot
it could be that we take the geography
and we just put it in the customer table
so we just put the city the state the
region the district for each customer we
track that in the customer table so it
just becomes more of the customer table
and we filter from there
that has some pros and some cons i
won't go through all of them
pros is that
it's good it simplifies the model i
can filter from there the cons is it
might not be flexible versatile for the
future because in the future i might
have some other fact table that i bring
in that doesn't track customer like
inventory that's a good example we bring
in a fact table that tracks our
inventory levels and that table
connecting to that i might connect on
geography but it doesn't relate to
customer at all so now how do i build
the relationship from dim customer to
that table when that table doesn't have
customer so it might make more sense to
actually have here
my dim geography as a separate table
that i relate directly to fact sales
another way to do it another way is you
can actually relate dim geography
through dim customer to fact sales and
actually normalize this or snowflake
this out a little bit
again
none of those are inherently bad but i'm
just kind of talking through different
ways to do it
all right
so
that's going to be kind of our
conceptual model now that we've gotten
that let's pull it into power bi and
let's walk through and let's talk
through how do we actually
bring this all together
we got a question here can we merge city
and state
so i would keep them separate but you
could also merge them as another column
just so that when you put it into like a
visualization a map visualization in
power bi it has less chance of actually
mapping it incorrectly so less
ambiguity so many many times i do merge
them together eric let's see what you
got here date customer manufacturer
product dim category dim geography yes
category we can move into the product
table but we're going to do it actually
as a separate dimension good job we'll
talk about that here in just a little
bit
oh what if i had more than four for this
sue you can just come over here and you
can add them here
all right
so let's go ahead and move this back
over here i'm going to open up power bi
and we're going to actually jump into a
practical
demo which is what everybody loves
anyway because that's how we learn
so i'm going to pull power bi open
and pull that over to our screen let me
close that excel workbook right there
excel's a funny one sometimes when
you're working with csv or excel files
if they're open you can't really connect
or do anything and so what i want to do
very briefly here is connect to that csv
file that's inside of our class files
and i'm going to go ahead and open that
up
all right
and then of course we need to clean this
data so this is the practical way that
you would do this you say mitchell i'm
not an i.t professional i'm not i
haven't worked in business intelligence
for years i don't work with
you know i don't work with things like
sql server databases i don't know how i
would break those tables up the way that
you're saying that's okay that's great
i'm going to show you exactly how you
can do this right here in power bi and
make this work so the first thing that
we're going to do is connect to your
data that big flat excel sheet that you
have
and then go ahead and click on transform
data
when you click on transform data you
know i know everybody knows that is
going to open up
power query editor and the power query
editor is specifically designed for
cleaning transforming and curating
your data right
michael said would you add category as a
separate dimension on the fact table or
an extended dimension from the product
dimension i would actually michael
generally speaking add the category into
the product dimension so i'd have the
category the subcategory the description
all of that in the product dimension as
one single dimension generally speaking
there are other ways to do it but that's
how i would normally do it in this class
in this session i'm actually going to
split it out and i'll show you why when
we get there later on so stay tuned for
that
the dimensional model is a denormalized
data model yes it is it's not fully
denormalized like a flat table but it's
it's not normalized like third normal
form like you know erp systems
crm systems like that that are designed
for different purposes so yes it is a
denormalized data model all right so
i'm going to avoid questions for just a
moment so we can dive in now when you
look at this table this actually is very
convenient for us because it already has
a bunch of ids and keys but i'll talk
about that here in just a moment the
first thing that i want to do is over
here on the left we have our sales table
i'm actually going to go ahead and give
that a name we'll call that fact sales
and i'm going to right click and i'm
going to duplicate that table a couple
of times and i'll explain
exactly what we're doing and why we're
doing it here in just a moment you can
also use something known as reference
reference is great but you can't use
reference if you're going to join back
to the table which many times when
you're doing the process that i'm doing
right now you actually would
join back because it doesn't have a key
like a lot of times your flat table
doesn't have a customer id it just has
customer information and you have to
create a customer id a lot of times it
doesn't have a product id and you have
to create it which i'll show you how to
do that here in just a moment so i'm
going to go ahead and
let's duplicate this one more time and
this will get us a couple of our
different tables that we want today
now the next thing i'm going to do is go
ahead and rename this one so let's build
the product table first there was a lot
of conversation there about products so
kelly my table is about 700 000 rows i
think so not too big
sizeable but not too big
i'm going to right click right here
let's go ahead and hit f2 and we're
going to call this one our dim product
table
now once you're done with your model
generally speaking i don't actually
leave the prefix of dim and fact on my
table names i would remove those for the
end user but for the sake of the class
i'm going to leave those in there so
that everybody understands where our
dimensions are and where our fact tables
are all right so that's going to be the
dim product and then what i want to do
is get rid of all the columns in here
that are not related to the product
right because we're creating our product
dimension so up here at the top i'll
click choose columns this is basic power
bi stuff very powerful very capable
and i'm going to go down the list and
get rid of everything that's not product
related so we're going to keep product
id
we're going to keep our product name
category and segment are great unit cost
and unit price
might be relevant per product
and then our i think that ends up being
it because that goes with the city yes
so we're going to click ok
and that gets us this right here
much smaller table that only has the
information that's related to product
now the next step is we have a bunch of
duplicates because we pulled this from
the fact table so
each time a product was sold it was
recorded we don't want duplicates in our
dimension table as we talked about
before so i want to remove duplicates so
i'm going to select all of my columns
using the shift key
right click at the top
and tell it that i want to remove
duplicate rows and that's actually going
to take it from being well it doesn't
tell you in the power query editor how
many rows it is but it's going to reduce
it a lot so this just reduced
a lot of the rows of data that we have
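The choose columns and remove duplicates steps can be sketched in plain Python as follows. This is only an illustration of the pattern, not what Power Query runs internally, and the sample rows, the column names, and the helper build_dimension are all hypothetical.

```python
# hypothetical flat sales rows: one row per transaction, product details repeated
flat_sales = [
    {"product_id": 1, "product_name": "widget", "category": "tools", "customer_id": 10, "units": 2},
    {"product_id": 2, "product_name": "gadget", "category": "toys", "customer_id": 11, "units": 1},
    {"product_id": 1, "product_name": "widget", "category": "tools", "customer_id": 12, "units": 5},
]


def build_dimension(rows, columns):
    """Keep only the given columns, then drop duplicate rows (first occurrence wins)."""
    seen = set()
    dimension = []
    for row in rows:
        key = tuple(row[c] for c in columns)   # the "choose columns" step
        if key not in seen:                    # the "remove duplicates" step
            seen.add(key)
            dimension.append(dict(zip(columns, key)))
    return dimension


dim_product = build_dimension(flat_sales, ["product_id", "product_name", "category"])
print(dim_product)   # two rows: one for widget, one for gadget
```

Duplicates are judged across all of the kept columns together, which matches selecting every column before removing duplicates.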
again you'll notice that we already have
product id which will automatically join
back to the fact sales table but
sometimes you don't sometimes you don't
actually have product id so you have to
add your own the way that i would add my
own key if it didn't already exist
is i would do it like this i would go up
to add column and i would tell it i want
to add an index column
and i would click the drop down and i
would say from one and so that will
actually create a new column over here
that starts with a value of one and
automatically increments all the way
down giving me a unique value now
assuming that i did not have a product
key okay i'm giving you some extra stuff
here what i would now do is i'd have to
merge back to my fact table so i'd come
back over to my fact table and i would
join the two tables together and i'd
bring the key from the product into this
so i would join them on product name
product name is probably sufficient
enough i join on product name and add
the product key back into this table
very important step if you don't have
that product id we don't need that but
we will later so later i will have a
practical example that we'll do so this
is a precursor kind of building the
foundation of what we'll talk about
later in this class
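For the case where the flat table has no product id, the index column and merge back steps might look like this in plain Python. It is a sketch under the assumption that product name alone identifies a product; the sample rows and the name product_key are hypothetical.

```python
# hypothetical flat rows with no product id, only a product name
flat_sales = [
    {"product_name": "widget", "units": 2},
    {"product_name": "gadget", "units": 1},
    {"product_name": "widget", "units": 5},
]

# distinct product names in first-seen order (the de-duplicated dimension)
distinct_names = list(dict.fromkeys(row["product_name"] for row in flat_sales))

# index column "from one": each distinct product gets a generated key 1, 2, 3, ...
dim_product = [
    {"product_key": key, "product_name": name}
    for key, name in enumerate(distinct_names, start=1)
]

# merge back: join on product name and carry the generated key into the fact rows
key_by_name = {row["product_name"]: row["product_key"] for row in dim_product}
fact_sales = [
    {"product_key": key_by_name[row["product_name"]], "units": row["units"]}
    for row in flat_sales
]
print(fact_sales)
```

After the merge the fact rows carry only the key, and the descriptive product name lives in the dimension.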
i'm going to come over here and remove
that index now
this table is done there's really not a
whole lot more to do later on we're
going to break this out some of you
mentioned a category dimension we're
going to break it out
and kind of snowflake this a little bit
later on
this one right here i'm going to go
ahead and name this this is going to be
my dim
customer table all right so we're going
to build out a dim customer table
just like we did with the product the
process is going to be
remove all the columns
that are not related to
customer we want to go ahead and remove
all of those right
so what i'll do is come back up to the
home ribbon here at the top
and tell it that we want to choose
columns all right and i'll go ahead and
choose columns and then we're going to
get rid of all the columns that are not
related to my customer so we want
customer id
we want our
what else do we got we don't actually
have a whole lot here we have our email
name we have city state region district
and country now this is actually a
critical point where we have to make a
distinction here and i'll talk about it
again there's a lot of conversation in
the ralph kimball book and the star
schema book even though i haven't
actually read them in a few years
the geography could be
kind of modeled out in a few different
ways and so you've got to plan ahead and
know how might i build out my model in
the future
because when i'm building this out so
jeremy's asking what's the benefit of
breaking out these so why build a star
schema i would say go back and watch the
recording if you joined a little bit late
because we talk about the importance of
building out the star schema flexibility
versatility adaptability also it's going
to make things like security performance
writing dax easier by modeling it this
way and it's going to take up less
storage
do duplicate tables refresh when
the master gets updated when you refresh
your data model everything that's in the
data model will refresh unless we turn
it off and we're not going to turn it
off so yes it will so we're going to get
rid of i'm going to keep geography now
here's the thing
you can create geography as its own
separate table which gives you more
adaptability and flexibility in the
future if you add more fact tables right
so you can directly filter to those fact
tables on the geography you could add it
into the customer table so that you can
just quickly filter down sales by
customers that live in certain areas
nothing wrong with that you can also
break it out
into another table that filters through
the customer table to the fact table so
you don't have to store that geography
key
in the fact table and all three of those
are relevant ways to build out the
geography table in fact i'm actually
going to build a geography table and i'm
going to snowflake it just so you can
see what that snowflake looks like
so i am not going to include city state
region district or country just the zip
code to relate back
so all i'm going to keep is customer id
zip code and email name and click ok
again we're going to have a lot of
duplicates in here because we're getting
our list of customers from our fact
table and every time the customer was
part of a transaction we recorded that
information
so now i can go across and grab all of
my columns
right click and then remove duplicates
again
again we already have our customer id
right customer id is already here so we
don't have to worry about creating one
let's make sure that step worked and it
did
so now the other thing i want to do is
let's break this out and get our email
our last name and our first name so some
real quick data cleansing steps in the
power query editor that's what i'm going
to do now
all right i see people talking about
type 2 historical dimensions getting
crazy over there matthew i didn't see
the context behind that conversation
though all right so i'm going to right
click here and i'm going to first of all
split this column
by delimiter
and i'm going to split it by the
colon here so we'll say custom and it's
going to be colon and a space
just like that and i'll click ok
and that gives me two separate columns i
also want to go ahead and split out my
name into a last name and a first name
column so i'll split that out real quick
right if you've ever worked in power bi
you know how easy these actions are i'll
right click split column by delimiter
again
and i'm going to go ahead and tell it my
big head might be in the way apologies
for that i'm going to tell it that we're
going to do custom and then we're going
to do comma
space
and then we'll click ok one more time so
now we have our last name and we have
our first name let me go ahead and
rename those columns
i'll put spaces in here as a best
practice and then we'll do the same
thing for the first name so this is i'm
trying to keep this very simple so we're
not doing a lot of data cleansing
but this is definitely part of the steps
all right so last name and first name
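The two split by delimiter steps can be sketched in Python like this. The exact layout of the combined column is not shown in the transcript, so the sample value below (an email in parentheses, then a colon and a space, then last name, comma, space, first name) is an assumption, as is the function name.

```python
def split_email_name(value):
    """Split a combined value into email, last name and first name."""
    # first split: custom delimiter of a colon followed by a space
    email, name = value.split(": ", 1)
    # second split: custom delimiter of a comma followed by a space
    last_name, first_name = name.split(", ", 1)
    # the parentheses around the email are left untouched in this sketch
    return {"email": email, "last name": last_name, "first name": first_name}


# hypothetical value in the assumed layout
result = split_email_name("(jane.doe@example.com): doe, jane")
print(result)
```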
i also want to clean up the email to get
rid of any of these open and closing
parentheses so i will go ahead and
david hunt says third normalization form
is the best for better modeling i
disagree david i disagree not for
analytical and reporting purposes it's
way too normalized way too many joins
way too hard to understand
i'm going to right click on this and i'm
going to replace values i'll get rid of
we'll do open parentheses replace with
nothing i'll do it again right click
replace values
if you have never done this stuff in
power query you're getting a
real quick kind of breakdown of how to
do these things and so we'll call it
email so now we have this table that has
our zip code our email our last name and
our first name i noticed that zip code
was turned into an integer that's not
good so we got to fix that
that was probably on this step right
here
yep right there
so i am going to go back over to my zip
code
see if we can find it real quick and
we're going to turn that back into a
text value
make sure if you're following along with
this recording later that you're doing
it you're clicking on the change type
and then coming over to the zip code
here changing that to a text we'll do an
insert and we're going to replace the
current so we're going to replace it
we're going to get rid of the fact that
it was turned into an integer and turn
it back into a text value
a lot of times
zip codes will have leading zeros
on them so if you turn them into a
number you lose that and globally
they're not always numerical all right
so definitely be careful with that
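A small Python illustration of why the zip code should stay text: converting to a whole number silently drops leading zeros, and converting back to text does not restore them. The sample zip codes are made up for the example.

```python
# hypothetical zip codes as they might appear in the source file
raw_zip_codes = ["02134", "32003", "00501"]

# what happens if the column is typed as a whole number
as_numbers = [int(z) for z in raw_zip_codes]
print(as_numbers)        # [2134, 32003, 501]

# converting back to text afterwards does not bring the zeros back
round_trip = [str(n) for n in as_numbers]
print(round_trip)        # ['2134', '32003', '501']

# keeping the column as text from the start preserves every character
as_text = list(raw_zip_codes)
print(as_text)           # ['02134', '32003', '00501']
```

This is why the type change is replaced at the original change type step rather than stacked on top of the integer conversion.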
all right so
we can go back down to the very bottom
step here and we now have our customer
table
now if we're going to build a geography
table again there's a couple different
ways to do it i have a way that i'm
going to do it and i'm going to talk
about that here in a little bit so
i'm going to now build out our dim
geography table real quick we'll right
click rename this right here
and then i'm going to call this one dim
geography
all right so this is another duplication
of that fact table
and then i'm going to go ahead and get
rid of all the columns again that are
not related to our geography
so i'll go up to the very top we'll do
choose columns and then we'll get rid of
all of the columns that are not zip code
city state region district and country
now the benefit of this i think david
said you know third normal form
is good so the benefit of third normal
form is that third normal form reduces a
lot of redundancy as far as storing data
the bad thing about third normal form is
that as you
build your model out in a way to reduce
redundancy so where you're storing so
that's kind of what i'm doing here by
moving these out of the customer table
you start to create a lot of extra
tables generally and now i have to join
from my geography through my customer
through this through that to get there
and that can cause performance problems
that can cause difficulty writing dax
that can cause complexities in your
model and it can also make it very
difficult to make that model adaptable
as you add additional fact tables in the
future so there can be a lot of debate
there but
third normal form for dimensional
modeling for analytical purposes i would
not i would not consider the best
practice all right so we'll click ok
there
and that brings us back to here
again i need to take my zip code and
make sure i convert that back to a
string this is something that power bi
does so you got to be aware of that and
i'm going to go ahead and take that and
turn it back into a text value insert
and then replace
to take that and convert it back
so now we have our geography table if i
want to relate this to the fact table
right if i want to relate this to the
fact table
the fact table has a zip code in it so i
could hypothetically use the zip code
ideally i would prefer to create a key
on this table and then relate it through
the key but that takes a little bit
longer than we're going to have in our
presentation today so i'm not going to
do that but what i will do is i want to
get rid of all of the duplicate rows
that are in here because again we're
going to have a lot of duplicates so i'm
going to go ahead and remove duplicates
if you're noticing a very distinct
pattern with what we're doing here
that's because it is this is a very
common pattern for taking data and
actually building out your dimensional
model
inside of power bi and again most of the
models that i see are these kind of big
flat tables that are coming from excel
or coming from views that some erp
system has provided so the end user is
used to looking at those views so then
we take those views and we break them
out into this dimensional model that
gives us a lot more of that flexibility
and i see desktop data crunching in here
that's devin knight mr devin knight from
pragmatic works joining us so thank you
for that uh comment there all right so
now we have our dim geography table that
we have created
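Here is a rough Python sketch of what the snowflaked layout means when filtering: a geography filter has to travel through the customer table before it reaches the fact table. The three tiny tables and the function sales_for_state are hypothetical and only mirror the shape of the model, with zip code as the link between customer and geography.

```python
# hypothetical tables shaped like the snowflaked model
dim_geography = [
    {"zip_code": "32003", "state": "fl"},
    {"zip_code": "02134", "state": "ma"},
]
dim_customer = [
    {"customer_id": 1, "zip_code": "32003"},
    {"customer_id": 2, "zip_code": "02134"},
    {"customer_id": 3, "zip_code": "32003"},
]
fact_sales = [
    {"customer_id": 1, "amount": 100},
    {"customer_id": 2, "amount": 50},
    {"customer_id": 3, "amount": 25},
    {"customer_id": 1, "amount": 10},
]


def sales_for_state(state):
    """Filter geography, pass the filter through customer, then total the fact rows."""
    zips = {g["zip_code"] for g in dim_geography if g["state"] == state}
    customers = {c["customer_id"] for c in dim_customer if c["zip_code"] in zips}
    return sum(f["amount"] for f in fact_sales if f["customer_id"] in customers)


print(sales_for_state("fl"))   # 135
print(sales_for_state("ma"))   # 50
```

The extra hop through customer is the cost of not storing a geography key on the fact table, and a fact table with no customer column could not be reached this way at all.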
the other thing we need to create very
important here and i gave it to
mr matt peterson to share with the group
is we want to go ahead and create a date
table there are a lot of awesome date
tables created by the community that
have hundreds of columns tons and tons
of columns i'm going to actually go over
here and i'm going to pull up
desktop data crunching over here mr
devin knight i'm going to go to his
website and i'm going to go down to
power query and i'm actually going to
copy out some m code that devin has
here that i always use for my projects
it's a very simple
m query that generates a date table and
generates it with quite a few necessary
columns but it's not too much
so i'm going to copy this out
matt will drop the url the link to this
in the chat window so you guys will have
access to that if you want that later so
let me go ahead and copy this
and then i'll pull that back over
and i am going to create a new source
and i'm going to do a blank query so a
blank query is if you wanted to write
m query yourself everything we're doing
in power query editor every click every
transform is actually creating m query
in the background and generally the ui
does a really great job at that but from
time to time you might actually want to
write your own code
and it's very very powerful language so
we're going to do a blank query here
and when i get my blank query i'll go
into the advanced editor delete
everything there and then paste in the
code that i copied from devin's website
then i'll come down to the bottom and
click done
and that's going to create something
known as a function
and when i run the function it'll
actually create another object in here
that's a table so i think this data goes
all the way back to 2011 so i'm going to
do 1/1/2011 i'm going to go all the way
back and then i'll go to the end of this
year
like so
you can go into the future matt peterson
and i were looking at a really odd
scenario the other day where a customer
was getting some really weird results
and it's because their model was in the
future so when they were doing like
current year and year over year for
multiple years the numbers were the same
and it was the same because they had
built their data model to go into the
future so you can get some odd results
if you plan ahead and you go into the
future but a lot of people do future
proof their model and go 10 years ahead
so they don't have to worry about coming
in here and updating the date table
every year so that's something to think
about something to consider i'm going to
click invoke when i click invoke it
creates a date table i'll give that a
nice awesome name
that'll be our dim date and then i'll
have to go ahead and change the data
types real fast
date that'll be a whole number
that's going to be a text
another whole number very important to
go through this step
because these abc123s that you see here
are actually identified as like an any
data type meaning power query is
confused it didn't properly give them a
data type so i'm going to go through and
just modify them very very quickly here
all right
basic stuff but important stuff now
we now have our date table that we can
relate back now you might be noticing
right that there's a lot of stuff that's
actually missing from my date table and
some of those items are going to be
things like fiscal year fiscal month
month number of year is weekday is
holiday very important things that a lot
of times when i'm working with customers
we actually wind up adding into the
model
to make it easier to build our reports
and to make it easier to author more
complicated dax scenarios that they have
but this is a very simple model that
gives us what we need for analytical
purposes
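As an illustration of the kind of columns that often get added later, this Python sketch derives a fiscal year, a fiscal month, and a weekday flag from a date. The fiscal calendar here is purely an assumption (it starts on july 1 and is named after the calendar year in which it ends), and the function and column names are hypothetical; a real model would follow the company's own fiscal rules.

```python
from datetime import date

FISCAL_START_MONTH = 7   # assumption: the fiscal year begins on july 1


def extra_date_columns(day):
    """Return fiscal year, fiscal month and a weekday flag for one date."""
    # a date on or after the fiscal start belongs to the fiscal year ending next calendar year
    fiscal_year = day.year + 1 if day.month >= FISCAL_START_MONTH else day.year
    # fiscal month 1 is the fiscal start month, fiscal month 12 is the month before it
    fiscal_month = (day.month - FISCAL_START_MONTH) % 12 + 1
    return {
        "fiscal_year": fiscal_year,
        "fiscal_month": fiscal_month,
        # weekday() is 0 for monday through 6 for sunday
        "is_weekday": 1 if day.weekday() < 5 else 0,
    }


first_fiscal_day = extra_date_columns(date(2011, 7, 1))
last_fiscal_day = extra_date_columns(date(2011, 6, 30))
print(first_fiscal_day)   # fiscal year 2012, fiscal month 1
print(last_fiscal_day)    # fiscal year 2011, fiscal month 12
```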
now the next part that i want to cover
here is taking a look at what this looks
like in the data model and then talking
about how we can come back and add
additional tables to our model
and then that will get us to a very good
point where we can take our break right
here in the middle of the session today
so i'm going to go ahead and jump back
over to
up here at the very top
and i'm going to go ahead and click
close and apply which is going to close
the data and load it into my data model
generally speaking i like to actually
save the model first so let me go ahead
and save it
and i'll call this
i'm going to do save and i'm going to
say apply later the reason i do apply
later is so that if it happens to crash
i at least saved all the work that i did
now i'll hit close and apply just in
case i actually in my dax video if you
guys saw that i actually had something
happen right before break
where it broke and i had to go back and
fix it kd said you called it m language
what is it actually it used to be known
as the power query formula language but
everybody called it m language so then
they just renamed it m officially
do you have a recommendation for a good
power query m language book
i do have a book
it's probably the best one that i have
heard of or seen and it's this one right
here
i never read it it smells very very
clean because i just don't find myself
doing a lot of m query to be honest with
you there's a lot of other ways to clean
and transform the data but that's
probably the book i would recommend
there was one before that called m is
for data monkey or something
and that was pretty straightforward but
this is probably the book that book
right there all right
so the model has been loaded
whenever you load your data into the
data model
the first thing that i want to do is go
take a look at those relationships and
make sure the relationships have been
defined and built the way that we
expected them to right so i'll go over
to
the model view
and in the model view i can start to
kind of
see if this is going to look like a star
schema now you're gonna notice right out
the gate and we're gonna have to make
some modifications to this so we're
gonna be diving into some of these other
conversations and concepts once we get
back from our break
but
i already have this snowflake that's
starting to occur right here where
geography filters through dim customer
to the fact table
again we could model this differently
i'm going to go with this for the
purpose of this session i also need to
build a relationship from my date table
to my fact sales
if you're not aware of this i'm going to
show you something real quick if i come
over to my report view and i
build a very simple visualization here
that is going to be
my year
let's do year here and we'll make sure
it's not summarized
and then i bring in from my fact sales
table my total sales let's see if we can
find that real quick i think it's unit
price there we go
you're going to see that we get a
duplicated value all the way down now i
talk about this in some of my other
presentations i know i talk about this
in my three hour dax presentation
but this right here shows me the same
values all the way down which tells me
something is wrong with my active
relationship in the data model it doesn't
exist or it was built incorrectly it's not
working the way i thought it was
supposed to be working something's going
on and so this is what we're talking
about when we talk about how we want to
have those active relationships because
they will automatically do
filtering and grouping and those kind of
things for you automatically inside of
your model so i'll come back over to the
model view
and in the model view i'm going to
build that relationship from the date
table over to the fact table on the date
column sometimes you'll have a date key
if you have a date key that's like an
integer value so it's not an actual date
you want to make sure you mark your date
table as a date table
but i'm going to come back over to the
report view
and now it's working correctly right
it's automatically filtering
based on the active relationship
in the data model
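To make the "same value all the way down" symptom concrete, here is a small sketch in plain Python rather than Power BI. The rows, column names, and the `sales_by_year` helper are all made up for illustration. The only point is that without a link between the date table and the fact table there is nothing to filter on, so every year shows the grand total.

```python
# Hypothetical rows for illustration; not the class data set.
dim_date = [
    {"date": "2019-07-26", "year": 2019},
    {"date": "2020-07-26", "year": 2020},
]
fact_sales = [
    {"date": "2019-07-26", "amount": 100},
    {"date": "2020-07-26", "amount": 250},
    {"date": "2020-07-26", "amount": 50},
]

def sales_by_year(dates, facts, related):
    """Total sales per year, with or without a date relationship."""
    totals = {}
    for year in sorted({d["year"] for d in dates}):
        if related:
            # Relationship on the date column: keep only matching fact rows.
            keys = {d["date"] for d in dates if d["year"] == year}
            rows = [f for f in facts if f["date"] in keys]
        else:
            # No relationship: the year cannot filter the fact table.
            rows = facts
        totals[year] = sum(f["amount"] for f in rows)
    return totals

without_relationship = sales_by_year(dim_date, fact_sales, related=False)
with_relationship = sales_by_year(dim_date, fact_sales, related=True)
print(without_relationship)  # every year gets the same grand total
print(with_relationship)     # each year gets only its own rows
```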
so this is
looking at our model this puts us in a
really good place now we're actually
doing incredibly great on time even with
answering questions so what i want to do
is i want to dive in and talk about
multiple fact tables and what that looks
like
it's very common to have multiple fact
tables let's just think about this for a
moment right
if i come back over here and i look at
my report we have a pretty standard data
model but you might have other fact
tables that you want to add to this
things like we have we have sales but we
also have returns returns is a separate
fact table right returns is a separate
fact table where
i would build that into my model so if i
had a fact table here called returns
somebody asked about this earlier would
that still be a star schema and the
answer is essentially yes now in my fact
returns table i'd build a relationship
to my date table i'd build a
relationship back to my product table
i'd build a relationship back to my
customer table and
it would have really it'd be sharing the
same dimensions that my fact sales table
has
this is important so in the dimensional
modeling world
when we build our dimensions we want to
build those dimensions so that they
conform across the multiple source systems
but also so that they conform across
our
different fact tables so we could build
a fact returns table and i would not
consolidate this i would keep this as a
separate table what are some other
examples of tables we could build in
here well we could also build a fact
table that might be inventory levels
right that would be very common to have
some kind of fact inventory
another one that i come across a lot is
budgets forecast right we're trying to
forecast what it should have been for
this year so what
what should have been our total sales
for this month for this year for this
product
that's a very interesting one
so that right there would be fact
budgets or fact forecast whatever it
might be
right
and so that is another fact table and so
it's very common
that yes zoomit is what we're using
zoomit zoomit good job katie so
it's very common that in the future
you're going to want to add additional
facts in here because now you can find
out how much did we sell how much do we
have in inventory what was our forecast
are we on budget are we on plan are we
on goal are we on target right like all
those questions we want to answer that's
why at the very beginning we have to
know
what it is we're trying to measure and
what we might be trying to measure in
the future so we can start building our
model in a way that's going to be
flexible and versatile if that
makes sense so what i have inside of our
class files
is i have another file in here
called fact budget now
the original format of this that we
actually cover in our class that was
originally from a microsoft class that
we teach in collaboration with microsoft
was a crazy file that requires a lot of
data cleansing so
i was unsure of timing so i went ahead
and cleaned it all up but i'm going to
go ahead and bring in the fact budget
now i'm going to open it up first and i
want somebody here to tell me what do
you see
that's different the date looks weird in
here i'm not going to mess with that
you're not actually going to be able to
answer this question so i'll just tell
you
this table tracks a combination of
category and segment
at the day level and it gives you a
forecast or a budget for that category
that segment at that day level
so actually it's at the month level and
so this is this is very important
because this is a different level of
granularity a different level of detail
than our fact table so that's a lot
that's a lot let me close that down i'm
going to bring that data in real quick
let me make sure
that it was a csv file
and it was all right
so i'm going to bring in that csv file
real quick by going up to get data at
the top
and i'm going to go ahead and do
text/csv grab that file and
then do open all right
all right so let's talk about this i'm
going to bring it into the power query
editor just in case i need to clean the
data because i don't quite remember
so this fact table like i said it had a
lot of work that needed to be done i
already did it but a lot of times when
you get like aggregated fact tables or
budget tables or even inventory levels a
lot of times they are rolled up to a
higher level
this is a really really important
concept if you look at this you'll see
that urban
convenience for the forecast
is being stored at the month level so
we're forecasting it at the month level
not the day level nobody said hey we
expect that you know for this category
for this segment for mountain bikes for
bikes mountain bikes for january first
we're gonna have twelve hundred dollars
in sales right nobody nobody does it at
the daily level generally for budgets or
forecasts we do it at the monthly level
we do it at the yearly level
the problem with that is
it's at a different level of detail than
what's in your fact table now in my
advanced dax class in our online
training and in our advanced power bi
class i kind of talk about how to model
this out and how to write the dax to
solve that problem if i'm looking at
like july 15th if i'm looking at july
15th and there's 31 days
how do i take the monthly forecast and
break it down to only 15 days within the
filter context so i can see if i'm on
track or not
very important that's something that's a
little bit more advanced that we cover
in some of our other classes
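As a rough picture of the arithmetic behind that july 15th question, here is a tiny Python sketch. It assumes the monthly budget is spread evenly across the days of the month, which is only one possible allocation rule and is not necessarily what the advanced class does in DAX. The function name and the budget figure are made up.

```python
import calendar
from datetime import date

def budget_to_date(monthly_budget, as_of):
    """Share of a monthly budget for day 1 through as_of, spread evenly.

    Even daily spread is an assumption made for this sketch.
    """
    days_in_month = calendar.monthrange(as_of.year, as_of.month)[1]
    return monthly_budget * as_of.day / days_in_month

# Hypothetical 31,000 budget for july, looked at on july 15th (15 of 31 days).
prorated = budget_to_date(31000.0, date(2021, 7, 15))
print(prorated)
```

Comparing actual sales through the 15th against this prorated number is one way to see whether you are on track.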
but you'll notice that
i don't have a product here we're not
doing it at the product level we're
doing it at a higher level at a
sub-category level right
and so i can't build a relationship
directly from my product table
to
this table like there's not a
possibility really for me to do that
because it doesn't exist because this is
at a higher level of granularity so when
you roll a fact table up when you create
like an aggregated fact table so you
roll it up to the day level you roll it
up to the month level when you do this
you lose dimensionality i cannot filter
this any longer by the individual
product because it's not here we lost
that dimensionality that's why you got
to be careful especially with like your
primary business process rolling it up
because you can lose that
descriptibility and that's what causes a
lot of people to have to go back
and literally have to rebuild their data
model from scratch because you can't go
back and change all of that
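Here is a small Python sketch of what "losing dimensionality" looks like. The rows and column names are invented for illustration. Once the detail rows are rolled up to category and month, the product is simply not part of the result any more, so there is nothing left to filter by product.

```python
from collections import defaultdict

# Hypothetical detail rows: one row per product per day.
fact_sales = [
    {"date": "2021-01-01", "product": "mountain-100", "category": "bikes", "amount": 1200},
    {"date": "2021-01-15", "product": "road-250", "category": "bikes", "amount": 800},
    {"date": "2021-01-20", "product": "helmet", "category": "accessories", "amount": 40},
]

# Roll up to (category, month); product and day are dropped on purpose.
totals = defaultdict(int)
for row in fact_sales:
    month = row["date"][:7]  # first seven characters, e.g. "2021-01"
    totals[(row["category"], month)] += row["amount"]

rolled_up = dict(totals)
print(rolled_up)
```

The rolled-up keys only carry category and month, so a question like "how much of that was mountain-100" cannot be answered from this table alone.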
so
in order to build this relationship so
that i can drill across from fact sales
to fact budget
this is where we need to build out that
category dimension that many of you
caught and many of you mentioned earlier
when we were going through our initial
kind of
phase of looking at the data so
generally speaking
i would build my
category into the product table i would
generally do it that way however
now that i'm looking at this data here i
realize that if i break out the category
into its own dimension i can filter both
the fact budget table and our fact sales
from that one dimension and drill across
which is very important right
so here's what we're going to do
that's weird john that formatting
that you see there is just a weird
excel thing so i'm not messing with that
too much here's what we're going to do
i'm going to make this into a date we're
going to take our break our break is
going to be 15 minutes long and when we
get back i'm going to show you how to
build the category segment dimension and
then we're going to get into some other
concepts and things that we want to talk
about specific to data modeling with
power bi some really cool stuff that
we're going to have when we come back
from our break
all right so i'll see everybody back
here in
oh that messed up
let's see if we get rid of that change
that over to a date real quick
i'm going to fix that in a minute i'll
re-import that while we're on a
10-minute break or 15-minute break see
everybody back here in 15 minutes
all right hello and welcome back
quick but much needed break there we're
going to dive right back in and i was
looking through some questions so we're
going to talk about some of those
questions here as we dive back into it
and matt let me know that i had actually
forgot a critical step because i don't
actually have very specific notes here
in fact i have no notes i'm just kind of
walking through what i knew i wanted to
talk about today
and so i'm going to pull back over
excuse me our power query editor and
there's a couple really important
questions i want to talk about
first of all let me fix our original
fact table so if i go back over to my
original fact table once we duplicate
that and we break all of our different
dimensions into separate tables we then
need to come back and just remove all of
those extra attributes and columns that
we don't need inside of our fact table
we don't need them there anymore right
so i'm going to go through and remove a
bunch of that information so i'll do
choose columns here i think somebody
asked about that
and so i'm going to go down the list and
i'll get rid of we need product all of
our id columns we need we didn't keep
campaign so i'll get rid of it
i'm going to keep product goes segment
goes because we can get all that
information from the product table right
manufacturer goes we'll keep unit cost
and price because that'll give us our
cost and our sales
zip code is going to go email city state
region
there we go so very very small a lot of
times fact tables are not very wide but
they have a lot of rows all right so
i'll click ok
and then before i forget i have a very
special thing that i want to share with
the group here today and it is primarily
uh just thanking peter for his
contributions and all of these trainings
and so peter
you are going to be awarded with the
certificate of top technological
superiority it's awarded to you for
earning the distinct honor of accruing
the most mitchell bucks so if you want
that i'll send it to you in pdf format
you can put it on your linkedin put it
in the office in the break room i'll
send it to you just send me an email but
we thank you for your contributions
in the chat
now we're getting a lot of questions
about data modeling a lot of awesome
questions maybe we can do some kind of
follow-up event where we do a q&a that
would be awesome so let us talk about
that internally and keep your ears out
for that
what i would like to do now is kind of
dive into what i heard about
type 2 dimensions type 2 dimensions are
a little bit advanced i had to do this
all the time when i did
you know data modeling for customers and
consulting what a type 2 dimension is or
a historical dimension is it's this idea
that when something changes in the
dimension you want to track history that
is great that's one of the big benefits
of a dimensional warehouse where your
original relational
data warehouse doesn't normally store
that information right so
let's give an example in my
product table if the price of a product
changes i want to be able to go back and
historically look and say how much of
that product did i sell at this amount
versus this amount what was the quantity
what was my gross profit margin all of
that right
that is historical information so what
you do is every time and this goes back
to a very key concept it's a little bit
more advanced but i'm going to introduce
it to answer some questions this goes
back to actually adding into your table
what's known as a surrogate key a unique
identifier that is different than the
original id column that came from your
source so if you think about this in
this product table we have an original
id column right here that came from the
source and right now we've been using
that
as our relationship to the fact table
however
based on the questions we've been
getting from kd and others if you want
to track historical information you
can't use that key to join back to your
fact table instead you have to actually
create a surrogate key in your
dimensions
so what will normally happen
is i would come in here and sort my data
in some very specific way so maybe i
sort it by
um usually some kind of date so if you
have a date on here remember we talked
about like start date end date if i
could sort this table first of all by a
start date that can kind of enforce
consistency so my surrogate keys don't
change in the future somebody asked
about that
but you would have a date normally on if
you have a historical you'd have a start
date end date i would sort it by the
start date end date in this case we'll
sort it by product key so i'm going to
sort right here
i'm going to sort this in ascending
order
on my product id
all right and then once you do that we
now have to create an sk and the sk is
what we store back in our fact table not
the product id
so we'll do that the way we talked about
earlier you go back up to the top add
column add an index column starting from
one and this becomes my product sk now
let me talk about that just a little bit
more this is getting a little more
advanced than most people care about for
power bi but i'm trying to answer the
question
we'll call this my product sk
if you have a product that changes price
and you want to track historical
information on that price then that
product id will now show up in this
table twice it'll show up once
for ten dollars
and it'll show up again when the price
changed to eleven dollars
so that product id is no longer unique
so it no longer gives us that
one-to-many relationship that we talked
about earlier that we wanted back to our
fact table right
so that's one of the main reasons why we
do the surrogate key
like we talked about a moment ago
so
this is what i would store in my fact
table now because my fact table doesn't
have that what i would actually have to
do is join back to my fact table very
interesting i'm thinking through this as
i talk
i'd join back to my fact table on
whatever makes a row unique so that
would be product id
and then you'd have to i actually have a
youtube video on this next part uh maybe
we can put that in the description or
something later we'd have to do a range
look up when did the product sell and
then you'd have to look up to this table
because remember a product shows up in
this table multiple times for each list
price so then you'd have to join back to
this table and you'd have to say
where the date of the
sold product is between the start date
and the end date of the product here to
get the sk to add back to the fact table
this is all part of you know etl here
this this kind of idea of extracting
transforming loading data and getting
the dimensions and the fact tables
correct
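The range lookup described above can be sketched in Python like this. Everything here is hypothetical: the column names, the dates, and the convention of using a far-future end date for the current version of a product are assumptions made for illustration, not the class files or the method from the youtube video.

```python
from datetime import date

# Hypothetical type 2 product dimension: product_id 10 appears twice,
# once per list price, each version with its own surrogate key.
dim_product = [
    {"product_sk": 1, "product_id": 10, "price": 10.0,
     "start_date": date(2021, 1, 1), "end_date": date(2021, 6, 30)},
    {"product_sk": 2, "product_id": 10, "price": 11.0,
     "start_date": date(2021, 7, 1), "end_date": date(9999, 12, 31)},
]

def lookup_product_sk(product_id, sold_on):
    """Range lookup: match the id, then the version active on the sale date."""
    for row in dim_product:
        if (row["product_id"] == product_id
                and row["start_date"] <= sold_on <= row["end_date"]):
            return row["product_sk"]
    return None  # no version of the product covers that date

fact_sales = [
    {"product_id": 10, "order_date": date(2021, 3, 5), "quantity": 2},
    {"product_id": 10, "order_date": date(2021, 7, 26), "quantity": 1},
]

# Store the surrogate key on the fact row instead of relying on product_id.
for sale in fact_sales:
    sale["product_sk"] = lookup_product_sk(sale["product_id"], sale["order_date"])

print([sale["product_sk"] for sale in fact_sales])
```

The same product id resolves to two different surrogate keys depending on when it sold, which is what lets you compare quantity at the old price against quantity at the new price.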
i don't see this a lot this idea and
this concept of type 2 historical
dimensions i don't see that a lot in
power bi but that is effectively how you
would deal with that if you go take a
look at our on-demand learning online
and you look at my data modeling class
that is very specific to dimensional
modeling in general again i created that
class years ago i probably don't have a
goatee in there but the concepts have
been relevant for years and i would talk
about this there i'm sure all right
so
that's not for everybody that's extra
that's free no charge thanks to the
questions from kd and others so that
covered our fact table cleaning it up we
would not remove duplicates on the fact
table because everything in there should
be legit why not unit cost and so i left
unit cost in the fact table dennis i
actually left it in there it should
still be there right because that gives
me my cost yeah i still have it there so
i can quickly figure out my total cost
total
um sales and then gross profit and
things like that so i actually still
have it
all right so now the next thing i want
to do is let's talk about how to build a
relationship to fact budget i'm going to
hit close and apply so you can see the
problem
that we currently have when we bring in
the fact budget table and i appreciate
all the positive comments in the chat
window that's awesome glad you guys are
enjoying it and having a great time that
is our goal we love training at
pragmatic works brian devin myself
manuel our team we used to be
consultants and when we sold consulting
we were very happy to keep the training
company and keep doing this because we
really love training
people to do this so now i have my fact
budget i can build a relationship to the
date table
on the date and that right there is fine
right but how do i build this
relationship over to like my product
table
i can't right because i have product id
but we're not tracking our budget at the
product level we rolled it up to the
category we rolled it up to the segment
so this is where i have to do what you
guys mentioned earlier in the class
where you said let's build a category
dimension
right and so we'd have category that
filters product that filters sales so it
is also kind of going more towards
normalization it's going more towards
snowflaking than a star schema again
there are multiple ways to model this so
i'm going to show you a way to model it
doesn't mean that i would necessarily
model it this way myself it would depend
on other criteria and other requirements
that i gathered during that gathering
requirements phase when talking with the
customer looking at existing reports and
seeing what all was relevant there so
here we go we're going to go back into
the power query editor and build yet
another critical dimension for our data
model so i'm going to go back over to
our transform data
and in the transform data we are going
to i'm going to actually i could do this
a couple different ways
i think i'm going to
duplicate the work that i've already
done here so i'll right click on dim
product to duplicate that table
and from dim product we're going to
create a dim cat segment table category
segment so we'll give that a name
all right
and so this is only going to be every
row in my dim category segment table is
going to represent one combination of
category and segment right so it's going
to be unique that's going to be my
unique identifier so i'm going to grab
category and segment right click and i'm
going to remove all the other columns
because we don't need that that's step
one those columns are out of here
now once those columns are gone
we're going to now add a surrogate key
we're going to add a unique identifier
on this table so that we can come back
so we do need to remove duplicates
before i forget
there we go
if you wanted to sort them first
alphabetically or what have you
you can sort them before adding your
surrogate key but i'm going to go over
to add column
index column from 1. we did this before
kind of as a precursor to this module
earlier in this session today so now i
have my index that part is awesome and
we're going to call this our uh i would
normally call it something like cat
seg whatever the name of my dimension is
you know key or sk or whatever so i'll
call it sk
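Here is a rough Python equivalent of those power query steps: keep only category and segment, remove duplicates, then add an index starting from one as the surrogate key. The sample rows and the `cat_seg_sk` name are made up for illustration, and the sort is the optional step mentioned above, included here so the keys come out in a stable order.

```python
# Hypothetical product rows; only category and segment matter here.
dim_product = [
    {"product_id": 1, "category": "urban", "segment": "convenience"},
    {"product_id": 2, "category": "urban", "segment": "moderation"},
    {"product_id": 3, "category": "urban", "segment": "convenience"},
    {"product_id": 4, "category": "rural", "segment": "select"},
]

# Keep the two columns, remove duplicates, and sort alphabetically.
combos = sorted({(p["category"], p["segment"]) for p in dim_product})

# An index column starting from 1 becomes the surrogate key.
dim_cat_segment = [
    {"cat_seg_sk": sk, "category": category, "segment": segment}
    for sk, (category, segment) in enumerate(combos, start=1)
]
print(dim_cat_segment)
```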
so now i have my key now here's the deal
i need to join this sk
back to my product table because the way
this is going to look so we can all be
very clear is you have your category
that filters down your product that
filters down your sales table
so
what key relates your category segment
to your product table so that it can
filter it down accordingly well it's the
surrogate key but remember our product
table doesn't have the surrogate key so
how do we get that key into the product
table that's what we haven't done yet in
this session so this part is really
important let's do it together so i'm
going to go back up to the top and i'm
going to go over to my product table
and i'm going to do a join back to this
so we'll go over to home
and we're going to do what's called a
merge a merge is like a vlookup in excel
or a join in sql we'll do a merge and
i'm going to say merge queries
now this is why i've been doing
duplicate if i would have done a
reference
it won't let you merge back because it's
dependent on that item so then you get
this weird circular dependency issue but
if you duplicate it you can join back
which is what we needed to do
i'm going to tell it that i want to join
from my product table over to my dim cat
segment and we have to join on category
and segment in both tables so what makes
a row unique is the combination of
those two
all right we'll do a left outer is fine
inner join really would be sufficient
here but we'll do left outer and then
i'm going to go ahead and click ok
and that's going to if you've ever done
a merge in the power query editor that's
going to give us a new column here that
has a table and we tell it all right
from that table return these columns and
return these rows of data right so
that's the next step that we're going to
look at here
so now i'll go over to expand and tell
it what columns i want to return i don't
need category or segment we already have
that in our table we don't need that we
only want to bring back the category
segment key right there so then i click
ok
and now we have that category segment
key in our product table so this is how
you build those dimensions out if you
don't already have the id also
as many of you astutely noticed earlier
in the class we need to go in and remove
from this table
we can now get rid of category and
segment because we don't need it here if
i want to filter from category or
segment i filter from my category table
so i can go in here and say you know
what let's go ahead and right click and
remove columns and we're good to go all
is right with the world now
we've done that
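The merge, expand, and remove columns steps can be pictured with this Python sketch. The rows and names are hypothetical. It mimics a left outer merge on the combination of category and segment, brings back only the key, and leaves category and segment out of the result. The third product has no match on purpose, to show what left outer does with it.

```python
# Hypothetical category segment dimension with its surrogate key.
dim_cat_segment = [
    {"cat_seg_sk": 1, "category": "rural", "segment": "select"},
    {"cat_seg_sk": 2, "category": "urban", "segment": "convenience"},
]
dim_product = [
    {"product_id": 1, "category": "urban", "segment": "convenience"},
    {"product_id": 2, "category": "rural", "segment": "select"},
    {"product_id": 3, "category": "urban", "segment": "extreme"},  # no match
]

# Lookup keyed on both columns, like selecting two columns in the merge.
sk_by_combo = {
    (row["category"], row["segment"]): row["cat_seg_sk"]
    for row in dim_cat_segment
}

# Left outer behaviour: every product row is kept, unmatched rows get None.
# Category and segment are dropped because the key now carries them.
merged = [
    {"product_id": p["product_id"],
     "cat_seg_sk": sk_by_combo.get((p["category"], p["segment"]))}
    for p in dim_product
]
print(merged)
```

With an inner join the unmatched product would disappear instead of getting an empty key, which is why the two join kinds only give the same result when every combination has a match.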
this is now going to relate
over to our fact budget table
which also we need to join it to that
one so this table will also have the key
so we got to do the same exact process
again because right now
i can't really filter from my dimension
to this table because this table does
not have that sk there
so now oh what is on demand learning
what a great question so on-demand
learning is recorded classes that we
have we have over 70 recorded classes on
our platform on azure powerapps sql
server ssis of course power bi data
modeling so it's online learning that
you have access to for a full year if
you purchase it and you can go through
and watch them as many times as you want
over and over again so that is on demand
learning
all right so we got our fact budget here
i'm going to do another merge
we're going to merge our segment table
over to this one
and we're going to join on category and
segment so that we can get the key into
this table there we go we'll click ok
and then i'm going to expand that right
there and we're going to return only the
category segment key that's what we want
and then again once we get that key in
this table we no longer need the
category or the segment so we can remove
it from the table reducing that
redundancy
how did you fix the date issue in the
fact budget oh i split it out gary i
just did a um split
so i came up here split by delimiter
split it out by the space to get rid of
the time and then i was able to convert
it to a date sorry about that super
quick all right
i am going to
get rid of these two columns right here
because we got our key so that looks
great right click and remove columns
beautiful this gets us where we want so
let's go back and load the data into the
data model take a look at what we got
and then we still have about another
hour that we'll talk about some other
concepts that i want to dive into today
so if i hit close and apply
we should have our new dim category
segment table we'll build the
relationships and then i'll talk about
how that relates to another fact table
right the fact budget table that we
brought in as soon as it's done loading
here in just a second
let's see if we have any questions while
we are waiting
so steve that's an interesting question
it's very rare to filter down a dim
table to records that you don't need
generally you'd have to look at the
number of you know if i'm only bringing
back the last two years of data from my
fact table then maybe get a distinct
list of you know the keys that are in
your fact table and do an inner join to
the dimension the inner join will
automatically get rid of all those other
ones and only bring in those relevant
keys that's probably the best way
because you got to be careful filtering
down dimensions because if you filter
down the dimension then you don't have a
match in the table and you get
blanks when you start to do group
bys and filtering right steve so i
would say look at your fact table get a
distinct list of your product key get a
distinct list of your customer key
for whatever the time frame is
then you can use that list do an inner
join to the dimension
inner join of course only brings back
the rows that match on both sides of the
join that would get rid of those extra
ones and then bring in only those rows
to power bi so i would clean it up in
sql i would do that in sql for sure
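The recommendation is to do this in sql, but the idea is easy to see in a few lines of Python. This is only a stand-in with made-up rows: take the distinct keys that actually appear in the fact table, then keep only the dimension rows that match, which is what the inner join does.

```python
# Hypothetical fact rows for the time frame being loaded.
fact_sales = [
    {"product_id": 1, "amount": 100},
    {"product_id": 3, "amount": 250},
    {"product_id": 1, "amount": 50},
]
dim_product = [
    {"product_id": 1, "name": "a"},
    {"product_id": 2, "name": "b"},  # never sold in this time frame
    {"product_id": 3, "name": "c"},
]

# Distinct list of keys that actually appear in the fact table.
used_keys = {f["product_id"] for f in fact_sales}

# Inner join behaviour: keep only dimension rows with a matching key.
trimmed_dim = [p for p in dim_product if p["product_id"] in used_keys]
print(trimmed_dim)
```

Because the list of keys comes from the fact table itself, every fact row still has a matching dimension row, so this avoids the blanks problem described above.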
alright guys so now if you look at this
we have our category segment table that
filters down our dim product that
filters down our fact sales and then we
have our fact budget that is being
filtered by both dim category and dim
date
now a lot of people will start to
mistake adding additional fact tables
into the model as you know normalizing
the model or turning it into a snowflake
but that's not necessarily true in fact
one of the really cool things about
power bi is we can kind of look at a
diagram of our separate fact tables
right so if i come down to the bottom
right here
i can add a layout they're called
layouts and i can call one fact
sales
and in this layout right here i'm going
to bring in those relevant tables just
so we can
you know as your model grows and your
model should if you build it really well
from the beginning it'll grow you'll
have an opportunity to continue growing
that model and adding in flexibility
versatility so on and so forth right and
so i'm going to add in a couple of
our tables here real quick we'll add
that in we're going to bring in our
customer and then we'll bring in our
geography so this right here is our fact
table and how it is related to the
dimensions in the model that are related
to that table
right there so that's a layout for fact
sales now i'm going to create another
layout real fast that's just for the
fact budget
so i'll go back over here and click
right there
and we'll do fact budget
and we're getting trolled by david again
um with his third normal form absolutely
not third normal form david
uh we're going to go over here and bring
in our fact budget and then i'm going to
bring in a couple of dimensions so the
dimensions that were related to this was
the dim category segment and the date i
did not change the data model i did not
change the relationships or anything
like that all i did just now was say
this is an easier way to look at the
data and understand it right so i can
understand the relationships that are in
my model which are very important when
you get into writing more complicated
and more advanced dax scenarios
now what i want oh that's cool carlton i
didn't know that i did not know that
let's test that out carlton says
i might have to take that um
how do i remove it from this one
from there we go remove from diagram
if i right click on this add related
tables there you go good job carlton i
like it
good job
all right
so
what do i want to do next i want to show
you guys some other things about just
data modeling in general that you should
be aware of i'm going to go back over to
the
report view here
and in the report view what i want to do
in the report view is i want to go ahead
and build a couple of very quick
measures now by the way for those of you
who are kind of following along and
looking at the stuff here
if you pull up the class student files
and you look right here you'll notice
that i have the starting file i have the
completed file and then i have the
completed file with dax so there are
different versions of the files that you
can open up and you can look at if
you're going back through this at a
later time and you're following the
recording all right
so let me go ahead and slide that back
over right there
kevin said classes are out of date so
they're actually right now kevin we are
in the process of updating both the
introduction to power bi class and
multiple other classes so unlike a lot
of other training platforms we actually
do re-record the class i think power bi
we're re-recording for like the 15th
time because the ui does change many
classes don't necessarily change right
like sql writing sql writing you know
data modeling so those don't get updated
as often but the azure ones and the ones
on power bi we unfortunately have to
update very often so we have team
members doing it manuel quintana right
now this week is updating the
introduction to power bi class all right
so i am going to do what create a couple
measures let's create a couple measures
because i want to show you something
really really cool that we can do with
our data model inside of power bi
and this is going to introduce actually
i forgot to add it but i'm going to add
another column in a minute so let me
create actually i want to talk about
something first i forgot to add my
column i want to talk about something
known as
role-playing tables
a lot of companies have role-playing
tables even if you're not quite sure
what they are right so a role playing
table is where you have a date maybe a
date that you want the date to play the
role of the order date the ship date or
the due date or you have
an address and you want the address to
play the role of the bill to the ship to
and the invoice to right different
uh maybe geography locations and so i
built a model and everything in my model
is predicated off of the order date so
marketing loves that sales loves that
but then my manufacturing team or my
production team or my shipping team they
come over and they say mitchell i want
to see sales and quantity sold and all
of that i want to see it based on the
shipping date
and if you know
power bi and you understand the active
relationships in the model you know that
the filtering that occurs and the
grouping that occurs automatically
happens on the active relationship and
our active relationship right now is
predicated on the order date right
so let me fix this and cause a problem
that we can then fix so we're going to
cause a problem we can fix i'm going to
go back over to the home tab go back
into transform data and i'm going to go
into my fact table
and we're going to add another column
here in fact let me
let me make this a date first of all
and then i'm going to call this order
date
so this will be the order date the date
that it was purchased and then i'm going
to add a new column on this table
inside of m using the m query
language i'm going to write a custom
column here
we'll call this one our ship date
and we're just going to hypothesize that
every time the ship date is going to be
three days after the order date right so
we'll do something like
date dot add days
and then
let's see what the intellisense tells me
i think you have to bring in your order
date first
and then the number of days i was hoping
intellisense was going to help me out
here but no such luck
it looks like that's going to work oh no
date is not how's that getting passed
all right i think that's it
7/29 perfect all right
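The custom column above is written in M inside Power Query. As a rough stand-in, here is the same "ship date is three days after order date" rule sketched in Python. The row layout, the sample dates (including the year) and the `add_ship_date` helper are assumptions made up for illustration, not part of the class files.

```python
from datetime import date, timedelta

def add_ship_date(rows, days_to_ship=3):
    """Return copies of the rows with a ship_date derived from order_date."""
    return [
        {**row, "ship_date": row["order_date"] + timedelta(days=days_to_ship)}
        for row in rows
    ]

# Invented sample rows; only the 7/26 -> 7/29 pattern comes from the demo.
fact_sales = [
    {"order_date": date(2015, 7, 26), "unit_price": 20.0},
    {"order_date": date(2015, 12, 30), "unit_price": 35.0},
]

with_ship = add_ship_date(fact_sales)
print(with_ship[0]["order_date"], with_ship[0]["ship_date"])
print(with_ship[1]["order_date"], with_ship[1]["ship_date"])
```

The second row is there on purpose: an order placed at the end of December ships in the next year, which is exactly the kind of row that moves between years when the relationship changes.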
so i created a column that's a little
bit different right so 7/29 is the ship
date and the order date is 7/26 so this
is going to
cause a problem for us let's go take a
look this is going to be great so if i
go back over to home i'm going to do
close and apply load the data again and
then what we're looking at right here
right if you look at this table
this table is representative of the
total sales by the order year the year
that it was ordered because the
relationship in our model is based on
the order date in fact
very important to understand that by
going back and looking at it right so if
i look at my dim date table right here
oh wrong one sorry about that we got to
go over to our actual fact sales table
if i go over here you'll see that the
relationship is from date to order date
so the filtering occurs on the order
date and that's what i want to take a
look at now so when i'm looking at this
right here i see the relationship so
look at the numbers right
10 595. now if i go over to my model and
i change my model over to
the ship date because you know they say
hey we want to see it based on ship date
so i change that over and click ok
and i come back over here is it still 10
595 no it's not it's not 10 595. now the
number has changed and it's changed
because now we're filtering on a
different column we're filtering on the
date that it shipped not the date that
it was ordered okay
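To see why the yearly number moves when the relationship is switched, here is a small Python sketch that totals the same rows once by order year and once by ship year. The rows and the `sales_by_year` helper are invented for illustration; Power BI does this through the model relationship rather than a function argument.

```python
from collections import defaultdict
from datetime import date

# Invented sample rows: each ship date is three days after the order date.
fact_sales = [
    {"order_date": date(2015, 12, 30), "ship_date": date(2016, 1, 2), "unit_price": 100.0},
    {"order_date": date(2016, 3, 10), "ship_date": date(2016, 3, 13), "unit_price": 50.0},
    {"order_date": date(2016, 12, 31), "ship_date": date(2017, 1, 3), "unit_price": 25.0},
]

def sales_by_year(rows, date_column):
    """Total unit_price per year, using whichever date column plays the role."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[date_column].year] += row["unit_price"]
    return dict(totals)

by_order_year = sales_by_year(fact_sales, "order_date")
by_ship_year = sales_by_year(fact_sales, "ship_date")
print(by_order_year)  # {2015: 100.0, 2016: 75.0}
print(by_ship_year)   # {2016: 150.0, 2017: 25.0}
```

The grand total is the same either way. Only the year each sale lands in changes, which is why the per-year figure in the visual is different after the switch.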
so how do you get both how do you get
the best of both worlds i have one team
that wants order date one team that
wants due date one team that wants ship
date one team that wants invoice
date and paid date how do we get them
all to work here
well there's a few different ways of
handling that i'm going to show you a
really cool method i have a youtube
video on this matt actually has the
youtube videos and matt if you can drop
that in the chat window so people can go
back and look at that later and it says
i'm out of focus so let's see if we can
um
how would i even fix that here
that refresh
let me see i don't want to spend too
much time on that let me get out of here
unfortunately
let's go back to full
picture in picture it's still out of
focus
short of kind of turning it off and
turning it back on which i can't do
right now i'm just going to be out of
focus a little bit so we'll go full
because that's aggravating all right so
what i want to do is let's go ahead and
talk about how you would solve this
problem generally speaking first of all
let's build a couple of very quick
measures for those of you who are
following along again be aware that
following along is not necessarily
the the best option here because i am
going to go a little bit faster but i'm
going to create three measures on this
table so i'm going to go and create a
new measure that'll be called total
sales
so this will be my total sales measure
which is going to equal the sum
of my fact sales
unit price
so that'll be the first measure that we
create and we know that that right there
represents
the total sales
from my fact table within the current
filter context which from my date table
is the order date right so we'll go
ahead and give that a format of
english united states
there we go
and then i'm going to create another one
real quick so let's create another
measure on this table this is going to
be called my total cost
that's going to equal the sum
of my fact sales
unit cost there we go
we'll do that
and then we'll format that one as well
and then let's build two more quick
measures so the next one's going to be
total transactions
total transactions are going to equal
count rows so i'm just going to count
all the rows from my fact sales table
within the current filter context of
course we'll give that a thousand
separator and then just for the heck of
it because everybody loves time
intelligence i'm going to build a
year-to-date sales calculation so we'll
build one more measure on this table and
this will be our year to date sales
and that's going to equal total ytd
our measure of
total sales
and then we'll pass in the date column
from our date table
all right and we'll make that
a currency as well there we go
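The four measures just built are DAX measures. As a plain Python analogy, each one is a small function over whatever rows survive the current filter. Everything below (row layout, sample values, function names) is an assumption for illustration, and `year_to_date_sales` only approximates what a year-to-date calculation does for a single as-of date.

```python
from datetime import date

# Invented sample rows.
fact_sales = [
    {"order_date": date(2016, 1, 15), "unit_price": 100.0, "unit_cost": 60.0},
    {"order_date": date(2016, 2, 20), "unit_price": 50.0, "unit_cost": 30.0},
    {"order_date": date(2016, 3, 5), "unit_price": 25.0, "unit_cost": 10.0},
    {"order_date": date(2017, 1, 9), "unit_price": 80.0, "unit_cost": 40.0},
]

def total_sales(rows):
    return sum(r["unit_price"] for r in rows)

def total_cost(rows):
    return sum(r["unit_cost"] for r in rows)

def total_transactions(rows):
    return len(rows)

def year_to_date_sales(all_rows, as_of, date_column="order_date"):
    """Sales from january 1 of as_of's year through as_of, inclusive."""
    in_window = [
        r for r in all_rows
        if r[date_column].year == as_of.year and r[date_column] <= as_of
    ]
    return total_sales(in_window)

# "Filter context" here is just the list of rows for one year.
rows_2016 = [r for r in fact_sales if r["order_date"].year == 2016]
print(total_sales(rows_2016), total_cost(rows_2016), total_transactions(rows_2016))
print(year_to_date_sales(fact_sales, date(2016, 2, 29)))
```

Note that every function quietly depends on which date column did the filtering, which is the problem the rest of this section works on.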
all right so the reason i'm building a
couple of measures is because i want to
show everybody kind of what this looks
like
once we start to put this into our
visualizations right so
if i bring in total sales total cost
and then total transactions and
year-to-date sales all of these measures
are being filtered by this year right
here and that year is now my ship year
remember we changed it from order year
over to our ship year right
so
i'm going to go ahead and switch that
back real fast
and we're going to switch it back over
to order date and then click ok again
now there's a couple ways to handle this
one of the ways that you can try to
handle this is you can try to handle
this by essentially going in here
and
i'm going to try to turn on
picture-in-picture again
i'm back in focus perfect um you can go
back in here and you can handle this by
creating multiple date tables
sometimes that's just not feasible
multiple customer tables multiple
geography tables multiple so on and so
forth
another way to handle it is through dax
inside of your data model so i could
build another relationship from the date
to the ship date and so what i have is i
have two relationships i have one that
is active that's on order date and i
have one that is inactive that's on ship
date by default the system will only use
the order date the active one right so
how do i get both how do i see
my total sales my total cost my total
transactions my year to date based on
the order date and i can see all of
those same measures based on
my shipping date well normally if i have
50 measures in my model i got to rebuild
all 50 measures overriding the current
filter context like so but i'm going to
show you a cool trick so hold in there
if i go back over to my report view
i can come in here and say all right
let's build let me show you i can say
i'm going to build a new measure
and this measure is going to override
the current filter context and use the
inactive relationship so i'll do
something like you know total sales
for shipping date
and that's going to equal
and then i'm just going to type this out
now i'm not going to teach you dax here
go take a look at my three hour session
that somebody just dropped in the chat
window
and i'm going to do total sales and then
i'm going to tell it i want to use
that existing relationship so i'm using
a different relationship than the
default
and i'm going to take it from let's see
fact sales
and we're going to use the ship date
here
and we're going to relate that back to
our date table
like so use relationship does not create
a relationship it will use a
relationship that you've already defined
so you must define that relationship in
your model first all right i'm going to
go ahead and hit enter
and then i'm going to get this cleaned
up so everybody can see the results side
by side real quick
there you go
this is total sales based on the active
current relationship in the model and
the other one is total sales
where we have modified the filter
context
using
the use relationship function and
calculate now
if i wanted to see total cost total
transactions year to date year over year
all of my other measures for the ship
date what would i have to do
if i had 50 measures i'd have to go
rebuild all 50 measures to do this and
if you had multiple dates ship date
order date due date invoice date paid
date you'd have to build five versions
of every measure to satisfy all those
requirements that is not feasible or
maintainable however thanks to external
tools thanks to things like tabular
editor we can use something known as
calculation groups and so calculation
groups is what i'm going to talk about
real quick it's like the you know one ring
to rule them all one measure to rule
them all so let me show you something
real quick all right i'm gonna go ahead
and just delete that measure we don't
need it we're gonna delete it from the
model and then i'll delete it from my
table
and let me show you something really
cool so under external tools
there is in my video i talk about how to
kind of set this up
and i'll go over to my tabular editor
here
and in tabular editor i haven't actually
done this in a while so if i can
remember how to do this i'm going to go
into my model and create a new
calculation group
this is going to be really cool once you
see the results of this now what would i
normally call this so new calculation
group
let's call this
measures
i'm going to do measures by ship date
okay so measures by ship date
and then i'm going to go into my
calculation group and we're going to
create a new calculation item and this
is actually going to be
current measure
normally you'll do two different ones in
here you'll do like current measure and
you'll do like measure by ship date so
you can see them side by side so let's
do that i'm going to do current measure
and current measure is going to use a
function in dax
called oh it didn't name it
that's weird
there we go it's going to use a function
in dax called selected
measure
all right so that's dynamic it's going
to automatically pick that up
then i'm going to create another
calculation item here
and this one is going to be called
um
measure let's just call it ship date or
buy ship date or something like that now
i haven't done this in a while so
hopefully i'm not messing this up and
i'm going to come up here and i'm going
to say all right let's do calculate
and then what measure are we going to
put in we want this to work for every
measure sales cost
transactions year to date whatever so if
i want this to work for every
measure i can't type in total sales i
have to type in something else and what
i'm going to type in here is going to be
selected measure
right so selected measure then we're
going to use use relationship again like
we did before the only thing i'm
changing that's different than what you
saw in power bi is selected measure
under use relationship i have to tell it
what relationship i want to use and i'm
going to make sure i don't mess up the
typing here because there's no
intellisense i'm on an older version by
the way so there might be intellisense
in a newer one just to be aware of i
don't know i'm going to go grab my
ship date from ah i clicked away whoops
i'm going to grab my ship date from here
and drop it right there put a comma and
then i'll go down to my date table so
let's jump back over to the dim date
table we're going to grab the date
column from the date table i've clicked
away again
i'm going to keep doing that apparently
and i'll drop it right there all right
and then i'm going to close it that's
going to be one i need to close it again
that's going to be two and hopefully
that works so now i'm going to go up to
the top and click save
all right looks pretty good we're going
to hope for the best
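The idea behind the two calculation items can be sketched outside of DAX as a wrapper that takes whichever measure is selected and decides which date column filters its rows. This is a Python analogy only: the rows, the `measures` and `calculation_items` dictionaries and the `evaluate` helper are all invented here, and none of it is how tabular editor or the engine is implemented.

```python
from datetime import date

# Invented sample rows; only the column roles come from the demo.
fact_sales = [
    {"order_date": date(2015, 12, 30), "ship_date": date(2016, 1, 2),
     "unit_price": 100.0, "unit_cost": 60.0},
    {"order_date": date(2016, 3, 10), "ship_date": date(2016, 3, 13),
     "unit_price": 50.0, "unit_cost": 30.0},
    {"order_date": date(2016, 12, 31), "ship_date": date(2017, 1, 3),
     "unit_price": 25.0, "unit_cost": 10.0},
]

# Ordinary measures: each one only aggregates the rows it is handed.
measures = {
    "total sales": lambda rows: sum(r["unit_price"] for r in rows),
    "total cost": lambda rows: sum(r["unit_cost"] for r in rows),
    "total transactions": len,
}

def rows_for_year(rows, year, date_column):
    return [r for r in rows if r[date_column].year == year]

# Calculation items: each wraps whichever measure is currently selected.
calculation_items = {
    "current measure": lambda measure, rows, year: measure(
        rows_for_year(rows, year, "order_date")),
    "by ship date": lambda measure, rows, year: measure(
        rows_for_year(rows, year, "ship_date")),
}

def evaluate(year):
    """Every measure under every calculation item, like a matrix visual."""
    return {
        (measure_name, item_name): item(measure, fact_sales, year)
        for measure_name, measure in measures.items()
        for item_name, item in calculation_items.items()
    }

results_2016 = evaluate(2016)
for key, value in results_2016.items():
    print(key, value)
```

Adding a fourth measure to `measures` gives it a ship date version for free, which is the point of the one measure to rule them all comparison: two wrappers instead of a second copy of every measure.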
so i'm going to minimize this and go
away my model should prompt me here i'm
kind of
kind of odd that it's not prompting me
is my calculation group here
nope that definitely did not recognize
it
let me save that again
that would be very unfortunate i have it
in my saved file though i'll open up my
saved file if this doesn't work
so normally you create them and then you
just save it to your model save the
changes to the connected database
services yeah interesting all right
let's jump to the completed and
hopefully i saved it there in the final
file so i'm gonna open that up real
quick
yeah i don't know let me check the chat
see if you guys are helping me out here
all right so i'm opening up the
completed and i'm hoping i saved it
there because calculation groups are
really awesome
and oh i have it cool super exciting
that's why it's good to have a
backup not sure why it didn't work but
here's my um my calculation group that i
was trying to show you guys that i was
creating a moment ago
so what i want to do is let me rebuild
the visual real fast right so i'm going
to build a matrix here and in my matrix
i'm going to bring in the year like we
did before keep it very simple
and by the way if you're following along
you can open up the completed file just
like i just did and i'm going to go over
to my fact sales and i'm going to bring
in my total sales all right
so now i have my total sales remember
this is total sales by order date this
is based on the order date because
that's the active relationship in my
data model
now if i go down here to the bottom
where i have my column name i'm going to
drag that and drop it into the columns
on this matrix and watch what happens
with that value 10 595
this kept the current measure that's
what i was trying to do so you can see
the current but now i also see the ship
date but here's the cool part this works
for all my measures so if i bring in
total cost
i now have the total cost by the current
relationship and by the ship date if i
bring in my total transactions
right expand this a little bit more now
i have my total transactions by the
order date and by the ship date so i've
created one measure that dynamically is
going to work across all of my measures
instead of having to create countless
versions of that now i went through this
quickly which i always do to get you
more information from the presentation
and talk through concepts but i have a
youtube video on the youtube channel
pragmatic works that actually is called
calculation groups role playing tables
and i'll walk through this exact demo
there so if you want to go back and
watch it again you can either rewatch
this entire thing or you can go watch
that video which is very very helpful
this is also helpful for other types of
calculations though not just this
role-playing idea so if you say mitchell
i need to build total sales year-to-date
total cost year-to-date total
transactions year-to-date profit
you know what profit margin is not a
good one but if you had 20 columns and
you had to build a year to date for all
of those you could go over to
calculation groups in the tabular editor
and you could say total ytd selected
measure
date date close it out and that works
across all of those calculation groups
are really really cool definitely part
of your data model to make this work and
save you a lot of time and effort so
thank you to microsoft and the power bi
community for giving us those kind of
tools
all right
so that was that part right there the
next thing i want to talk about is
i want to talk about aggregated tables
aggregation tables i really want to talk
about two things okay so we're going to
pivot a little bit let me make sure i
didn't miss any questions
and i'm going to close out the completed
one completed one is good i'm going to
close it out go back to our model real
fast
and then no idea what happened with
tabular editor so if somebody knows let
me know
let's see
if we have any questions that i can
answer real quick for the group since
we're transitioning
bald booktuber said
best invention ever calculation groups i
don't know about that but they're
definitely awesome
tabular editor rocks says peter so really
just a lot of great
comments and stuff here so thank you
everybody for that
how can i parameterize
start and finish dates of the model
so there is a way to
if you want to add filters to your model
that's a good question let's take a look
at that real fast
if you want to parameterize your model
so that i'm only importing data for like
the last two years or three years but
you want to be able to change that in
the future you absolutely can do that
through something called parameters and
so i'll show a very very brief
example of that and then we'll switch up
to the next section that i wanted to
talk about all right so if i go into
transform data here
and i go to my fact sales you know that
we can add filters right and so i could
come in here and add a filter that says
hey only bring back the data that is
after
2015 or after 2016. so i can add some
filters in here but we want to make it
more dynamic so that we can parameterize
it and change it from like the power bi
service so we publish it to the service
and then inside of the power bi service
when i'm doing a scheduled refresh i
want to be able to change the parameter
there right
you can do that and that's really a
great way to do this you shouldn't have
to go back and open all your power bi
reports to update those parameters
so what i can do is i can say up here at
the top that i want to manage parameters
you have to i think you have to turn
this on in your model and i'm going to
tell it that i want to go ahead and
manage parameters and you would normally
come in here and create a couple of
parameters so click one click two i
might create one called start date
you might even just call it only start
date and then you just do everything up
until that start oh i deleted the wrong
one
there we go
data type i would do a date and then um
just a selected value so you put a value
in here that would be 1/1
and let's go with 2013 right
so now i have a parameter that you can
modify from outside of
the power query editor
or outside of power bi desktop in
general you can modify it a couple
different ways the primary method is
usually from the power bi service when
you refresh this so this is a question
this is not planned this is based on
your questions in the chat window so now
i can click ok
and i can go back over to i have this
parameter that i can change i can go
back to my fact table and say look i
only want to bring in the data that's
after whatever date i specified so i can
go in here and say date filter i can say
after right so we're just adding a basic
filter
and then i can say is after parameter
and there's my start date and when i
click ok assuming i set it up correctly
we're only going to get data that is
after 2013. there's a lot more you can
do with this you can make it dynamic you
can use expressions but we're just kind
of using a parameter when i publish this
to the power bi service the service is
going to recognize that parameter and
you can modify it there you can change
it in the future so that is hopefully
going to answer that question now back
on track
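The parameter idea above boils down to keeping the cutoff date outside of the query so it can be changed without editing the query itself. Here is a minimal Python sketch of that shape. The `load_fact_sales` function and the sample rows are invented, and the strict "is after" comparison is an assumption about how the filter in the demo behaves.

```python
from datetime import date

# Invented sample rows standing in for the source table.
source_rows = [
    {"order_date": date(2012, 6, 1), "unit_price": 10.0},
    {"order_date": date(2013, 1, 1), "unit_price": 20.0},
    {"order_date": date(2013, 1, 2), "unit_price": 30.0},
    {"order_date": date(2015, 8, 14), "unit_price": 40.0},
]

def load_fact_sales(rows, start_date):
    """Keep only rows strictly after start_date (an "is after" filter)."""
    return [r for r in rows if r["order_date"] > start_date]

# The parameter lives outside the query, so changing it needs no query edit.
start_date = date(2013, 1, 1)
print(len(load_fact_sales(source_rows, start_date)))        # 2
print(len(load_fact_sales(source_rows, date(2015, 1, 1))))  # 1
```

With a strict comparison, a row dated exactly on the parameter value is left out, which is worth checking against your own data if that boundary date matters.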
what i wanted to cover next we did the
parameters and then i said i wanted to
cover what what did i say i was going to
cover next before i got distracted there
oh aggregate so let me talk about a very
important concept when it comes to data
modeling so one thing that we had to do
a lot of time with customers that we
worked with is you know when you build a
table i had a customer one time
they were a telecommunications company
and they had like 20 of these tables one
table had 400 million rows of data per
day 400 million rows for that type and
sms text messages had 200 million
all right
so
that's a lot of data and we did it in
sql server we didn't do it in dedicated
pools or snowflake or some mpp massive
parallel processing architecture
we
did it
inside of sql server on-prem and so when
you go to run your reports against a
database that has literally hundreds of
billions of rows you don't necessarily
get the best performance so something
that we would do for customers like that
is we would create different types of
fact tables to try to satisfy their
requirements so we would create like
aggregated fact tables that roll it up
to a higher level so you still have the
granular level of detail but we also
would load snapshot fact tables or
aggregated fact tables right
so
that's what i want to talk about with
power bi real quick so i can do the same
thing in power bi i can create my fact
sales table i can say hey this table let
me remove that filter by the way so we
can get all our data i'm going to clear
that filter
i can take this table and i can say you
know what i want to i want to duplicate
the table again because we're going to
build an aggregate table real quick
now
when you bring data into power bi you
can import data
you can do direct query or you can do
live connection to analysis services but
those are really your three primary
options if you have hundreds of millions
of rows of data for most people
you're not going to be able to import
that data to power bi right and
generally speaking direct query does not
perform super great right like if you
import the data it performs awesome when
you're doing direct query and your dax
has to get converted to like you know t
sql and has to run against an engine and
bring that data back generally it
doesn't perform that well
what if i could leave my data in the
original database for the very detailed
data
but i could also have an imported
version of the table that was aggregated
up that answered most of my questions
automatically so if i know that my
customers are you know oftentimes
looking at sales by month by customer i
could create a fact table
that is sales by customer by month that
maybe goes from
200 million rows to only 50 000 rows i
can import 50 000 rows and so any query
that runs against the aggregate table is
going to return lightning fast right so
aggregate tables from a data modeling
perspective are very very important and
that's the conversation that we're now
rolling into is kind of how do we work
with that how do we set that up power bi
has some really cool capability now i'm
not going to be able to do it here
and i agree with peter
column store indexes were useful because
i was using 2014 and we used column
store indexes for that database
but what i want to do here for this
table
is i want to create a duplicate and so
i'm going to go ahead and duplicate this
table real quick
and we're going to call this one our
fact aggregate sales so i'll rename it
real quick here we'll call that fact
aggregate sales
and then we're going to go ahead and
aggregate this up so
again a lot of times i would do this
type of cleaning i would do this in sql
server i would do it in my query before
i bring it up but for those of you who
are like hey i'm an analyst i come from
excel how do i do it here you can do it
the way i'm about to show you right now
so i'm going to go up to the transform
tab here at the very top and i'm going
to tell it i want to do a group by
and when i click group by and i pull
this up i'm going to tell it that i want
to group by whatever columns are
important to us now you can create
multiple versions of an aggregate table
you can do this however you want so
let's say
that we want to roll this up to product
id we do want to see it by product id
we also want to see it by another
grouping and i say let's do it at the
day level so i want to see this by i
want to see my sales my cost my total
transactions i want to see it by product
by order date all right so let's look at
what this looks like so now i'm going to
go down to my new column name we're
going to create one called total sales
by the way as we're going through this
if you guys like the training that we do
here at pragmatic works make sure to take
a look at our on-demand training make
sure to take a look at our live training
boot camps that's what keeps me employed
and i love doing what i do so you know
tell your family and friends put it on
um we would appreciate that so i'm going
to do total sales and i'm going to do
sum
of unit price all right i'm going to
create another one in here this is going
to be called total cost
and that's also going to be a sum right
that's going to be the sum and we'll say
sum of unit cost and then let's add in
a count rows like a total transactions
like we've done before right so we can
do
a total by day and that's going to do
a count rows on the table now this is
creating an aggregate table remember the
original
table had 700 000 rows we're going to
see how many this one has when we load
it but it's going to be significantly
less so that original table we didn't
import the data because we could not
right it was too much but the aggregate
table we can and that can significantly
improve performance especially on
queries that are only doing a filter by
the product id and the order date so if
we're not going down to the customer
level because we do lose dimensionality
if we have to do a filter on customer
this table won't work if we have to do a
filter on
what else was in there geography this
table won't work right so now if i go
back and click ok
sarah is looking forward to the dax boot
camp in october i am too sarah i'll see
you there
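The group by configured above rolls the detail rows up to one row per product id per order date, with a sum of unit price, a sum of unit cost and a row count. A Python sketch of that same roll-up is below. The sample rows are invented (only product id 449 and the july 26th date echo the demo), and `build_aggregate` is a hypothetical helper, not something from the class files.

```python
from collections import defaultdict
from datetime import date

# Invented detail rows, one sale per row.
fact_sales = [
    {"product_id": 449, "order_date": date(2016, 7, 26), "customer_id": 1,
     "unit_price": 10.0, "unit_cost": 6.0},
    {"product_id": 449, "order_date": date(2016, 7, 26), "customer_id": 2,
     "unit_price": 10.0, "unit_cost": 6.0},
    {"product_id": 449, "order_date": date(2016, 7, 26), "customer_id": 3,
     "unit_price": 12.0, "unit_cost": 6.0},
    {"product_id": 450, "order_date": date(2016, 7, 26), "customer_id": 1,
     "unit_price": 5.0, "unit_cost": 2.0},
]

def build_aggregate(rows):
    """One output row per (product_id, order_date) with sums and a row count."""
    groups = defaultdict(
        lambda: {"total_sales": 0.0, "total_cost": 0.0, "total_by_day": 0}
    )
    for r in rows:
        g = groups[(r["product_id"], r["order_date"])]
        g["total_sales"] += r["unit_price"]
        g["total_cost"] += r["unit_cost"]
        g["total_by_day"] += 1
    return [
        {"product_id": pid, "order_date": day, **values}
        for (pid, day), values in groups.items()
    ]

fact_aggregate_sales = build_aggregate(fact_sales)
for row in fact_aggregate_sales:
    print(row)
```

The customer id column is gone from the output. That is the lost dimensionality mentioned above: the smaller table can no longer answer anything that filters by customer.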
so i'm going to go ahead and run this
real quick now this aggregated up my
data and here's what you see
this first row of data product id 449
on july 26th
had 86 rows originally
in the original fact table but now we've
taken those rows and we've
consolidated them down to one row so you
see how you start to reduce the size of
your data model and if you can use these
aggregate tables in a very effective
manner again you can greatly improve
performance so let's let's take a look
at this so i don't have a bad performing
model we're not going to be able to see
any great performance tuning tips here
but you get the idea of modeling for
success right so now if i hit close and
apply over here
unfortunately we're not going to be able
to do the coolest feature here because
we're not using direct query but you'll
see in a moment so i'm going to hit
close and apply
all right that's going to load the table
in we're going to go look at that table
let me see how many rows of data that is
it might tell us right here let's see if
we have any other questions
just wondering is it okay to sum up unit
price
it is but it depends on the data right
so like i know that in this data every
row represents only a quantity of one so
it's okay but if i had quantities that
were more than one like five or seven
then it doesn't make sense to do that
first you'd have to multiply the
quantity times the unit price to get a
new column then you would sum up that
column so in this data model we only
have
uh one item on each row so it's okay but
it depends can you show how to use group
by in a model that i created from
joining multiple tables
i don't have that model but if you send
me an email i could try to help you out
with that so send me an email admin
coe22
can that parameter update from a web
service yes yes when you publish it out
to the power bi service you can update
it from there if you happen to be
running some kind of update publishing
your power bi report in a different way
you can pass those parameters in all
right
so
let's take a look at our fact aggregate
sales table real quick we've covered a
ton of material here today i'm glad you
guys were able to join us
and if i go over to my fact sales table
we had 675 000 rows so right at about
700 000 rows if i go back over to my
fact aggregate sales table we went from
almost 700 000 down to 82 000 so it's
roughly an eighth of its original size right
so this is a huge reduction in the size
of my table and it might give me that
flexibility to be able to now import
that in if my original table was a
direct query right so for those of you
working with bigger data models you're
like how do i make this perform better
aggregate tables are great
now the way my model is designed right
now would require me as an end user to
know to use one table versus the other
right if i'm like hey i'm in here and
i'm trying to build a report
right here let's say that i was trying
to build a report in my report let me
make sure i've built my model do i have
a relationship in here
i have a relationship on product that's
great and i need a relationship on order
date as well let me add that in
all right
so
i'm going to come over here and let's
update this visualization real quick to
be product name
we'll get rid of year and so we're going
to have product name by total sales now
let's say that you know you can come up
here to performance analyzer turn it on
look at performance this performance is
fine but let's say performance wasn't
that good
then what i could do is say all right i
don't want to sum up the sales that's
coming from that table that has hundreds
of millions of rows direct query what i
want to do instead is i'm going to
remove total sales from this table go
over to my fact aggregate sales and i'm
going to bring in total sales from that
table all right and so now i have the
total sales and it's going to be a lot
faster because it's not having to do a
sum operation across hundreds of
millions of rows but only across maybe
those 50 000 rows that are imported in
my model in memory ready to go right so
that can have tremendous improvement in
performance by doing that now the
problem with this let's think about this
is that now you have to go to each of
your end users and they have to decide
themselves when they build a visual
which one to use and
maybe today i build this report and it
is only filtered by
product but then somebody comes in later
and they say well i'm going to go grab
what like geography and so i add a
slicer in here on country right so let's
turn that into a slicer
and i click usa and unfortunately it's
one country so let's do let's do
something other than country let's do
state i click alaska and it doesn't
change i click alabama and it doesn't
change and it doesn't change because
that fact table has absolutely
positively no relationship to my
geography that's a problem because now
that causes a lot more work for your end
users and you training them however
however
power bi
has a really cool feature that's
available
that we can take advantage of and that
feature is called
aggregations that are managed by power
bi right
if you want the presentation slides guys
email me directly i'll get them to you
i'll send you the pdf apologies for not
having that in the class files i'll drop
my email in there for everybody again
all right i will hook you up just let me
know sorry about that all right so
what power bi could do what if
your end user didn't even know
that this fact aggregate sales table
existed right they didn't even know it
exists they have no idea they build
their reports like they always do they
go to the original sales table
they bring in the sales from that sales
table right
and then
sometimes it performs really awesome and
sometimes it doesn't because the engine
in the background can figure out and can
determine if it should use the aggregate
table or if it should use the other
table power bi can do that it is
phenomenal this is a really awesome
feature so
the way you set that up and it's not
going to work for me because we're not
actually doing direct query or anything
like that
is you come over to the left you go to
your report view
and you find the table
that you want to be managed by power bi
automatically so power bi will
automatically decide should i use that
table or should i not right
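A rough sketch of the routing idea described above, in plain Python rather than anything Power BI actually runs. It assumes the aggregate table was grouped by order date and product id only, so a query that needs state (like the slicer example earlier) cannot be answered from it and falls back to the detail table. The row values, column names, and the `choose_table` helper are all hypothetical and only illustrate the decision, not the engine's real logic.

```python
from collections import defaultdict

# Hypothetical detail rows: (order_date, product_id, state, unit_price)
FACT_SALES = [
    ("2016-01-01", 1, "alaska", 10.0),
    ("2016-01-01", 1, "alabama", 5.0),
    ("2016-01-02", 2, "alaska", 7.5),
]

# Columns the aggregate table was grouped by; state is not one of them.
AGG_GROUP_BY = {"order_date", "product_id"}


def build_aggregate(rows):
    """Pre-sum unit_price per (order_date, product_id)."""
    totals = defaultdict(float)
    for order_date, product_id, _state, unit_price in rows:
        totals[(order_date, product_id)] += unit_price
    return dict(totals)


def choose_table(query_columns):
    """Pick the aggregate only when it covers every column the query needs."""
    if set(query_columns) <= AGG_GROUP_BY:
        return "fact_aggregate_sales"
    return "fact_sales"


FACT_AGGREGATE_SALES = build_aggregate(FACT_SALES)
```

A visual filtered only by product can be served from the smaller table, while one sliced by state has to go back to the detail rows, which is the choice the end user no longer has to make by hand.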
and what you do is you'll right click on
that aggregate table there are some
requirements to this that are a little
bit frustrating you've got to make sure
that whatever columns are in your
aggregate table match exactly the data
types that are in the original tables so
if that doesn't match it won't map up
correctly and you'll get kind of some
issues with that also i've never really
done aggregates when you've imported
both tables because it kind of doesn't
make sense i could see where you get a
very small possible performance
improvement there
but generally when you're going to be
doing this is when your original sources
are going to be direct query and you can
see tremendous performance gains by
doing this so all of my data is imported
into power bi so this isn't going to
actually let me do this but i'm going to
show you the process anyway right so if
i come over here and click manage
aggregations
i'm going to get a screen that looks
like this right and so what i'll do is
say all right fact aggregate sales
that's the right one
for order date and product id i already
did the group by
in the original query and i've already
defined the relationship in my model so
since the relationship is defined i
don't have to do anything with those
just keep that in mind then you have to
go down to total cost total sales and
transactions and really duplicate the
work that you did before so i'd come in
and i would tell it that i want to do a
sum
and i'd click here and i'd go down to
my fact sales
and in my fact sales i would tell it that i
want to sum up my
fact sales unit cost then i would do the
same thing for total sales i'd say sum
and i would go down to
my fact sales and that's the fact sales
unit price and i would sum those up
so with this the reason it's grayed out
i think the reason it's grayed out as
you see right here it says fact sales oh
i actually could not find this in the
documentation and i just saw it pop up
there son of a gun i can't freeze it
fact sales must be a direct query table
to be used in the detail for an
aggregate in other words it doesn't
support the way i'm currently using it
but if you had
left your data in the original database
and did a direct query to that not
importing it in
this right here would actually allow you
to do this so i'm showing you the steps
although i didn't have time to load all
this into like an azure sql database
before class today so i can't follow
through the full demo
but this is really awesome and so you do
that then you come down to total
transactions and you tell it hey i want
to count the table rows you choose your
table again not going to work because
it's not direct query
and then when you're done you click
apply all that will automatically hide
the table
and then it will automatically hide the
columns your end user doesn't know it
exists
but it will use it automatically
so that's really one of the coolest
features there
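The column mapping walked through above can be sketched as a small Python structure. This is only an illustration of what each aggregate column stands for (a sum of unit cost, a sum of unit price, and a count of table rows, grouped by order date and product id). The dictionary layout, the snake_case column names, and the sample rows are assumptions and are not the format Power BI stores.

```python
# Hypothetical mapping: aggregate column -> (summarization, detail column)
AGGREGATION_MAP = {
    "total_cost": ("sum", "unit_cost"),
    "total_sales": ("sum", "unit_price"),
    "transactions": ("count_rows", None),
}

# Hypothetical detail rows standing in for the fact sales table.
fact_sales = [
    {"order_date": "2016-01-01", "product_id": 1, "unit_cost": 4.0, "unit_price": 10.0},
    {"order_date": "2016-01-01", "product_id": 1, "unit_cost": 2.0, "unit_price": 5.0},
    {"order_date": "2016-01-02", "product_id": 2, "unit_cost": 3.0, "unit_price": 7.5},
]


def build_aggregate(rows, mapping):
    """Group by (order_date, product_id) and apply each mapped summarization."""
    result = {}
    for row in rows:
        key = (row["order_date"], row["product_id"])
        bucket = result.setdefault(key, {name: 0 for name in mapping})
        for name, (summarization, detail_column) in mapping.items():
            if summarization == "sum":
                bucket[name] += row[detail_column]
            elif summarization == "count_rows":
                bucket[name] += 1
    return result


fact_aggregate_sales = build_aggregate(fact_sales, AGGREGATION_MAP)
```

Each entry in the mapping plays the role of one row in the manage aggregations dialog: the aggregate column, how it was summarized, and which detail column it came from.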
oh windows shift s to freeze i am not
going to test that out in the middle of
a live presentation because i might be
getting trolled but i will test that out
later i'm writing it down right now
that would be hilarious if i hit that
and just shut everything down
let's look at some questions real quick
i'm going to zoom out i know i saw a lot
of stuff coming in sorry i joined late
what are the main difference between
snowflake schema and star schema star
schema is ideal because we want to keep
our model as simple as possible as the
model grows and it grows in complexity
and we have more tables there will be
times where we need to add in some
snowflakes like we did here
where we actually did have a snowflake
in here a little bit with this category
segment but we built that in so we could
drill across our fact budget and our fact
sales so now the benefit of that we
didn't really show it right but if i
come in here and build a visual
that pulls information from
both of those tables so i go over to my
dim cat segment let's pull in
category i can now drill across both
fact tables this is the big benefit of
multiple fact tables here so i can go
over to
let's go down to fact
sales and bring in our total sales
and then let's go down to our budget
table and bring in our value there
all right and so i'd have to mess with
this because it looks like it's not
summing that let's see if we can sum
that value up
what is that data
type it says it's default to sum
let me do a real quick
actually it's not
yeah it's duplicating there let me do a
real quick sum on that that's
interesting so we're going to create a
measure on that real quick to make sure
we're doing that and then let me go
verify the relationship again
so we're going from
dim cat segment
that should be good
i want to show you guys the the drill
across functionality i'm going to verify
data types real fast here because they
should map so that is an integer value
and then in my fact budget that is also
an integer value
they are 1 through 10 that is perfect so
we're going to wind up summing up the
value and then we'll go with either
forecast or budget here so let's go back
and then i'm going to create a measure
real quick so we can kind of look at
that and then i'll come back to
questions
fact budget let's create a measure and
we'll call this one total budget
equals we're going to do the sum
and then the name of the table is fact
budget
value
english united states
we're going to get rid of value from
this table let's bring that in there we
go and so now what i see is i'm able to
drill across no idea what was happening
with that column no time to debug it
right now but i'm able to drill across
category and see what my total sales
were and what my budget was for that now
this is across all years so if we want
to look at years because year relates to
both tables too we can bring that in as
a slicer so this is how you handle
multiple fact tables in your model it's
actually really easy and this goes back
to that question people were asking
earlier why would i build a star schema
and not just leave it in a flat table if
you left it in a flat table drilling
across multiple fact tables like this
would be very very difficult it doesn't
give you that flexibility versatility
adaptability for the future like we
talked about before aside from all the
other benefits but because we've built
the model the way we did and i had not
tested this before the class today i
just knew it would kind of work in this
way
we're able to now add additional facts
in the future to continue to build this
out so if i come in here and i add in
the year
right here and we add that as a slicer
where's my slicer at it's hiding from me
we'll just click on it here you can see
that it's filtering it accordingly based
on the year as well so we're able to see
my total sales and my budget for 2016.
my total sales and my budget for 2015.
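A minimal Python sketch of the drill across shown above, assuming two fact tables that both carry a category key and a year. Each fact is summed on its own and the results are lined up through the shared dimension. The category names and all amounts are made up for illustration.

```python
from collections import defaultdict

# Hypothetical shared dimension: category key -> category name
dim_cat_segment = {1: "bikes", 2: "accessories"}

# Hypothetical fact rows: (category_id, year, amount)
fact_sales = [(1, 2015, 100.0), (1, 2016, 150.0), (2, 2016, 40.0)]
fact_budget = [(1, 2015, 90.0), (1, 2016, 120.0), (2, 2016, 50.0)]


def total_by_category(fact_rows, year=None):
    """Sum one fact table per category, optionally filtered to a single year."""
    totals = defaultdict(float)
    for category_id, row_year, amount in fact_rows:
        if year is None or row_year == year:
            totals[category_id] += amount
    return totals


def drill_across(year=None):
    """Return {category name: (total sales, total budget)} via the shared dimension."""
    sales = total_by_category(fact_sales, year)
    budget = total_by_category(fact_budget, year)
    return {
        name: (sales.get(category_id, 0.0), budget.get(category_id, 0.0))
        for category_id, name in dim_cat_segment.items()
    }
```

Calling `drill_across(2016)` plays the part of the year slicer: the same filter reaches both fact tables because both relate to the same dimensions.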
this is
really really awesome capability
multiple fact tables dimensions that
conform across our different business
areas in our model this is really really
awesome all right so we are actually
near the end of our day here i'm going
to start looking through and answering
questions here for
really about the next five or so minutes
and then we will uh we'll wrap things up
again
devin actually mentioned to me it would
be a great idea to do some kind of
follow-up after this where we actually
do some kind of q a where we get
together and take questions from the
group like what you all have here and i
think that would be awesome so keep your
ears open for that maybe we'll do like a
one hour or 30 minute event where you
can send in questions and we
go through that i think that would be
super cool all right let's see how how
do how to do a run rate calculation
basis month number if it is april
selected it should do number
times 12
divided by four
uh you'll have to email that to me i
don't quite understand that calculation
you can publish your data model to power
bi online and have others use it as a
power bi data set yes great point so
facts with dax i love that point right
this is actually something i should have
had on my list so i built a really
awesome data model i've named it i've
created my calculated measures i've
built in multiple facts and i don't want
everybody on my team to have to go build
their own data model that has different
variations and different logic what if i
could take it and publish the entire
thing to the power bi workspace
give them access to build from that
power bi model so then they connect
doing a live connection and they just
build reports they don't have to worry
about building the data model you can do
that and that is actually a best
practice so that is a great job thank
you for bringing that up
will this session be made available yes
it will be recorded on our youtube
channel for your
viewing pleasure so please feel free to
go back and look at that is it possible
to extract monthly budget from the
yearly budget figures easily i guess
with a dax measure absolutely i actually
do that in our advanced dax class and i
do it in our advanced power bi class
essentially what you have to do is when
it's on the date if you have the date
there like if you were looking at the
month you'd have to count the number of
days in the month divide the yearly
total by that and you would have to do
that in the dax which could be kind of
complex depending on what you're trying
to do but yes you could do that with dax
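One way to read the monthly split described above, sketched in Python rather than DAX: give each month a share of the yearly budget in proportion to its number of days. Weighting by days is an assumption here; an even split of one twelfth per month would be another reasonable choice. The function name is hypothetical.

```python
import calendar


def monthly_budget(yearly_budget, year, month):
    """Allocate a yearly budget to one month, weighted by days in that month."""
    days_in_month = calendar.monthrange(year, month)[1]
    days_in_year = 366 if calendar.isleap(year) else 365
    return yearly_budget * days_in_month / days_in_year
```

With this weighting the twelve monthly amounts add back up to the yearly figure, which is a handy check on any real measure built the same way.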
can you talk through the importance or
lack of importance of marking a date
table as a date table please so in our
situation we're okay with not marking it
as a date table
but if you had turned on in your model
the automatic time intelligence which i
actually recommend against let me talk
about that so i recommend in your model
under options and settings i recommend
turning off the auto date time
intelligence there's a lot of reasons
for why you should turn it off i do
turn it off i don't keep it on you can
turn it off in two places one
under data load i believe it is right
here you can turn it off and that's for
global all power bi reports in the
future and under current file i would
turn it off for the current file so once
you turn that off
then it doesn't automatically create
date hierarchies in the background for
every date column that you have right
so first of all turn that off if you did
not turn it off and you mark your date
table right here as a date table by
right clicking
make sure my head's out of the way and
you mark that as a date table that will
automatically turn it off for that date
table anything that's in there
so that's number one
if you were doing the relationship from
your date table to your fact table to
your fact budget to your fact inventory
whatever
on a date key that's an integer kind of
smart key so like today
is august 11th of 2022
what you would do is you'd create a key
that would be like
20220811. it's an integer it's a smart
key if you were building a relationship
on a key instead of the date time
intelligence those built-in time
intelligence functions actually won't
work so you have to mark your date table
as a date table for that to work we're
actually okay with what we're doing here
because i'm using just the original date
but as a best practice we should be
using those smart date keys because it
is going to be optimal for joins and
performance and things like that how do
you compare compute year over year month
to date and other comparison measures
those are actually really easy to do if
you go take a look at my three hour dax
presentation that i did i'm pretty sure
i cover all of those in there
but also our on demand learning and our
dax boot camps i think the one in august
is sold out so i posted it but it's sold
out the one progress 29th but we do them
about every month but if you go there we
talk through all of those different time
intelligence things in our boot camps
good question
oh if i wanted power bi visualization in
powerpoint slide microsoft actually just
released that that is awesome has
nothing to do with this presentation
but microsoft just released that i
forget i was testing it out the other
day it is awesome so once you publish i
think you've got to publish it to the
power bi service first then you can go
up to file in the service and you can
say powerpoint and it actually will load
it into powerpoint the live interactive
power bi report which is incredible
we've been waiting for that for a very
long time that's a great question
could you give us a glimpse of the
scenario you mentioned earlier with
breaking out budget tracking from the
monthly amount so no muhammad because
that's just a longer calculation
ultimately what we would do with that
and i kind of mentioned it a minute ago
is inside of a table let's say i had a
table right here i'll kind of give you
the scenario and this will probably be
the last question and then
we'll wrap things up for the day but if
i had a table here with the dates in it
and then i had in that table with the
dates let's say my fact sales where is
that at
and i bring that over and drop that
right there
if i wanted to be able to look at like
my year-to-date let's do year-to-date
that's even better year-to-date sales
maybe i want to be able to compare my
year-to-date sales to my year-to-date
budget right but my budget isn't at that
level this is a different level of
granularity so then we'd have to write
dax that figures out hey
you're at 23 days into the year there's
365 days in the year divide 23 by 365 to
get a percent multiply that percent
times the budget that's at the year
level or if it's at the month level you would
do that calculation so that's the gist
of it right at a high level that's kind
of how you would calculate it out of
course we're working with varying filter
context
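The arithmetic just described (23 days in, divided by 365, times the yearly budget) can be sketched in Python as below. This is only the high-level calculation for a single as-of date with a hypothetical function name. A real DAX measure would also have to respect whatever filters are active on the visual.

```python
from datetime import date


def year_to_date_budget(yearly_budget, as_of):
    """Prorate a yearly budget by the fraction of the year elapsed at as_of."""
    day_of_year = as_of.timetuple().tm_yday
    days_in_year = date(as_of.year, 12, 31).timetuple().tm_yday
    return yearly_budget * day_of_year / days_in_year
```

For example, 23 days into a 365 day year the prorated budget is 23/365 of the yearly figure, which can then sit next to year-to-date sales at the same grain.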
so we have to consider all the filters
that are applied there and then somebody
said mitchell is my favorite
thank you for that your 20 is on the way
you pulled through i'm just kidding no
20 bucks for you but i do appreciate
that i'm your favorite um
yeah facts with dax said
you might need a uh
an add-on yeah all right
hey everybody we're ending a little bit
early today which is okay we covered a
ton of material it was great phenomenal
interaction from everybody a lot of
people still on the call two and a half
hours later which is crazy i'm
glad you guys learned a lot thank you
for all of the great feedback peter make
sure to send me an email i'm gonna send
you that pdf uh it's gonna have your
name all over it and then for everybody
else again we just want to tell you
thank you for joining us and we'll see
you next time
thank you everybody