hey everyone i've been so excited to
share this with you as for the longest
time i have been wanting to do some
content around machine learning and ai
and today is the day
not only am i going to be teaching you
this from a super beginner perspective
i'm also going to show you how you can sell
your own ai tools which can go for 500
to 10 000 a month
so what we are going to be doing is
building an ai tool that can take a file
with thousands and thousands of articles
sift through it and basically tell you
what each article is about
okay so here's an example of an article
and the text that makes it up
our ai is going to look through all the
words and return back the category the
article should be assigned to
pretty cool right
so this tutorial is going to take on a
super simple approach as i said in
building and teaching you some machine
learning skills using python
once we complete this you should have
the knowledge to go forth and build your
own ai tools not just for financial
articles that you can sell to others
online
so what are we waiting for let's get to
it so first off let's start on the
gravity ai platform you can see a full
catalog of all the live ai models we're
just going to sign up to create our own
models so let's just go ahead and do
that
this is essentially where we're going to
be hosting our own model such as the
ones that we've just seen so we can sell
access to it to individuals or companies
now to build a model we are first going
to have to train our model okay and for
this we are going to need a lot of data
in fact the more data the better in
machine learning a data set of text is
also called a corpus and i prepared a csv
file for the occasion
it includes over
2500 financial articles scraped from
google news which already have a
category assigned to them okay so it
just looks like this so that our ai can
learn from it
you will find the csv file in the
description of this video along with a
few other files that we will be using so
just go ahead and grab that now it's
called training data csv
great so we have the training data now
let's get to creating our ai model
so first off i'm going to go into a
directory of my choice
and simply create a new directory or
folder whatever you want to call it
called gravity ai upload
and next go into the directory
and make a file called classify
financialarticles.py
the py is for python so that our code
editor essentially knows to treat this
like a python file
great now let's open up this project i'm
just going to use the shortcut code dot
to open up vs code
great
so
now i need to prepare vs code to use
python by making sure the python
extension has been installed which for
me it has and installing a python
interpreter
to install the python interpreter i'm
simply going to write this command in my
terminal
and let it run
once that has finished you can check
that it has all worked by checking the
version
if you are watching this in the future
and something isn't working later on
down the line in this tutorial it could
be down to the version that you are
using okay so make sure to revert back
to this version if that is the case
okay now i'm going to search for the
python interpreter by opening the
command palette by pressing shift
command p
on a mac to search and start typing
python select interpreter and make sure
the latest version is selected and let's
just go ahead and click on that for good
measure
okay
we are all set up
let's
check that everything is working as it
should before carrying on i'm just going
to run a simple print
hello
and let's use this play button that
should now be visible for you
and great
hello has been printed in the console
log down here
now
let's write our code so the first thing
we're going to need is a package
called gravity ai it is essentially a
helper package that simplifies
model deployment to the gravity ai
marketplace that we just saw
in fact there are a few packages we will
need so i'm just going to write this in
a requirements text file which we will need
for our model deployment too later on
so here are the packages we will be using
let's go ahead and install them all
using the command
python3 -m pip
install -r
requirements.txt so just exactly
like that
okay
and once that is done i'm just going to
import two of them into the file so we
can use them the other three will be for
our pickle file which we are yet to
write
okay so the first one is pickle the
pickle module contains functions for
serializing and deserializing python
objects
pickling is essentially the process
whereby a python object is converted
into a byte stream and stored in a
binary file
we will be creating some pickle files
later so pickling and unpickling is kind
of important for us
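A minimal sketch of the pickling round trip described above, using only the standard library. The dictionary is a stand-in (an assumption for illustration); in the tutorial the pickled objects are the trained model, the vectorizer and the encoder.

```python
import pickle

# A stand-in object; in the tutorial this would be a trained model.
original = {"labels": ["markets", "banking"], "threshold": 0.5}

# Pickling: convert the Python object into a byte stream.
payload = pickle.dumps(original)

# Unpickling: rebuild an equal object from those bytes.
restored = pickle.loads(payload)

print(type(payload).__name__)
print(restored == original)
```

The same idea works with files: `pickle.dump` writes the byte stream to a file opened in binary mode and `pickle.load` reads it back.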
next we have pandas
pandas is an open source library for
data analysis and manipulation and
essentially provides data structures and
functions designed to make working with
structured data super easy fast and
convenient
so now let's get to using these packages
well first off i'm just going to
open some pickle files that we are yet
to create
the pickle files will be for our model our
tfidf vectorizer which i'll explain later
and our encoder
tfidf or term frequency inverse document
frequency is another machine learning
term
it is going to help us with the weighted
count of the frequency of each word in
our vocabulary or in other words all the
words that appear in our corpus
okay so two machine learning words there
term frequency inverse document
frequency which will help us with the
word frequency and vocabulary which
means every word in our corpus
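To make the two terms concrete, here is a rough hand-rolled sketch of a vocabulary and a TF-IDF score in plain Python. The three sentences are made up, and the formula is the simplest textbook variant (an assumption); the sklearn vectorizer used later in the tutorial applies smoothing and normalization, so its numbers will differ.

```python
import math
from collections import Counter

# A tiny made-up corpus of three "articles".
docs = [
    "stocks rally as markets rise",
    "central bank raises interest rates",
    "markets fall as bank shares slide",
]
tokenized = [doc.split() for doc in docs]

# Vocabulary: every distinct word in the corpus.
vocabulary = sorted({word for tokens in tokenized for word in tokens})

# Document frequency: in how many documents each word appears.
doc_freq = Counter(word for tokens in tokenized for word in set(tokens))


def tf_idf(word, tokens):
    """Term frequency in one document times inverse document frequency."""
    tf = tokens.count(word) / len(tokens)
    idf = math.log(len(tokenized) / doc_freq[word])
    return tf * idf


# Score every word of the first document.
scores = {word: round(tf_idf(word, tokenized[0]), 3) for word in tokenized[0]}
print(vocabulary)
print(scores)
```

Words that appear in only one article, such as "stocks", end up with a higher weight than words shared across articles, such as "markets".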
great
so we'll go back to this first let's
finish off the code that we have here
so now i'm going to write a function
that's going to deal with an in path
and an out path this won't make much
sense now until we build our pickle
files in the next section so just bear
with me for now
the function is going to read the csv
file we put into it saving it as input
df
you will notice we are using the pandas
package to utilize the read csv method
next we are going to use our term
frequency vectorizer to vectorize our
data to figure out which words weigh
heavily in our articles and we will only
want to do this on the body column of
our data and then we save the results as
features
now let's actually predict the classes
and we will do this by using the model
which relies on the financial text
classifier pickle file that we are yet
to write
once again this will all become much
clearer when we get actually writing our
financial text classifier pickle file
and finally let's actually give a verbal
category into the category column for
the article and save the results to a
csv file which has an id and a category
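The function described above could look roughly like the sketch below. The function and parameter names are hypothetical, and the vectorizer, model and encoder are passed in as arguments only to keep the sketch self-contained; in the tutorial they are loaded from the pickle files at the top of the file. The gravity ai helper call is left out because its exact API is not shown in this transcript.

```python
import pandas as pd


def classify_articles(in_path, out_path, vectorizer, model, encoder):
    """Read articles from in_path and write id/category rows to out_path."""
    input_df = pd.read_csv(in_path)

    # Turn only the article text (the body column) into numeric features.
    features = vectorizer.transform(input_df["body"])

    # Predict a numeric class for every article.
    predictions = model.predict(features)

    # Map the numeric classes back to readable category names.
    input_df["category"] = encoder.inverse_transform(predictions)

    # Keep just the id and the category in the output csv.
    input_df[["id", "category"]].to_csv(out_path, index=False)
```

Any fitted objects with `transform`, `predict` and `inverse_transform` methods will work here, which is what the sklearn vectorizer, classifier and label encoder built later provide.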
great now let's pass the function into
the gravity ai helper just like so and
before we run this we of course need to
supply the missing files
so let's go ahead and do that now let's
create those pickle files
so
go ahead and open up google colab in
your browser
this is a free platform used by many
computer scientists and will help us
create our pickle files
i'm just going to rename this
file for organizational purposes and get
to writing some code
now colab is quite handy as it already
has a bunch of packages ready
for us to use without us having to
install them
so the packages we are going to be using
are sklearn pandas json and pickle
i'm just going to go ahead and import
them along with the methods we want to
use for them now you might recognize
some of these from the requirements text
file this is because we will need them
for our pickled files when we take them
offline and upload them onto gravity ai
the sklearn library contains a lot of
efficient tools for machine learning and
statistical modeling including
classification regression clustering and
dimensionality reduction and is one of
the most popular libraries for machine
learning and pandas json and pickle we
have already covered
now we are going to read the csv file
containing the corpus we will be
using to train our machine learning
model
okay so there we have what we need let's
run the code and carry on
so once again we'll be using the pandas
read csv method for this
so i'm literally going to drag the
training data csv in here and when that
is done let's
run the code
and
now
just get the results so type in
financial corpus df again
and run the code and there we go we can
see the content of the csv file as it is
so with the articles in the body column
the title of the articles and the
category it belongs to okay so we have
our training data we have it right here
let's carry on
first let's find out the different types
of categories that we have available to
us using the pandas unique method to
find this out
so that's what i'm doing just looking in
this file and finding every single
category and then just returning the
unique ones so it doesn't return like
multiple categories to me
so click run and here we have all the
unique categories that exist in our
corpus
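A small sketch of reading the corpus and listing its unique categories with pandas. The three rows and the category names are invented for illustration, and an in-memory string stands in for the training data csv; the column names title, body and category follow the ones mentioned in the video.

```python
import io

import pandas as pd

# Made-up rows standing in for the real training data csv.
csv_text = """title,body,category
Rates on hold,The central bank kept rates unchanged,banking
Shares climb,Stocks closed higher on Friday,markets
Lenders merge,Two regional banks agreed to merge,banking
"""

financial_corpus_df = pd.read_csv(io.StringIO(csv_text))

# unique() returns each category once, in order of first appearance.
categories = financial_corpus_df["category"].unique()
print(categories)
```

With the real file you would pass its path to `pd.read_csv` instead of the `io.StringIO` object.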
next we will now build a machine
learning model that can predict the
article category given the content
in order to do this we must first
transform the data into a format that
our machine learning algorithm can work
with
first we will convert the categories
into numeric values
sklearn provides a label encoder module
that will take care of this process for
us so first off let's instantiate the
label encoder
and next let's fit the label encoder to
the categorical data
and now let's create a column in the
data frame containing the encoded
categories
and click
run
now to see all the labels given to each
category i'm just going to create an
array of them and make sure to get the
unique ones
so
there we go and
ta-da we have successfully transformed
the article categories into numbers
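The three steps just described (instantiate, fit, add an encoded column) can be sketched like this with sklearn's `LabelEncoder`. The tiny data frame and its category names are assumptions for illustration; the column name label matches the one shown in the video.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# A made-up stand-in for the category column of the corpus.
financial_corpus_df = pd.DataFrame(
    {"category": ["markets", "banking", "markets", "insurance"]}
)

# Instantiate the encoder and fit it to the categorical data.
label_encoder = LabelEncoder()
label_encoder.fit(financial_corpus_df["category"])

# Create a column containing the encoded categories.
financial_corpus_df["label"] = label_encoder.transform(
    financial_corpus_df["category"]
)

# Show each category next to its numeric label, once.
print(financial_corpus_df[["category", "label"]].drop_duplicates())
```

`LabelEncoder` numbers the categories in sorted order, and `inverse_transform` turns the numbers back into names later on.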
so now if we have a look at the
financial corpus so let's get the
financial corpus
again
and click run you will see that we have
created a new column called label and we
have labeled each category so
international finance now has a label
five as you can see here and you can see
here each time you see the category
so it's the same so we now know that
international finance has a label of
five
great so each category now has a number
that we can work with and just like we
created a numerical representation of
each category we will now create a
numerical representation of the body of
each article
for each article we will actually be
doing the following we are going to
tokenize an article that is break the
article body into a list of tokens or
words
next we're going to convert the words
into lowercase so that everything you
know is standardized in a way
and we're going to remove these stop
words now stopwords are just words like
and or
the
you know the commonly used words that we
don't really want in there
by removing these words we are
essentially removing low level
information or noise from our data which
will allow our machine learning
algorithm to really focus on the words
that carry more significance
next we're going to remove the
punctuation marks because just like
these stop words they're kind of useless
and finally we're going to create a bag
of words which is essentially a vector
representation
of a document based on the number of
times each word in a vocabulary appears
in a document
okay
so essentially this is going to help us
figure out which words appear the most
in a document
using the
tfidf algorithm or the term frequency
inverse document frequency which we
already touched on briefly now this is
obviously a lot luckily we don't have to
write any of this code by hand as
sklearn has a tfidf vectorizer module
that will essentially do all this for us
so that's what we are going to use just
right here
next up i'm going to create two
variables one for the body of my corpus
so that's what i am doing here and i'm
saving it as x
and one for the label
okay so we're gonna get the label from
my corpus and save it as the variable y
so hopefully that makes sense we have
one more thing to do before we start
creating these files and that is
vectorize the body text
we are then going to use the random
forest algorithm to create a model that
can classify the articles
the random forest algorithm can be a bit
more complex so if you want to have a
read on it please pause here and do so if
you wish
so
let's instantiate that
and now let's pass through the
vectorized body and the label
to our random forest classifier
and great
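A minimal sketch of this training step: vectorize the body text, then fit a random forest on the vectorized bodies and the labels. The four sentences, the labels and the `n_estimators` and `random_state` values are assumptions for illustration; the names x, y and rf_clf follow the ones used in the video.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up article bodies (x) and their numeric labels (y).
x = [
    "the central bank raised interest rates",
    "stocks rallied on strong earnings",
    "the bank cut its lending rates",
    "shares and stocks fell sharply",
]
y = [0, 1, 0, 1]

# Vectorize the body text.
vectorizer = TfidfVectorizer(stop_words="english")
x_vectorized = vectorizer.fit_transform(x)

# Instantiate the classifier and fit it to the vectorized bodies and labels.
rf_clf = RandomForestClassifier(n_estimators=50, random_state=0)
rf_clf.fit(x_vectorized, y)

# New text must go through the same fitted vectorizer before predicting.
new_features = vectorizer.transform(["the bank cut interest rates"])
print(rf_clf.predict(new_features))
```

Note that new text is passed through `transform`, not `fit_transform`, so it is mapped onto the vocabulary learned during training.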
so now let's get our classifier which we
saved as rf underscore clf
and save it as a pickle file
using pickle
dump
next let's get our vectorizer which we
now know vectorizes all the words thanks
to the sklearn package that has tf idf
vectorizer in it and save it as a file
called financial text vectorizer
pkl and finally let's get the label
encoder that we wrote which as a
reminder helps us give numeric labels to
the categories and save it as a
financial text encoder as a pkl file
and click
run
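Saving the three objects with `pickle.dump` might look like the sketch below. The dictionaries are stand-ins so the example runs on its own, the underscore spelling of the file names is an assumption based on how they are read out in the video, and a temporary directory is used only to avoid writing into your working folder.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-ins so the sketch runs on its own; in the tutorial these are the
# trained classifier (rf_clf), the fitted vectorizer and the label encoder.
artifacts = {
    "financial_text_classifier.pkl": {"stand_in_for": "rf_clf"},
    "financial_text_vectorizer.pkl": {"stand_in_for": "vectorizer"},
    "financial_text_encoder.pkl": {"stand_in_for": "label_encoder"},
}

output_dir = Path(tempfile.mkdtemp())

# Write each object to its own binary file.
for filename, obj in artifacts.items():
    with open(output_dir / filename, "wb") as handle:
        pickle.dump(obj, handle)

print(sorted(path.name for path in output_dir.iterdir()))
```

Opening the files in binary mode ("wb" to write, "rb" to read) matters, because a pickle is a byte stream and not text.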
so now we have the three files we need
so let's go ahead and download them and
just drag them over into our vs code
project along with the python file and
the requirements file
and let's just get those files i'm
simply just going to get the path to
them and put them in the correct
location that they are meant for
so that everything runs correctly
great
so we are now nearly done i'm just going
to add a gravity ai build json file in
here so that we can upload this onto the
gravity ai platform successfully
and there we go
we are now all ready to essentially
upload this onto the gravity ai platform
so this is all we need we're going to
zip this and then we're going to upload
it onto gravity ai okay so make sure you
have all this
and let's do it
okay so back on the platform i'm simply
going to click on my account
as we are going to first have to create
an organization so i'm going to call
this
anius space you can of course call it
whatever you want
and now let's go ahead and create our
project so i'm going to choose this
first option right here
and go to the next tab
in which we are asked to name our
project
so i'm going to choose to call this
categorize financial articles and i'm
just going to put in a quick summary
that is not longer than 150 characters
and click create
great
now i've already pre-prepared some
markdown for us so here is what i have
written this is exactly what people will
see when they come across my project
feel free to write whatever you wish as
well please take your time and making it
as you know thorough as possible and
once you are ready just click next
next we are going to have to actually
upload our files so make sure this is
python archive and the files that i'm
going to upload well i'm actually going
to zip this up
okay so make sure to zip up the project
that we have just been making and once
that is zipped up i'm just going to drag
it in here and wait for that to upload
and
wonderful
once that is done you'll just be asked
to fill out some more information which
python script
file is the main entry point for your
code well that's going to be the python
file
and are we using a requirements text
file yes we are so just go ahead and
make sure that is linked up correctly
and
great the last thing we need to do on
this is actually define our schema
okay so essentially with a csv file that
we are going to be uploading making sure
to tell this it's a csv file i'm going
to say that our input so the file that
we are going to input is going to have
an id and a body because we are going to
you know
be giving it some
uh data that is just some text and the
output well the output is also going to
be a csv file of course you can choose
to work on whatever files you wish but
this is what i am going with for now
and the output is going to have an id
and a category
okay so that is what i want my output to
look like we're going to upload some
text and then what is going to return is
going to be the category of the articles
okay so now let's carry on
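To picture the schema, here is a small standard-library sketch that checks whether a csv has the expected header: id and body for the input, id and category for the output. The helper name, the sample rows and the category value are hypothetical; this is not how the platform itself validates files.

```python
import csv
import io

# Column layouts described in the schema step.
INPUT_COLUMNS = ["id", "body"]
OUTPUT_COLUMNS = ["id", "category"]


def has_columns(csv_text, expected):
    """Return True when the csv header matches the expected columns exactly."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return reader.fieldnames == expected


# Made-up one-row examples of an input file and an output file.
sample_input = "id,body\n1,The central bank kept rates unchanged\n"
sample_output = "id,category\n1,banking\n"

print(has_columns(sample_input, INPUT_COLUMNS))
print(has_columns(sample_output, OUTPUT_COLUMNS))
```

A quick check like this on your test file can save a failed job later if a column is misspelled.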
now once that has finished building and
only once it's finished building just go
ahead and click on the manage tab as we
are next going to have to run our docker
container so that we can actually use
this ai model so to do this make sure to
have docker installed on your computer
it should look like this
and once you have just go ahead and
download
the
file
so just go ahead and download that and
once it has downloaded i'm just going to
actually move this to my desktop so i'm
just moving that file to my desktop
and we're gonna have to run the commands
in green
so making sure to navigate to the same
place that i have downloaded that file
so i'm gonna go into the desktop i'm
just gonna copy that line of code
and run it
okay
so that is now finished and then let's
run the second line of code
and wonderful
so now if we go to localhost 7000
you will see our ai model and it is
ready to use so let's go ahead and check
it out first off we're going to have to
upload a license key so let's go back
to here and just download that license
key okay i'm just simply downloading it
from here so that we can upload it onto
our dashboard here running on localhost
7000 and next we're just going to
actually
uh put in some data so as we know our
data needs to have an id and a body and
essentially i'm uploading this data so
that we can
check if it works right so i'm putting
in the data it has an id it has a body
and i want the output of this to be
categories so i want to know what
category each one of these uh i guess
cells belongs to so the data in each
cell belongs to
so i've actually provided this file for
you again in the description below along
with the other file so just go ahead and
import that
and let the job
work
okay
and once the job has finished you will
see
ta-da
okay so we've taken this file and then
we've used our ai model and it has
returned back the categories for our
data so all the data here now has its
own category based on our ai model
okay so this was super easy uh we have
checked it it works the last thing to do
is just publish it so i'm going to go
ahead and click the publish button here
and once we are done with that i'm
literally just going to hit
publish and
there we go we have now officially
launched our ai model people can come
and buy it and use it and you will
essentially see that revenue from the
usage of your ai model
so hopefully this tutorial has been
useful for you to go forth make your own
ai models
or just have a go at you know building
this one and see how you get on
thanks so much again for watching and i
hope to see you in the next tutorial