<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Brett Hammit</title>
    <description>The latest articles on Forem by Brett Hammit (@berett21).</description>
    <link>https://forem.com/berett21</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F437657%2F7e833892-ace6-4f0d-8f04-31b51630e720.png</url>
      <title>Forem: Brett Hammit</title>
      <link>https://forem.com/berett21</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/berett21"/>
    <language>en</language>
    <item>
      <title>Automated Pitch Report Generator in Python: Trackman Data</title>
      <dc:creator>Brett Hammit</dc:creator>
      <pubDate>Thu, 03 Jun 2021 17:00:13 +0000</pubDate>
      <link>https://forem.com/berett21/automated-pitch-report-generator-in-python-trackman-data-ije</link>
      <guid>https://forem.com/berett21/automated-pitch-report-generator-in-python-trackman-data-ije</guid>
      <description>&lt;p&gt;The times are changing in baseball and we are able to easily gather so much raw data and it can be pretty tedious to take the time to gather and clean and put it all together through excel sheets to make it into readable content for players to understand. I solved this by writing a script that packages this up and is easy to use for beginners who might have little to no experience in coding at all or a lot. The only thing necessary is following some instructions for software installation and following some straight forward directions. That being said let's jump right into it!&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;The Installation of Anaconda (Python) and Set Up&lt;/li&gt;
&lt;li&gt;Accessing the Script&lt;/li&gt;
&lt;li&gt;How to Work the Program&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Chapter 1: Installing Anaconda and Set Up
&lt;/h2&gt;

&lt;p&gt;For those of you without much programming experience, this first part might be a bit tricky, but it is worth it if you are interested in organizing your data.&lt;/p&gt;

&lt;p&gt;Step 1: Go to the link below, scroll all the way down to Anaconda Installers, and select the 64-Bit Graphical Installer for your operating system (Windows, Mac, or Linux). Accept all the defaults; it might take a few minutes to get everything installed.&lt;br&gt;
&lt;a href="https://www.anaconda.com/products/individual" rel="noopener noreferrer"&gt;https://www.anaconda.com/products/individual&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Step 2: Open Anaconda, either by clicking the desktop icon or, for Mac users, from Launchpad. (If that does not work, search for "terminal" on your computer and type "anaconda-navigator"; that should open it up.)&lt;/p&gt;

&lt;p&gt;Step 3: Launch JupyterLab, then click File in the top left, then New, then Notebook. Name the file whatever you like.&lt;/p&gt;

&lt;p&gt;Step 4: The last step, and maybe the easiest: in your Documents, create a new folder named "Pitch_Report". Finally, take the messy CSV of TrackMan data you have and drag it into the JupyterLab sidebar below the cleaner notebook. You're now set up and ready to put the script in!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fhar2wg06kb79fy2yxkc7.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fhar2wg06kb79fy2yxkc7.PNG" alt="Alt Text" width="506" height="254"&gt;&lt;/a&gt;&lt;br&gt;
Your screen should look something like this with the cleaner and the csv file with your TrackMan data&lt;/p&gt;
&lt;h2&gt;
  
  
  Chapter 2: Accessing the Script
&lt;/h2&gt;

&lt;p&gt;For this, go to my GitHub link below, open Pitch_Reports, take the code from Pitch_Report_Generator.ipynb, and paste it into the file you made in JupyterLab. Now you should be ready to learn how the program works and how to run it.&lt;br&gt;
&lt;a href="https://github.com/BHam21/pitch-reports" rel="noopener noreferrer"&gt;https://github.com/BHam21/pitch-reports&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Chapter 3: How the Program Works and Running it
&lt;/h2&gt;

&lt;p&gt;The first bit of code just imports the right libraries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
import numpy as np
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For this part of the code, simply enter the name of the CSV file of your TrackMan data between the apostrophes. This will change for each date and each CSV file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df=pd.read_csv('11_4_Scrimmage.csv', index_col=False)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You don't have to worry about this next part of the code, because it just selects and cleans the data for you.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df=df[['Pitcher','TaggedPitchType','RelSpeed','SpinRate','SpinAxis','Extension','InducedVertBreak',
      'HorzBreak','VertApprAngle','pfxx','pfxz']]

#Rounding the decimals in the Data Frame
df=df.round(decimals=1)

Final_Report=df.groupby(['Pitcher','TaggedPitchType']).mean()
Final_Report=Final_Report.round(decimals=1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the final part of the code and one of the two parts you need to interact with. Go into your files, click once on your Pitch_Report folder, then go to the upper left and click "Copy path". Paste the path between the apostrophes in the code below. The result should look similar to the code below, but with your username where it says "YourName".&lt;/p&gt;

&lt;p&gt;Now the code is set up, and the only thing you need to do from here on out is change the date in the output file name in the code below. For example, where it says "11_4_FinalReport.csv", if we were using the next day's data you would change it to "11_5_FinalReport.csv".&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Final_Report.to_csv(r'C:\Users\YourName\Documents\Pitch_Report\11_4_FinalReport.csv')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Running the Program
&lt;/h2&gt;

&lt;p&gt;Now that we have it all set up, we can finally run the program. If your TrackMan data is in the sidebar on the left with your cleaner file, and the name of your CSV is in place at the beginning of the script, click "File" and then Save. After that, click "Run" and then "Run All Cells".&lt;/p&gt;

&lt;p&gt;Now you should be able to go into your Documents, and in your Pitch_Report folder there should be a clean, sorted, nice-looking file with all your TrackMan pitching data!&lt;/p&gt;
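&lt;p&gt;For anyone who wants to see the whole pipeline in one place, here is a minimal sketch of what the notebook does, using a tiny made-up sample in place of a real TrackMan CSV (the pitcher names and numbers here are invented for illustration):&lt;/p&gt;

```python
import pandas as pd

# Tiny made-up sample standing in for a real TrackMan export
# (in practice: df = pd.read_csv('11_4_Scrimmage.csv', index_col=False))
df = pd.DataFrame({
    'Pitcher': ['Smith', 'Smith', 'Jones'],
    'TaggedPitchType': ['Fastball', 'Fastball', 'Slider'],
    'RelSpeed': [92.34, 93.11, 84.52],
    'SpinRate': [2301.7, 2255.2, 2410.9],
})

# Round, then average each metric per pitcher and pitch type
df = df.round(decimals=1)
final_report = df.groupby(['Pitcher', 'TaggedPitchType']).mean().round(decimals=1)

# Write the cleaned report out (path shortened here for the sketch)
final_report.to_csv('FinalReport.csv')
print(final_report)
```

Each pitcher ends up with one averaged row per pitch type, which is exactly the shape of the Final Report shown below.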

&lt;p&gt;Below is an example of what your Final Report should look like in your Pitch_Report folder, sorted by each pitcher and the pitches they threw that day.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgvxugk3eh2n8fy20dnyt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgvxugk3eh2n8fy20dnyt.png" alt="Pitch_Report" width="800" height="562"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  End
&lt;/h2&gt;

&lt;p&gt;Thank you for taking the time to read this article and check out my work. For any questions, message me on Twitter or wherever and I will try my best to help you out. Thanks again!&lt;/p&gt;

</description>
      <category>python</category>
      <category>baseball</category>
      <category>datascience</category>
      <category>programming</category>
    </item>
    <item>
      <title>Intro to Machine Learning in Python: Part III</title>
      <dc:creator>Brett Hammit</dc:creator>
      <pubDate>Sun, 17 Jan 2021 01:55:25 +0000</pubDate>
      <link>https://forem.com/berett21/intro-to-machine-learning-part-iii-3bdm</link>
      <guid>https://forem.com/berett21/intro-to-machine-learning-part-iii-3bdm</guid>
      <description>&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Model Evaluation&lt;/li&gt;
&lt;li&gt;Predictions From the Model&lt;/li&gt;
&lt;li&gt;Regression Evaluation Metrics&lt;/li&gt;
&lt;li&gt;End Thoughts&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is the last part in my 3-part series, Introduction to Machine Learning with Python. It has been a blast to write, and hopefully some of you have gotten something out of reading these. In Intro to Machine Learning: Part II we talked about how to split, train, test, and fit our linear regression model. In this one we are going to dive into the process after we have built and tested our model. To do this we need to evaluate the model by printing its intercept and coefficients, looking at the predictions from our model, and then evaluating it from a quantitative approach. That's our summary, so let's go see how this is done with examples and some code!&lt;/p&gt;

&lt;h2&gt;
  
  
  Model Evaluation
&lt;/h2&gt;

&lt;p&gt;To start evaluating our linear regression model we are going to be printing the intercept and coefficients from our model. The first thing that we need to do is print our intercept with the code looking like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print(lm.intercept_)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This should then print the intercept for you. After you've completed this, you can move on to printing out the coefficients. The coefficients relate to the columns in X_train, so we are going to create a data frame out of them to make them more readable and easier to interpret. The code in Python could look like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;coeff_df = pd.DataFrame(lm.coef_,X.columns,columns=['Coefficient'])
coeff_df
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will make a data frame with a row for each feature and its corresponding coefficient next to it. From here we move on to how to interpret these coefficients.&lt;br&gt;
To interpret the number to the right of a column: holding everything else fixed, a 1-unit increase in that feature results in an increase of that many units in the target. So, depending on the data and the question you are trying to answer, interpreting these is going to be case dependent.&lt;br&gt;
Now that we've gotten some insights from our coefficients, we can look at the predictions from our model.&lt;/p&gt;
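&lt;p&gt;To make that interpretation concrete, here is a small sketch on synthetic data where the true relationship is known in advance (the feature names and numbers are made up for illustration), so you can check that the fitted intercept and coefficients recover it:&lt;/p&gt;

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data with a known relationship: y = 5 + 2*a + 3*b
rng = np.random.default_rng(0)
X = pd.DataFrame({'a': rng.normal(size=200), 'b': rng.normal(size=200)})
y = 5 + 2 * X['a'] + 3 * X['b']

lm = LinearRegression()
lm.fit(X, y)

print(lm.intercept_)  # recovers the intercept, 5

# Same data-frame trick as above: one row per feature, one coefficient column
coeff_df = pd.DataFrame(lm.coef_, X.columns, columns=['Coefficient'])
print(coeff_df)  # 'a' maps to 2, 'b' maps to 3
```

Reading it the way described above: holding 'b' fixed, a 1-unit increase in 'a' increases the prediction by about 2 units.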
&lt;h2&gt;
  
  
  Predictions From the Model
&lt;/h2&gt;

&lt;p&gt;To analyze the predictions from our model we need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store our predictions from the test data in a variable&lt;/li&gt;
&lt;li&gt;Use matplotlib to make a scatter plot&lt;/li&gt;
&lt;li&gt;Use seaborn to make a distplot&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Storing our predictions in a variable takes one simple line of code with the help of the scikit-learn package, and will look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;predictions = lm.predict(X_test)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that we've got our predictions in a variable, we can visualize them to qualitatively analyze our model by looking at the linear fit and its distribution.&lt;br&gt;
To do this we will use matplotlib to create a scatter plot of the test values versus the predictions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.scatter(y_test,predictions)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your result of the scatter plot should look a little something like this:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F96333q6or5fjzevbo79c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F96333q6or5fjzevbo79c.png" alt="Alt Text" width="682" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We want a tight, straight line, which means your predictions are very close to the test values.&lt;br&gt;
The other way we are going to qualitatively analyze our model is by looking at its distribution. To do this we will use the seaborn package and its distplot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sns.distplot((y_test-predictions),bins=50);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;After running this, it should yield a plot that looks something like this:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fejeg5v0du6u19qe6gh0h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fejeg5v0du6u19qe6gh0h.png" alt="Alt Text" width="683" height="455"&gt;&lt;/a&gt; &lt;br&gt;
We want the residuals in this plot to form an evenly distributed bell curve, like we talked about in Part II, to assure us that we have fitted our model decently.&lt;/p&gt;

&lt;p&gt;Now that we have qualitatively analyzed our model, we can go one step further and quantitatively analyze it using Regression Evaluation Metrics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Regression Evaluation Metrics
&lt;/h2&gt;

&lt;p&gt;When it comes to analyzing our linear regression model, we want to not only see it visually but also see it in numbers. Lucky for us, scikit-learn can do this math for us. That doesn't mean it isn't good to know how we got there, though.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Flbyc6pgotudvh1td0j4c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Flbyc6pgotudvh1td0j4c.png" alt="Alt Text" width="800" height="321"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here are the 3 most common Regression Evaluation Metrics, and the best way to compare them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MAE is the most basic, because it is just the average absolute error&lt;/li&gt;
&lt;li&gt;MSE tends to be better than MAE because it punishes larger errors more heavily&lt;/li&gt;
&lt;li&gt;RMSE is the most popular of the 3 because it is expressed in the same units as "y", making it more readable&lt;/li&gt;
&lt;/ul&gt;
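&lt;p&gt;To see how we get there, here is a sketch of the three metrics computed by hand with plain numpy on a tiny made-up example, so you can check them against the scikit-learn versions below:&lt;/p&gt;

```python
import numpy as np

# Tiny made-up example: true values vs. model predictions
y_test = np.array([3.0, 5.0, 7.0])
predictions = np.array([2.0, 5.0, 10.0])

errors = y_test - predictions      # [1, 0, -3]
mae = np.mean(np.abs(errors))      # average absolute error
mse = np.mean(errors ** 2)         # squaring punishes the big miss harder
rmse = np.sqrt(mse)                # back in the same units as y

print(mae, mse, rmse)
```

Note how the single error of 3 dominates MSE far more than MAE, which is exactly the "punishes larger errors" behavior described above.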

&lt;p&gt;It is important to note that these are all loss functions, so we are trying to minimize them as much as we can with our models. They will become clearer when you run the code that computes them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn import metrics
print('MAE:', metrics.mean_absolute_error(y_test, predictions))
print('MSE:', metrics.mean_squared_error(y_test, predictions))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, predictions)))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This should print all three of your Regression Evaluation Metrics with their errors. Remember, we want SMALLER errors, so when comparing models, the model with the lower metric values is the better one.&lt;br&gt;
Now we have finally learned how to analyze our linear regression model qualitatively as well as quantitatively. This was the last step, and we've completed the Introduction to Machine Learning series!&lt;/p&gt;

&lt;h2&gt;
  
  
  End Thoughts
&lt;/h2&gt;

&lt;p&gt;This concludes my 3-part series, and I hope you enjoyed at least one of these articles or found something to take away from them. I am by no means an expert in this stuff; I am writing to help people like me who might be starting their journey and looking for tips to help them along the way. I know I really enjoy it when I find good articles that speed up my learning, so I hope some of these articles can help someone reading this. Thank you for taking the time to read, and if you didn't catch my past articles I'll link them below. If you enjoyed this, leave a like or a comment telling me what I could do better and what you'd like to see me write about next. Thank you again, and good luck wherever you are in your journey.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/berett21/machine-learning-in-python-day-1-22m0"&gt;Intro to Machine Learning: Part I&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/berett21/intro-to-machine-learning-in-python-part-2-2bpe"&gt;Intro to Machine Learning: Part II&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>python</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Intro to Machine Learning in Python: Part II</title>
      <dc:creator>Brett Hammit</dc:creator>
      <pubDate>Sat, 19 Dec 2020 18:32:11 +0000</pubDate>
      <link>https://forem.com/berett21/intro-to-machine-learning-in-python-part-2-2bpe</link>
      <guid>https://forem.com/berett21/intro-to-machine-learning-in-python-part-2-2bpe</guid>
      <description>&lt;p&gt;In Intro to Machine Learning in Python: Part I we talked about what linear regression is, how to analyze the data that we have and then how to plot out some visualizations to give us a better understanding of our data and the correlations between it. In this post we will be going through what machine learning is in basic terms, as well as how to split, train, test data and fit our models. Let's hop right into it!&lt;/p&gt;

&lt;h1&gt;
  
  
  Machine Learning Summary
&lt;/h1&gt;

&lt;p&gt;To sum machine learning up simply: we feed a model large amounts of data, and it uses statistics to make predictions based on that data. This is achieved with the variety of algorithms that we discussed in the first post. Machine learning is used in many different ways; one example is betting sites using machine learning models to set the lines for different bets, making them as accurate as possible. S/O &lt;a href="https://twitter.com/GrahamPinsent" rel="noopener noreferrer"&gt;@GrahamPinsent&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The steps of machine learning normally go: first gather the data, clean the data, put the majority of the data into training data and the rest into test data, then move on to model testing, and finish by deploying the model into real-world applications. Visually it looks like this:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fmfr8weqimnd6rq3ypq7s.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fmfr8weqimnd6rq3ypq7s.PNG" alt="Alt Text" width="800" height="269"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we have an idea of what machine learning is we can get into how to do the first step in linear regression which is splitting the data up.&lt;/p&gt;
&lt;h1&gt;
  
  
  Splitting the Data
&lt;/h1&gt;

&lt;p&gt;As I mentioned in Part I, I am using the scikit-learn package within Python. In order to split the data, we first need to set the data into X and y arrays. The y array will contain what we are trying to predict, and the X array will have all the other non-text data in our data frame. (Linear regression will not work with text data; that is the realm of natural language processing, which I am looking to get into after machine learning.) You might want a good view of all your column names first; to make this easier, I just call sorted on the data frame to list them. In code it would look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sorted(df)

y=df['ColName']
X=df[['ColName','ColName','ColName','ColName']]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that we have split the data into their appropriate arrays we can go about training and testing our data.&lt;/p&gt;

&lt;h1&gt;
  
  
  Training and Testing Data
&lt;/h1&gt;

&lt;p&gt;Training and testing our data is very simple: we just need to import the train_test_split method from the scikit-learn package. After doing this, we split the data into training and testing sets like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes sure that we are splitting both the X and the y arrays into training and testing sets. The test_size parameter is the fraction of the data put into the test set. Normally 30 percent goes into test data and 70 percent into training data, but this can change based on the model. To clear things up with the example above: we are giving 0.3 out of 1.0 to the test set, so 0.7 goes into the training data. The random_state parameter is a seed for the random shuffling, so using the same number gives you the same split every time.&lt;/p&gt;
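&lt;p&gt;A quick sketch on 100 made-up rows makes the proportions easy to verify: with test_size=0.3 you get 70 training rows and 30 test rows, and re-running with the same random_state reproduces the same split (the column names here are invented for illustration):&lt;/p&gt;

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# 100 made-up rows so the 70/30 split is easy to count
X = pd.DataFrame({'a': np.arange(100), 'b': np.arange(100) * 2.0})
y = pd.Series(np.arange(100) * 3.0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101)

print(len(X_train), len(X_test))  # 70 train rows, 30 test rows

# Same seed, same split: the selected row indices match exactly
X_train2, X_test2, _, _ = train_test_split(
    X, y, test_size=0.3, random_state=101)
print(list(X_train.index) == list(X_train2.index))  # True
```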

&lt;p&gt;We have now successfully split the data into training and test sets. The next step is to create and train our model!&lt;/p&gt;

&lt;h1&gt;
  
  
  Creating and Training the Model
&lt;/h1&gt;

&lt;p&gt;In order to create and train the model we will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;import LinearRegression from sklearn&lt;/li&gt;
&lt;li&gt;assign a LinearRegression object to lm to make it easier for us to call&lt;/li&gt;
&lt;li&gt;fit the model while passing it our training data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Doing these tasks the code will look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.linear_model import LinearRegression

lm = LinearRegression()

lm.fit(X_train,y_train)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fitting the model gives it our data, so that it specifically knows the context of what it is working with. After fitting, the notebook shows the estimator parameters of the machine learning algorithm and what they are set to. These can be changed to fine-tune the model you are making. This &lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html" rel="noopener noreferrer"&gt;link&lt;/a&gt; will take you to scikit-learn's linear regression documentation if you want to really fine-tune your model or learn more about the depths of linear regression.&lt;/p&gt;
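&lt;p&gt;If you want to see those estimator parameters without scrolling through the docs, here is a minimal sketch (the tiny dataset is made up and follows y = 1 + 2x exactly, so the fit is easy to check):&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import LinearRegression

lm = LinearRegression()

# The estimator parameters and their current settings; these are the
# knobs you can change to fine-tune the model
print(lm.get_params())

# Fit on a tiny made-up dataset that follows y = 1 + 2x exactly
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = 1 + 2 * X_train.ravel()
lm.fit(X_train, y_train)

print(lm.intercept_, lm.coef_)  # recovers 1 and [2]
```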

&lt;h1&gt;
  
  
  Wrapped Up
&lt;/h1&gt;

&lt;p&gt;Congrats! We have now done all the framework that goes into creating a linear regression model! To go over what we learned: what machine learning is, how to split train and test data, and finally how to create and fit a linear regression model! Now that we have created our model we need to analyze and evaluate the model that we made which will be shared in Part III of this series!&lt;/p&gt;

&lt;p&gt;Thank you for taking the time out to read this and if you enjoyed it drop a like or a comment telling me what you liked or disliked or with any questions or topics that you might wanna hear about next!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>datascience</category>
      <category>algorithms</category>
    </item>
    <item>
      <title>Intro to Machine Learning in Python: Part I</title>
      <dc:creator>Brett Hammit</dc:creator>
      <pubDate>Wed, 11 Nov 2020 03:42:21 +0000</pubDate>
      <link>https://forem.com/berett21/machine-learning-in-python-day-1-22m0</link>
      <guid>https://forem.com/berett21/machine-learning-in-python-day-1-22m0</guid>
      <description>&lt;p&gt;After messing around with really getting to know the in's and outs of data frame management and other sides of data science in Python I have been reluctant to get into Machine Learning with the worry of not having the time I would like to commit to it and get as good as I would like. Like everything sometimes you just gotta do it. So here we go. &lt;/p&gt;

&lt;h2&gt;
  
  
  Starting Point:
&lt;/h2&gt;

&lt;p&gt;Where I am starting is supervised learning, which basically means there are known inputs and outputs, and you are just tuning the parameters of your model to predict future outcomes.&lt;br&gt;
-An example of this would be classifying positive vs. negative movie reviews&lt;/p&gt;

&lt;p&gt;I am doing this work in Jupyter with the scikit-learn library in Python, which has the algorithms built in, making it much easier to fit models, split test and training data, etc. &lt;/p&gt;

&lt;h2&gt;
  
  
  Linear Regression
&lt;/h2&gt;

&lt;p&gt;Linear regression is the step up from correlation: it is when we model the relationship between two variables by fitting a model to predict a value.&lt;/p&gt;

&lt;p&gt;Within machine learning there are some base algorithms, and it can be hard to decide which model is best for your data. This cheat sheet gives a pretty good guide to what you should be doing based on your data.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Follxz5f9ex7n3zt0gjaf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Follxz5f9ex7n3zt0gjaf.png" alt="Alt Text" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Working With Our Data
&lt;/h2&gt;

&lt;p&gt;So the first thing is we need data to work with in order to build a model. When you have your data readily available, the first step is to analyze what you are working with.&lt;/p&gt;

&lt;p&gt;The first step is taking your data and setting it into a data frame. We can do this with "pd.read_csv('YourData')", or the matching reader for whatever type of file you are working with. Creating this data frame will allow us to dig deeper and see what we need to do with our model.&lt;/p&gt;
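&lt;p&gt;Here is a minimal sketch of that first step. The column names and values are made up for illustration, and the in-memory text stands in for a real file; with a real file you would pass the file name straight to pd.read_csv:&lt;/p&gt;

```python
import pandas as pd
from io import StringIO

# Stand-in for a real file on disk: pd.read_csv('YourData.csv') works the same
csv_text = StringIO("price,rooms,age\n250000,3,10\n340000,4,5\n180000,2,30\n")
df = pd.read_csv(csv_text)

print(df.shape)   # (rows, columns) loaded into the data frame
print(df.head())  # first few rows to eyeball
```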

&lt;h2&gt;
  
  
  Analyzing Our Data
&lt;/h2&gt;

&lt;p&gt;A good place to look first within your data is the .describe() method and the .columns attribute, which show your column names and some additional info about them.&lt;/p&gt;

&lt;p&gt;With seaborn imported, we can use "sns.pairplot(YourDataFrame)"&lt;br&gt;
to get a good idea of the distribution of our data. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fhpr7230jj88298ta2zmo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fhpr7230jj88298ta2zmo.png" alt="Alt Text" width="253" height="200"&gt;&lt;/a&gt;&lt;br&gt;
An example of Normally Distributed Data vs. Not Normally Distributed Data&lt;/p&gt;

&lt;p&gt;After that we can look at the correlation of our data by using "sns.heatmap(df.corr(), annot=True)" to see a heat map of our data with the correlation values printed on top. A value of 1 means two columns are perfectly correlated with one another.&lt;/p&gt;

&lt;p&gt;Lastly, in analyzing our data we need to pick what we would like to predict. Choose that column and use "sns.distplot(YourDataFrame['ColumnName'])" to pull up a distribution plot of it; it should be normally distributed, like I talked about above.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this post I mainly talked about my first day in machine learning: primarily working with linear regression and analyzing your data to get it ready to fit. My next post should be more about actual ML, with training, testing, and fitting our model!&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>algorithms</category>
    </item>
    <item>
      <title>First Post: My Road in Computer Science</title>
      <dc:creator>Brett Hammit</dc:creator>
      <pubDate>Tue, 01 Sep 2020 05:28:57 +0000</pubDate>
      <link>https://forem.com/berett21/first-post-my-road-in-computer-science-371a</link>
      <guid>https://forem.com/berett21/first-post-my-road-in-computer-science-371a</guid>
      <description>&lt;p&gt;It has been a long time coming and I have finally gathered the motivation to write my first post. Since this is my first blog post I guess I need to give some context of who I am and why I'm writing this. &lt;/p&gt;

&lt;h1&gt;
  
  
  About Me:
&lt;/h1&gt;

&lt;p&gt;My name is Brett Hammit and at the time I'm writing this I'm 21 years old and in my third year of college. I am a Computer Science Major at Abilene Christian University and play baseball as well. This blog is primarily going to just be somewhere that I can give updates on my journey in Computer Science and a place I can share projects and things that interest me.&lt;/p&gt;

&lt;h1&gt;
  
  
  So Far in Computer Science:
&lt;/h1&gt;

&lt;p&gt;I wrote my first ("Hello World") in Python about 2 years ago and found just a bit of interest in programming. I had no idea what I wanted to do with it, or even if I wanted to work in that line, but it was just kinda something that I knew I liked doing. Over the 2 years that followed I wanted to go to school for Computer Science, but was told the degree would be too hard for a Division 1 athlete to balance, so I went into a Business major. After my first year of business classes, I felt that I just wasn't in any classes that truly interested me. While I was taking those business classes, I was teaching myself Python through online courses here and there. After the year was over I realized I wanted to do what I was passionate about and go to school for Computer Science, even if it meant challenging myself and pushing things to the edge. Life happened, and I ended up transferring to a 2-year college for baseball, but I officially changed my major to Computer Science, even though I could only take the limited number of CS classes offered there. This was just another speed bump in my journey through programming, and I was not going to let any of these obstacles prevent me from doing what I wanted to do.&lt;/p&gt;

&lt;p&gt;&lt;iframe class="tweet-embed" id="tweet-1241899884241969157-810" src="https://platform.twitter.com/embed/Tweet.html?id=1241899884241969157"&gt;
&lt;/iframe&gt;




&lt;/p&gt;

&lt;h1&gt;
  
  
  Moving Forward:
&lt;/h1&gt;

&lt;p&gt;Fast forward to today: I am now at another 4-year university, taking CS courses this year and pursuing my degree in Computer Science. It feels like a weight has been lifted off my shoulders now that I am finally going to school excited about what I get to learn each day. The current interest I am chasing is getting into data science through Python, as I continue to work through other CS topics to see what else I am curious to learn about. The purpose of this blog, or whatever it may be, is just to be somewhere I can share my projects and the things I'm learning about. My advice to anyone reading this: even if people try to steer you away from Computer Science or say it will be too hard for you, go for it if you're passionate about it. This has been a long time coming, and I'm looking forward to writing more here!&lt;/p&gt;

</description>
      <category>python</category>
      <category>beginners</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
