Data scientists who come to the career without a software background (myself included) tend to use a procedural style of programming rather than taking an object-oriented approach. Changing styles is a paradigm shift and really takes some time to wrap your mind around. Many of us who have been doing this for years still have trouble envisioning how objects can improve things. There are **a lot** of resources out there to help you understand this subject in more detail, but I am going to take a “learn by doing” approach. The code used for this can be found on my GitHub.

**Goal of this post:**

- Build a very basic object to house our linear regression model
- Create a command line interface (CLI) to pass in different datasets
- Print the object to the screen in a user-friendly format

**What we are leaving for the next post:**

- Fitting a model to find coefficients
- Finding the RMSE, R^2, slope and intercept of the model
- Testing our model using pytest

Here we go!

Definition: **OOP** = **O**bject **O**riented **P**rogramming

Our object needs to house the following:

- **Data** – Obviously… **note** that in this case, the data needs to be in a specific format
- **Fit** – Utilize the `y = mx + b` format that we all grew up with. We’ll write the code for this in the next post
- **Fit Results** – After fitting the model, we typically need to be able to see the fit rather than just predicting results
- **Predictions** – Make the model useful by being able to predict values provided by the user

**What do we need as inputs?**

- Independent variable values (typically ‘x’)
- Dependent variable values (typically ‘y’)
- Numeric value at which we want a prediction (similar to ‘x’)

We start by making a `class` and then we define what it takes as input within the `__init__` method. In our case, we are asking for lists for `independent_var` and `dependent_var`, with a single numeric value as `predict`.

    class SingleLinearRegression:
        def __init__(self, independent_var: list, dependent_var: list, predict: float):
            """
            Completes either a single or multiple linear regression. We will pass a single value to predict.
            :param independent_var: list
            :param dependent_var: list
            :param predict: float
            """
            self.independent_var = independent_var
            self.dependent_var = dependent_var
            self.predict = predict
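As a quick check of what `__init__` does, here is a minimal sketch that creates an instance and reads the stored attributes back. The sample values are made up for illustration and are not taken from the datasets used later; the class body is repeated (without the docstring) only so the snippet runs on its own.

```python
class SingleLinearRegression:
    def __init__(self, independent_var: list, dependent_var: list, predict: float):
        # Store the inputs on the instance so later methods can use them
        self.independent_var = independent_var
        self.dependent_var = dependent_var
        self.predict = predict


# Made-up sample values, for illustration only
model = SingleLinearRegression(
    independent_var=[1.0, 2.0, 3.0],
    dependent_var=[2.0, 4.0, 6.0],
    predict=2.5,
)

print(model.independent_var)  # [1.0, 2.0, 3.0]
print(model.dependent_var)    # [2.0, 4.0, 6.0]
print(model.predict)          # 2.5
```

Each argument passed in ends up as an attribute on `model`, which is what lets the methods added later work with the data.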

Next, we know that we will be fitting a model and predicting results. This will utilize `fit` and `predictions` methods. We will hold off on adding the math until the next post. Finally, we add the `__str__` method, which is called when you `print(your_object)`, in order to make the output legible. You will find that there is another method called `__repr__` available, but it is typically used for a different purpose: an unambiguous, developer-facing representation of the object. We will save this class by itself in a file called `linear_regression.py`.

    class SingleLinearRegression:
        def __init__(self, independent_var: list, dependent_var: list, predict: float):
            """
            Completes either a single or multiple linear regression. We will pass a single value to predict.
            :param independent_var: list
            :param dependent_var: list
            :param predict: float
            """
            self.independent_var = independent_var
            self.dependent_var = dependent_var
            self.predict = predict

        def fit(self) -> dict:
            pass

        def predictions(self) -> dict:
            pass

        def __str__(self):
            return f"""
                This class returns a dictionary of results from your on your linear regression:
                {{
                    'independent_var': {self.independent_var},
                    'dependent_var': {self.dependent_var},
                    'fit': {{
                        'coefficient': coefficient,
                        'constant': constant,
                        'r_squared': r_squared,
                        'p_values': 'p_values'
                    }},
                    'predictions': {{
                        'predict': {self.predict},
                        'result': result_of_predictions.
                    }}
                }}
                :return: dict
                """
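As a short aside on `__str__` versus `__repr__`, the sketch below contrasts the two on a toy class. `Point` is a hypothetical example and is not part of this project; it only shows which method Python calls in which situation.

```python
class Point:
    """Hypothetical toy class, used only to contrast __str__ and __repr__."""

    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

    def __repr__(self) -> str:
        # Developer-facing: ideally unambiguous, often shaped like the constructor call
        return f"Point(x={self.x!r}, y={self.y!r})"

    def __str__(self) -> str:
        # User-facing: whatever reads best on screen
        return f"({self.x}, {self.y})"


p = Point(1.0, 2.0)
print(p)        # print() uses __str__   -> (1.0, 2.0)
print(repr(p))  # repr() uses __repr__   -> Point(x=1.0, y=2.0)
print([p])      # containers use __repr__ -> [Point(x=1.0, y=2.0)]
```

Since we only care about a readable printout for now, defining `__str__` alone is enough for our class.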

There we have it, our first class. By itself, this doesn’t do a whole lot for us. We have to convert our `class` into an `instance` with all of our inputs. Before we go too far, let’s take a look at our folder structure.

We have a `data` directory with 2 `csv` files to use as “data”. We also have a `linear_regression.py` file, which holds the `SingleLinearRegression` class that we just created, and a `run_me.py` file, which will be used to run everything. You will also notice the `requirements.txt` file, which houses all of the required packages.

What should our `run_me.py` contain? It needs to import our `SingleLinearRegression` class, take in data, and print out results. Looking at `my_function()` below shows us that we will need to provide a `dataset` (filename and location) and the `predict` value. Note that reading in the `csv` data is quite long; we will trim this down in the next post. We instantiate our object using the `independent_data` and `dependent_data` that were read from the `dataset`.

    import csv
    import click
    from linear_regression import SingleLinearRegression


    def my_function(dataset: str, predict: float):
        print('Starting run_me.py')

        # Read in csv data
        independent_data = []
        dependent_data = []
        with open(dataset, 'r') as csvfile:
            reader = csv.reader(csvfile)
            next(reader, None)  # Skips the header row
            for row in reader:
                independent_data.append(row[0])
                dependent_data.append(row[1])

        # Create instance of SingleLinearRegression model
        single_linear_regression = SingleLinearRegression(
            independent_var=independent_data,
            dependent_var=dependent_data,
            predict=predict
        )

        print(single_linear_regression)

We aren’t quite done: nothing will happen if we run the `run_me.py` file as it stands, because `my_function()` is defined but never called. We need to set this up to take in an arbitrary dataset and run. This is where the `click` library comes in handy. There are **a lot** of different ways to pass arguments in from the CLI, but I prefer `click` for its simplicity. Each `@click.option` should be self-explanatory: you provide the dataset location and the value to predict at, and the rest is handled in the program. We have also set default values for each. Using the `if __name__ == '__main__':` check to call `main()` is pretty typical in Python and you will see it all over the place; it’s a good way to set up your projects.

    import csv
    import click
    from linear_regression import SingleLinearRegression


    @click.command()
    @click.option('-d', '--dataset', default='./data/fake_data.csv',
                  help='Dataset with independent variable in first column and dependent variable in second. '
                       'Dataset has a header row.')
    @click.option('-p', '--predict', default=2.5,
                  help='Independent variable value at which you would like the fit to predict.')
    def main(dataset: str, predict: float):
        print('Starting run_me.py')

        # Read in csv data
        independent_data = []
        dependent_data = []
        with open(dataset, 'r') as csvfile:
            reader = csv.reader(csvfile)
            next(reader, None)  # Skips the header row
            for row in reader:
                independent_data.append(row[0])
                dependent_data.append(row[1])

        # Create instance of SingleLinearRegression model
        single_linear_regression = SingleLinearRegression(
            independent_var=independent_data,
            dependent_var=dependent_data,
            predict=predict
        )

        print(single_linear_regression)


    if __name__ == '__main__':
        main()
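For comparison, since there are several ways to pass arguments in from the CLI, here is a rough sketch of the same two options using the standard library's `argparse` instead of `click`. This is not the code used in this project; `parse_args` is a hypothetical helper name, and the argument lists passed to it stand in for what would normally come from the command line.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Hypothetical helper: parse the dataset path and the value to predict at."""
    parser = argparse.ArgumentParser(description="Sketch of the run_me.py options")
    parser.add_argument(
        "-d", "--dataset", default="./data/fake_data.csv",
        help="CSV with the independent variable in the first column "
             "and the dependent variable in the second.",
    )
    parser.add_argument(
        "-p", "--predict", type=float, default=2.5,
        help="Value at which you would like a prediction.",
    )
    return parser.parse_args(argv)


# No arguments given: the defaults are used
defaults = parse_args([])
print(defaults.dataset, defaults.predict)  # ./data/fake_data.csv 2.5

# Equivalent of passing "-d data/fake_data2.csv -p 312" on the command line
custom = parse_args(["-d", "data/fake_data2.csv", "-p", "312"])
print(custom.dataset, custom.predict)  # data/fake_data2.csv 312.0
```

With `argparse` you wire the parsed values into your function yourself, whereas the `click` decorators hand them straight to `main()` as arguments, which is a large part of why I find `click` simpler.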

Finally, we can run this! Since we have default values (utilizing the dataset `fake_data.csv`), we can simply run:

> python run_me.py

**Terminal Output:**

    Starting run_me.py

    This class returns a dictionary of results from your on your linear regression:
    {
        'independent_var': ['1', '2', '3'],
        'dependent_var': ['5', '6', '8'],
        'fit': {
            'coefficient': coefficient,
            'constant': constant,
            'r_squared': r_squared,
            'p_values': 'p_values'
        },
        'predictions': {
            'predict': 2.5,
            'result': result_of_predictions.
        }
    }
    :return: dict

We can see that we have a nice description of our output, including dynamically populated values for `independent_var`, `dependent_var`, and `predict`. If we want to pass in a different `dataset` or `predict` value, it is simple…

> python run_me.py -d data/fake_data2.csv -p 312

**Terminal Output:**

    Starting run_me.py

    This class returns a dictionary of results from your on your linear regression:
    {
        'independent_var': ['100', '200', '300'],
        'dependent_var': ['500', '600', '800'],
        'fit': {
            'coefficient': coefficient,
            'constant': constant,
            'r_squared': r_squared,
            'p_values': 'p_values'
        },
        'predictions': {
            'predict': 312.0,
            'result': result_of_predictions.
        }
    }
    :return: dict

You’ll notice that the variables have changed in the output! In the next post we will dive into making something a bit more useful.
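One detail visible in the terminal output is that the values read from the CSV are strings (`'1'`, `'2'`, …), because `csv.reader` does not convert types. Before any math can be done on them they would need to be numeric. Below is a minimal sketch of one way to handle that, assuming two numeric columns and a header row; `read_xy` is a hypothetical helper and not necessarily how the next post handles it. An in-memory string stands in for the file so the snippet runs on its own.

```python
import csv
import io


def read_xy(csvfile) -> tuple:
    """Hypothetical helper: read two numeric columns, skipping the header row."""
    reader = csv.reader(csvfile)
    next(reader, None)  # Skip the header row
    x_values, y_values = [], []
    for row in reader:
        # csv.reader yields strings, so convert each value explicitly
        x_values.append(float(row[0]))
        y_values.append(float(row[1]))
    return x_values, y_values


# In-memory stand-in for a CSV file; the header names are made up
sample = io.StringIO("x,y\n1,5\n2,6\n3,8\n")
x, y = read_xy(sample)
print(x)  # [1.0, 2.0, 3.0]
print(y)  # [5.0, 6.0, 8.0]
```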

*I need to state this explicitly: I am not an expert in object-oriented design. These types of patterns are very specific, and experts in the field have been doing this for many years with a lot of mentoring. If you are taking anything into a production environment that people depend on, please take the time to have someone with lots of experience take a look at your code to help you gain confidence and grow your skills.*