Predictive Analytics using Deepnote

Deepnote is a versatile notebook that revolutionizes analytics and data science workflow by seamlessly combining the power of SQL and Python with a wide range of data frameworks.

The focus of this guide is to display the ease of integrating Hydra into Deepnote to perform predictive analytics using Python scripts which can be used to anticipate customer behavior, improve operational efficiency and enhance your business decisions.

To cater to a broad audience, the data and KPIs were chosen to be as agnostic as possible. We will leverage a simple salary dataset on Kaggle with two columns:

  • Years of experience

  • Salary

Your datasets will likely differ in scope and complexity, but this will hopefully serve as a general tutorial on how to leverage your data with Hydra & Deepnote for predictive analytics.

Setup

Building the table with Hydra

  1. Sample data from Kaggle will be used for this guide. Download the CSV file from their website.

  2. Connect to your Hydra database and import the CSV file to the database. To do so, follow the documentation for importing a CSV file.

Creating the Connection

Now we can connect Deepnote to our sample data in Hydra.

  1. Start by signing up for Deepnote if you haven't done so already.

  2. Within your Deepnote account, create a workspace name. In the case of this tutorial, the workspace name will be "Hydra-Team".

  3. Next, choose "PostgreSQL" as the data source as shown in the image below.

  1. Create a new project by clicking on "+" under "Projects" within the left panel of the Deepnote console.

  1. Within your new project, navigate to "Integrations" on the right panel and choose "PostgreSQL".

  1. In the "Connect to PostgresSQL"' pop-up, add your Hydra database credentials including hostname, user, password, and database. You can obtain these credentials from the Hydra dashboard.

  1. After establishing the connection, you can view the imported salary table within Deepnote.

Using a SELECT statement, you can run SQL queries on the salary table which enables you to retrieve and analyze specific data from the table. Let's run a simple query to make sure everything is working like we expect:

SELECT * FROM public.salary LIMIT 100
  1. To access the table using Python scripts because predictions can not be done using SQL, an engine has to be created with the data warehouse credentials:

import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('postgresql://user:pass@server:5432/database')
data = pd.read_sql_table('Salary',con= engine, schema= 'public')

Congratulations! You've successfully connected Hydra with Deepnote! What's next?

Making the predictions

Now that we have connected Deepnote to Hydra, we can use Python and SQL scripts to access the data. We will use the Linear Regression model to create predictions leveraging the salary data.

The linear regression model and matplotlib for our visualization are imported as shown below:

#import Classes and lib needed
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

The dataset is divided into dependent (y) and independent (x) variables in order for the linear regression model to understand the relationship between the variables and make predictions. We train the linear regression model using the dataset. During the training process, the model learns from the data by identifying the patterns and relationships between the independent variables (years of experience) and the dependent variable (the salary).

#defining variables X and Y
X = data.iloc[:,0].values.reshape(-1,1)
print(X)
Y = data.iloc[:,1].values.reshape(-1,1)
print(Y)
#creating a Linear Regression model for prediction
model = LinearRegression()

#Fitting the model on the Dataset
model.fit(X,Y)

Y_pred = model.predict(X)  # make predictions

Now you can create visualizations to show the difference between the actual salary versus the predicted salary:

Y_pred = Y_pred.reshape(-1)
difference = pd.DataFrame({'Real' : data. salary, 'Predicted': Y_pred, })
print(difference)
graph = difference.head(10)
graph.plot(kind='bar')
plt.title('Actual vs Predicted')
plt.ylabel('salary')

Your first prediction is complete! You can predict the range of a person's salary based on years of experience.

Conclusion

At this point, we have performed a basic predictive analytics project by integrating the Hydra database with Deepnote. This basic guide can be leveraged to experiment with your own business metrics, but don't hesitate to reach out for more advance support through the Hydra #experts channel in Discord.

Last updated