Hey guys! Ready to dive into the awesome world of machine learning with Python? This article is your starting point, providing basic code examples to get you rolling. No need to be intimidated; we'll break it down step by step. Let's get started!
Setting Up Your Environment
Before we jump into coding, let’s make sure your environment is set up correctly. You’ll need Python installed, preferably version 3.8 or higher (to match the command below). I recommend using Anaconda, which comes with most of the necessary packages pre-installed. If you haven't already, download it from the Anaconda website and follow the installation instructions. Once Anaconda is installed, you can create a new environment to keep your projects organized. Open your Anaconda prompt or terminal and type:
conda create -n myenv python=3.8
conda activate myenv
This creates an environment named myenv with Python 3.8. Now, let’s install some essential libraries. We’ll need NumPy for numerical operations, Pandas for data manipulation, Scikit-learn for machine learning algorithms, Matplotlib for data visualization, and Seaborn for statistical plots (we’ll use it for a confusion-matrix heatmap later). Run the following command:
pip install numpy pandas scikit-learn matplotlib seaborn
With these libraries installed, you’re ready to start coding. Trust me; this setup process is crucial. A well-configured environment will save you from countless headaches down the road. So take your time, double-check everything, and ensure all the libraries are correctly installed. Once that’s done, fire up your favorite code editor (like VSCode, PyCharm, or even Jupyter Notebook) and let’s get coding!
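Before moving on, it's worth a quick sanity check. Here's a tiny script that simply imports each library and prints its version – the exact numbers you see will depend on when you install:
import numpy as np
import pandas as pd
import sklearn
import matplotlib
import seaborn as sns
# Print the installed version of each library to confirm the setup worked
print(f"NumPy: {np.__version__}")
print(f"Pandas: {pd.__version__}")
print(f"Scikit-learn: {sklearn.__version__}")
print(f"Matplotlib: {matplotlib.__version__}")
print(f"Seaborn: {sns.__version__}")
If all five lines print without an ImportError, your environment is good to go.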
Why This Setup Matters
I can't stress enough how important this initial setup is. Imagine building a house on a shaky foundation – it’s bound to collapse eventually. Similarly, without the right environment and libraries, your machine learning projects will be prone to errors, compatibility issues, and overall frustration. Anaconda simplifies the process by managing dependencies and creating isolated environments for each project. This means you can have different versions of the same library for different projects without conflicts. Furthermore, pip installs the latest released version of each package by default, which minimizes the chances of running into long-fixed bugs or outdated features. So, take the time to set up your environment correctly, and you'll be well on your way to becoming a proficient machine learning practitioner.
Basic Machine Learning Steps
Alright, what does a typical machine-learning workflow look like? Glad you asked! Here's a simplified rundown:
- Data Collection: Gather your data from various sources.
- Data Preprocessing: Clean and prepare your data for analysis. This includes handling missing values, encoding categorical variables, and scaling numerical features (there's a short sketch of this step right after the list).
- Model Selection: Choose an appropriate model for your task. The choice depends on the type of problem you're trying to solve (e.g., classification, regression, clustering) and the characteristics of your data.
- Model Training: Train your model using the preprocessed data. The model learns patterns and relationships in the data during this phase.
- Model Evaluation: Evaluate your model's performance using metrics appropriate for your task. This helps you assess how well the model generalizes to unseen data.
- Hyperparameter Tuning: Optimize your model's hyperparameters to improve its performance. This involves experimenting with different hyperparameter values and selecting the ones that yield the best results.
- Deployment: Deploy your trained model to make predictions on new data.
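To make the preprocessing step concrete, here's a minimal sketch using Scikit-learn's ColumnTransformer. The DataFrame, its age and city columns, and the median-imputation strategy are all made up for illustration:
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
# Hypothetical toy data: one missing age, one categorical column
df = pd.DataFrame({
    'age': [25.0, 32.0, None, 41.0],
    'city': ['NY', 'LA', 'NY', 'SF'],
})
# Impute missing ages with the median, scale them, and one-hot encode city
preprocess = ColumnTransformer([
    ('num', make_pipeline(SimpleImputer(strategy='median'), StandardScaler()), ['age']),
    ('cat', OneHotEncoder(), ['city']),
])
X = preprocess.fit_transform(df)
print(X)
The nice thing about bundling these steps into one transformer is that you can fit it on training data and reuse the exact same transformations on test data, which helps avoid data leakage.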
Example 1: Linear Regression
Let's start with a classic: Linear Regression. This algorithm is used to predict a continuous output based on one or more input features. It’s simple, yet powerful, and provides a great foundation for understanding more complex models. We'll use Scikit-learn to build and train our model. So, here’s some Python code to make it happen:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Generate some sample data
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a linear regression model
model = LinearRegression()
# Train the model
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
# Plot the results
plt.scatter(X, y, label='Actual Data')
plt.plot(X, model.predict(X), color='red', label='Linear Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression Example')
plt.legend()
plt.show()
Code Breakdown
First, we import the necessary libraries: numpy for numerical operations, matplotlib for plotting, train_test_split for splitting the data, LinearRegression for the model, and mean_squared_error for evaluating the model. Then, we generate some sample data using numpy. Feel free to replace this with your own dataset!

Next, we split the data into training and testing sets using train_test_split. The test_size parameter specifies the proportion of the data used for testing (in this case, 20%), and the random_state parameter ensures the split is reproducible. We create a LinearRegression model and train it using the training data with the fit method; this is where the model learns the relationship between the input features and the output variable. After training, we make predictions on the test data using the predict method.

Finally, we evaluate the model's performance using the mean_squared_error metric, which measures the average squared difference between the predicted and actual values, and we plot the results to visualize the regression line against the actual data points. This helps us understand how well the model fits the data. Remember, linear regression assumes a linear relationship between the input features and the output variable. If your data doesn't exhibit a linear relationship, you might need to consider a different model.
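One more note on that last point: if your data looks curved rather than straight, a common first move is to keep LinearRegression but expand the features. Here's a minimal sketch using Scikit-learn's PolynomialFeatures; the degree=2 choice and the toy data are arbitrary, just for illustration:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
# Toy data with a roughly quadratic trend
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([1, 4, 9, 15, 26])
# degree=2 adds an x^2 column, so the "linear" model can fit a curve
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict(np.array([[6]])))  # predict at a new point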
Example 2: Logistic Regression
Next up, Logistic Regression! Despite the name, it's a classification algorithm used to predict categorical outcomes. Think of it as answering a yes/no question based on the data. It is commonly used in binary classification problems, where the goal is to classify instances into one of two classes. For example, predicting whether an email is spam or not spam, or whether a customer will click on an ad or not. In this example, we’ll predict whether a person will purchase a product based on their age.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
# Generate some sample data
X = np.array([20, 30, 40, 50, 60]).reshape(-1, 1)
y = np.array([0, 0, 1, 1, 1]) # 0 = No Purchase, 1 = Purchase
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a logistic regression model
model = LogisticRegression()
# Train the model
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()
Diving Deeper into the Code
Just like before, we start by importing the necessary libraries. This time, we include LogisticRegression for the model, accuracy_score for evaluating the model's accuracy, confusion_matrix for analyzing the classification results, and seaborn for visualizing the confusion matrix. We generate sample data with numpy, representing ages and purchase decisions, split the data into training and testing sets, create a LogisticRegression model, and train it using the training data.

After training, we make predictions on the test data and evaluate the model's accuracy using the accuracy_score function, which measures the proportion of correctly classified instances. We also compute the confusion matrix, which provides a more detailed breakdown of the classification results: the number of true positives, true negatives, false positives, and false negatives. Visualizing it as a heatmap with the seaborn library helps us understand the types of errors the model is making.

Under the hood, logistic regression models the probability of the positive class (e.g., purchase) as a function of the input features. It uses a sigmoid function to transform the linear combination of the input features into a probability value between 0 and 1, then classifies an instance as positive if that probability is above a certain threshold (typically 0.5) and negative otherwise. Logistic regression assumes a linear relationship between the input features and the log-odds of the positive class. If this assumption is not met, the model's performance may be suboptimal.
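You can see that threshold behavior directly in code. Here's a minimal sketch that refits the same toy model on all five points and prints the predicted probabilities; the 0.3 cutoff is just an arbitrary example of moving the threshold:
import numpy as np
from sklearn.linear_model import LogisticRegression
# Same toy ages and purchase labels as above
X = np.array([20, 30, 40, 50, 60]).reshape(-1, 1)
y = np.array([0, 0, 1, 1, 1])
model = LogisticRegression()
model.fit(X, y)
# Column 1 of predict_proba holds P(purchase) for each age
proba = model.predict_proba(X)[:, 1]
print(proba.round(3))
# predict() uses a 0.5 cutoff; a lower threshold predicts "purchase" more readily
print((proba >= 0.5).astype(int))
print((proba >= 0.3).astype(int))
Lowering the threshold trades false negatives for false positives, which matters when the two kinds of errors have different costs.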
Example 3: K-Nearest Neighbors (KNN)
Now, let's explore a different type of algorithm: K-Nearest Neighbors (KNN). KNN is a versatile algorithm that can be used for both classification and regression tasks. It is a non-parametric algorithm, meaning that it does not make any assumptions about the underlying data distribution. Instead, it relies on the proximity of data points to make predictions. In this example, we’ll use KNN to classify different types of flowers based on their features.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn import datasets
# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create a KNN classifier
model = KNeighborsClassifier(n_neighbors=3)
# Train the model
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
Dissecting the KNN Code
As usual, we begin by importing the necessary libraries. Here, we include KNeighborsClassifier for the KNN model, accuracy_score for evaluating the model, and datasets for loading the Iris dataset, which is built into Scikit-learn. The Iris dataset contains measurements of sepal length, sepal width, petal length, and petal width for three species of Iris flowers: Iris setosa, Iris versicolor, and Iris virginica.

We load the dataset using datasets.load_iris() and split it into training and testing sets. We create a KNeighborsClassifier model with n_neighbors=3, which means the model will consider the 3 nearest neighbors when making a prediction. The number of neighbors is a hyperparameter that can be tuned to improve performance. We train the model using the training data, make predictions on the test data, and evaluate the model's accuracy using the accuracy_score function.

KNN works by finding the K nearest neighbors of a given data point in the training data and assigning the class label that is most frequent among those neighbors. The distance between data points is typically measured using Euclidean distance, but other distance metrics can also be used. KNN is a simple and intuitive algorithm that can be effective for a wide range of classification and regression tasks. However, it can be computationally expensive for large datasets, as it requires calculating the distance between each test point and all points in the training set.
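Since n_neighbors is the main knob here, here's a minimal sketch of tuning it with Scikit-learn's GridSearchCV; the 1-15 grid and 5-fold cross-validation are arbitrary choices for illustration:
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
# Load and split the Iris data as before
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)
# Cross-validated search over a small grid of neighbor counts
grid = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': list(range(1, 16))}, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)  # best k and its cross-validation accuracy
print(grid.score(X_test, y_test))           # accuracy on the held-out test set
Whichever k wins, the last line is the honest check: scoring on the held-out test set tells you whether the tuned model actually generalizes.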
Conclusion
Alright, guys, that’s it for our basic machine learning Python code examples! We covered linear regression, logistic regression, and K-Nearest Neighbors. These examples should give you a solid foundation to start exploring more advanced techniques. Remember, practice makes perfect, so keep coding and experimenting with different datasets and algorithms. The world of machine learning is vast and exciting, and I can't wait to see what you'll create! Happy coding!