Data Visualization: Data visualization is the process of finding trends in data (quantitative and qualitative) by representing it in graphical form. In Python, several modules support this, including Seaborn, Matplotlib, and Plotly. Data visualization is an essential part of AI (artificial intelligence) and ML (machine learning) applications, because different graphical representations can reveal key insights into your data.
- Matplotlib and Seaborn
- Line Charts
- Bar Graphs
- Histograms
- Scatter Plots
- Heat Maps
Matplotlib
Matplotlib is a low-level, easy-to-understand library for data visualization in Python, built on NumPy arrays. It lets you build plots such as bar graphs, scatter plots, histograms, stem charts, step graphs, box plots, pie charts, and violin plots.
To install the library, use one of the following commands:
- pip install matplotlib
- conda install matplotlib
Seaborn:
Seaborn is a library that uses Matplotlib underneath to plot graphs. It is often used to visualize the distribution of data.
import matplotlib.pyplot as plt
import seaborn as sns
# distplot is deprecated in recent seaborn releases;
# histplot with kde=True draws a comparable histogram with a density curve
sns.histplot([0, 1, 2, 3, 4, 5], kde=True)
plt.show()
Scatter Plot: A scatter plot is a two-dimensional plot that shows the relationship between two variables. By convention, the independent variable is plotted on the x-axis and the dependent variable on the y-axis. Use a scatter plot when your data consists of paired numerical values.
import matplotlib.pyplot as plt
x = [5,7,7,2,17,2,9,4,11,12,9,6]
y = [99,87,88,111,86,103,87,94,78,77,85,86]
plt.scatter(x, y)
plt.show()
Preparation of Scatter Data
- Collect pairs of data where a relationship is suspected.
- Draw a graph with the independent variable on the horizontal axis and the dependent variable on the vertical axis.
- Divide the points on the graph into four quadrants.
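The quadrant step above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the text does not say where the dividing lines go, so the sketch places them at the median of each variable, and the helper name quadrant_counts is hypothetical. The data reuses the x and y values from the scatter plot example.

```python
import numpy as np

# Paired data reused from the scatter plot example
x = np.array([5, 7, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6])
y = np.array([99, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86])

# Assumption: the quadrants are split at the median of each variable
x_mid = np.median(x)
y_mid = np.median(y)


def quadrant_counts(x, y, x_mid, y_mid):
    """Count points strictly inside each quadrant.

    Points that fall exactly on a dividing line are not counted.
    """
    return {
        "upper_right": int(np.sum((x > x_mid) & (y > y_mid))),
        "upper_left": int(np.sum((x < x_mid) & (y > y_mid))),
        "lower_left": int(np.sum((x < x_mid) & (y < y_mid))),
        "lower_right": int(np.sum((x > x_mid) & (y < y_mid))),
    }


counts = quadrant_counts(x, y, x_mid, y_mid)
print(counts)
```

When most points fall in the upper-left and lower-right quadrants, the two variables tend to move in opposite directions; a concentration in the lower-left and upper-right suggests they move together.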
pip install matplotlib seaborn bokeh
pip install tensorflow pandas
# Importing from tensorflow and keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Reshape
from tensorflow.keras import utils
from tensorflow import dtypes, tensordot
from tensorflow import convert_to_tensor, linalg, transpose
# For math operations
import numpy as np
# For plotting with matplotlib
import matplotlib.pyplot as plt
# For plotting with seaborn
import seaborn as sns
# For plotting with bokeh
from bokeh.plotting import figure, show
from bokeh.models import Legend, LegendItem
# For pandas dataframe
import pandas as pd
# load dataset
(x_train, train_labels), (_, _) = mnist.load_data()
# Choose only the digits 0, 1, 2.
total_classes = 3
ind = np.where(train_labels < total_classes)
x_train, train_labels = x_train[ind], train_labels[ind]
# Shape of training data
total_examples, img_length, img_width = x_train.shape
# Print the statistics
print('Training data has', total_examples, 'images')
print('Each image of size', img_length, 'x', img_width)
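The class-filtering step in the listing above relies on np.where and index arrays. The sketch below shows the same pattern on a tiny made-up array, so it runs without downloading MNIST. The names images and labels and their values are hypothetical stand-ins, not part of the dataset.

```python
import numpy as np

# Hypothetical stand-ins for the MNIST arrays: six 2x2 "images" with labels
images = np.arange(6 * 2 * 2).reshape(6, 2, 2)
labels = np.array([0, 5, 2, 9, 1, 3])

total_classes = 3

# np.where returns a tuple of index arrays, one per dimension of the condition
ind = np.where(labels < total_classes)

# Indexing with that tuple keeps only the examples labelled 0, 1, or 2
images_small, labels_small = images[ind], labels[ind]

print(ind[0])              # positions of the kept examples
print(labels_small)        # labels of the kept examples
print(images_small.shape)  # (kept examples, height, width)
```

A boolean mask such as images[labels < total_classes] selects the same rows; np.where is useful when the positions themselves are needed later.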
Line Charts
A line chart displays the evolution of one or more numerical variables. It is a common chart type, so it can be built with any Python visualization library, such as seaborn, matplotlib, or plotly.
# importing the required libraries
import matplotlib.pyplot as plt
import numpy as np
# define data values
x = np.array([1, 2, 3, 4])  # X-axis points
y = x * 2  # Y-axis points
plt.plot(x, y)  # plot the chart
plt.show()  # display
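Since a line chart can show more than one numerical variable, it may help to see two series on the same axes. The following is a minimal sketch using Matplotlib's object-oriented interface; the second series (x squared), the markers, and the label text are illustrative choices, not taken from the original example.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4])

fig, ax = plt.subplots()

# Each call to ax.plot adds one line; the label feeds the legend
line_double, = ax.plot(x, x * 2, marker="o", label="y = 2x")
line_square, = ax.plot(x, x ** 2, marker="s", label="y = x^2")

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Two series on one line chart")
ax.legend()

plt.show()
```

Labelling each line and adding a legend keeps the chart readable once more than one variable is plotted.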
Bar plot
A bar chart represents categorical data with rectangular bars whose lengths or heights are proportional to the values they represent. Bars can be plotted vertically or horizontally. A bar chart shows comparisons between discrete categories: one axis lists the categories being compared, and the other axis shows the measured value for each category.
import numpy as np
import matplotlib.pyplot as plt
# creating the dataset
data = {'C': 20, 'C++': 15, 'Java': 30,
        'Python': 35}
courses = list(data.keys())
values = list(data.values())
fig = plt.figure(figsize=(10, 5))
# creating the bar plot
plt.bar(courses, values, color='maroon', width=0.4)
plt.xlabel("Courses offered")
plt.ylabel("No. of students enrolled")
plt.title("Students enrolled in different courses")
plt.show()
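The text notes that bars can also be drawn horizontally. A minimal sketch of that variant, using the same course data with Matplotlib's barh, might look like this. Sorting the categories by value is an added illustrative choice, not something the original example does.

```python
import matplotlib.pyplot as plt

data = {"C": 20, "C++": 15, "Java": 30, "Python": 35}

# Sort ascending by value: barh draws the first item at the bottom,
# so the largest category ends up at the top of the chart
ordered = sorted(data.items(), key=lambda item: item[1])
courses = [name for name, _ in ordered]
values = [count for _, count in ordered]

fig, ax = plt.subplots(figsize=(10, 5))

# For horizontal bars the thickness is set with height, not width
bars = ax.barh(courses, values, color="maroon", height=0.4)

ax.set_xlabel("No. of students enrolled")
ax.set_ylabel("Courses offered")
ax.set_title("Students enrolled in different courses")

plt.show()
```

Horizontal bars are often easier to read when category names are long, because the labels sit on the vertical axis and do not overlap.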
Histogram
A histogram chart is a graph showing frequency distributions.
The graph shows the number of observations within each given interval.
import matplotlib.pyplot as plt
import numpy as np
# 250 random values from a normal distribution (mean 170, standard deviation 10)
x = np.random.normal(170, 10, 250)
print(x)
plt.hist(x)
plt.show()
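To see the numbers behind the bars, the counts per interval can be computed directly with np.histogram. This is a small sketch: the fixed seed is an arbitrary choice made only so that repeated runs give the same values, and 10 bins is chosen to match the default that plt.hist uses.

```python
import numpy as np

# Seeded generator so the example is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(seed=0)
x = rng.normal(170, 10, 250)

# counts[i] is the number of observations between bin_edges[i] and bin_edges[i + 1]
counts, bin_edges = np.histogram(x, bins=10)

for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f}: {count}")
```

There is always one more bin edge than there are bins, and the counts add up to the number of observations.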
Conclusion:
Data visualization provides an organized, easy-to-understand pictorial representation of data that supports observation and analysis. Its main goal is to make it easier to identify trends, patterns, and outliers in large data sets. The field includes information graphics, statistical graphics, and information visualization. The uses of data visualization fall into three main categories:
- to explore
- to monitor
- to explain.