Tutorials | 4 Faster and Easier Ways to Visualize Data in Python


By George Seif

Participation: Geek AI, Xiaokun Liu

Source: heart of the machine (almosthuman2014)

Heat maps, two-dimensional density plots, spider plots, and tree diagrams: have you used any of these visualization methods?

Data visualization is a very important part of a data science or machine learning project. Often, you need to do exploratory data analysis (EDA) early in the project so that you have some understanding of the data, and creating visualizations can really make the task of analysis clearer and easier to understand, especially for large, high-dimensional datasets. It's also important to present the end result near the end of the project in a clear, concise and compelling way that your audience (usually non-technical clients) can understand.

Readers may have read my previous post, "5 Fast and Easy to Use Python Matplotlib Data Visualization Methods", which introduced five basic data visualization methods: scatter plots, line plots, histograms, bar charts, and box plots. These are simple but powerful methods through which you can gain insight into your dataset. In this article, we'll see 4 more ways to visualize data! They go into a little more detail and can be used after the basic methods from the previous article to extract more in-depth information from the data.

Heat Map

A heat map is a matrix representation of data in which the value of each matrix element is shown as a color. Different colors represent different values, and the matrix indices pair up the two items or features being compared. Heat maps are ideal for showing relationships between multiple feature variables, because you can read the magnitude of the element at each position directly from its color. By looking at the other points in the heat map, you can also see how each relationship compares with the others in the dataset. Color is so intuitive that it gives us a very simple way of interpreting the data.

Now let's take a look at the implementation code. Compared with matplotlib, seaborn can be used to draw more advanced graphs, which often require more components, such as multiple colors, shapes, or variables. Here matplotlib is used to display the graph, NumPy to generate the data, and pandas to manipulate it! The plotting itself takes just one simple seaborn function.

# Importing libs
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create a random dataset
data = pd.DataFrame(np.random.random((10,6)), columns=["Iron Man","Captain America","Black Widow","Thor","Hulk", "Hawkeye"])

print(data)

# Plot the heatmap
heatmap_plot = sns.heatmap(data, center=0, cmap='gist_ncar')

plt.show()
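
The random data above has no real structure, so its heat map mostly shows noise. As a rough sketch of the "relationships between multiple feature variables" use case, the example below builds four synthetic features and draws the heat map of their correlation matrix. The feature names `a` to `d` and the way they are generated are assumptions made purely for illustration; they are not part of the original article.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic, hypothetical features: b follows a, c moves against a, d is unrelated
rng = np.random.default_rng(0)
base = rng.normal(size=200)
features = pd.DataFrame({
    "a": base,
    "b": base + rng.normal(scale=0.3, size=200),
    "c": -base + rng.normal(scale=0.3, size=200),
    "d": rng.normal(size=200),
})

# Pairwise Pearson correlations; rows and columns are both the feature names
corr = features.corr()
print(corr.round(2))

# A diverging colormap centered on 0 separates positive from negative correlation
sns.heatmap(corr, vmin=-1, vmax=1, center=0, cmap="coolwarm", annot=True)
plt.show()
```

Because correlations lie between -1 and 1, centering the colormap on 0 is meaningful here, and strongly related pairs stand out by color alone.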

Two-Dimensional Density Plot

A two-dimensional density plot (2D Density Plot) is an intuitive extension of the one-dimensional density plot, with the advantage that it shows the joint probability distribution of two variables. For example, in the two-dimensional density plot below, the color bar on the right maps each color to a probability density. The place where our data is most likely to occur (i.e., where the data points are most concentrated) seems to be around size=0.5 and speed=1.4. As you can see, a two-dimensional density plot is useful for quickly identifying the regions where the data is most concentrated across two variables, as opposed to a one-dimensional density plot, which covers only one variable. When you have two variables that both matter to the output and want to understand how they jointly shape its distribution, a two-dimensional density plot is a very effective view of the data.

Once again, writing the code with seaborn proves to be very convenient! This time, we'll create a skewed distribution to make the visualization more interesting. You can tweak most of the optional parameters to make the result look clearer.

# Importing libs
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import skewnorm

# Create the data
speed = skewnorm.rvs(4, size=50) 
size = skewnorm.rvs(4, size=50)

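# Create and show the 2D density plot
# Recent seaborn releases expect x=/y= keywords, fill= instead of shade=, and bw_method= instead of bw=
ax = sns.kdeplot(x=speed, y=size, cmap="Reds", fill=False, bw_method=.15, cbar=True)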
ax.set(xlabel='speed', ylabel='size')
plt.show()
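
To read the densest region off as numbers rather than by eye, one option is to estimate the density on a grid and take its maximum. The sketch below does this with `scipy.stats.gaussian_kde` and its default bandwidth, which is an assumption: it is not the bandwidth used in the seaborn call above, so the peak it reports need not match the plot exactly. The fixed `random_state` values are also added only to make the example repeatable.

```python
import numpy as np
from scipy.stats import gaussian_kde, skewnorm

# Same kind of skewed data as above, seeded so the result is repeatable
speed = skewnorm.rvs(4, size=50, random_state=0)
size = skewnorm.rvs(4, size=50, random_state=1)

# Fit a 2D kernel density estimate; gaussian_kde expects shape (n_dims, n_points)
kde = gaussian_kde(np.vstack([speed, size]))

# Evaluate the estimate on a regular grid spanning the data
xs = np.linspace(speed.min(), speed.max(), 100)
ys = np.linspace(size.min(), size.max(), 100)
grid_x, grid_y = np.meshgrid(xs, ys)
density = kde(np.vstack([grid_x.ravel(), grid_y.ravel()])).reshape(grid_x.shape)

# Rows of the grid follow ys and columns follow xs
row, col = np.unravel_index(np.argmax(density), density.shape)
peak_speed, peak_size = xs[col], ys[row]
print(f"Densest region near speed={peak_speed:.2f}, size={peak_size:.2f}")
```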

Spider Plot

A spider plot is one of the best ways to show one-to-many relationships. In other words, you can plot and view the values of multiple variables for a particular item or category. In a spider plot, the prominence of one variable relative to another is obvious, because the covered area and the distance from the center grow in that variable's direction. If you want to see how several different classes of objects described by these variables differ, you can plot them side by side. In the chart below, it's easy to compare the different attributes of the Avengers and see where they each excel! (Please note that these numbers were set arbitrarily and I am not biased against any member of the Avengers.)

Here, we can use matplotlib directly instead of seaborn to create the visualization. We need to space the attributes equally around the circumference of the circle. We will set a label at each corner and then plot each value as a point whose distance from the center depends on its magnitude. Finally, for a clearer display, we will use a translucent color to fill the area enclosed by the lines joining the attribute points.

# Import libs
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

# Get the data
df=pd.read_csv("avengers_data.csv")
print(df)

"""
   #             Name  Attack  Defense  Speed  Range  Health
0  1         Iron Man      83       80     75     70      70
1  2  Captain America      60       62     63     80      80
2  3             Thor      80       82     83    100     100
3  3             Hulk      80      100     67     44      92
4  4      Black Widow      52       43     60     50      65
5  5          Hawkeye      58       64     58     80      65

"""

# Get the data for Iron Man
labels=np.array(["Attack","Defense","Speed","Range","Health"])
stats=df.loc[0,labels].values.astype(float)

# Make some calculations for the plot
angles=np.linspace(0, 2*np.pi, len(labels), endpoint=False)
# Repeat the first point at the end so the outline closes
stats=np.concatenate((stats,[stats[0]]))
angles=np.concatenate((angles,[angles[0]]))

# Plot stuff
fig = plt.figure()
ax = fig.add_subplot(111, polar=True)
ax.plot(angles, stats, 'o-', linewidth=2)
ax.fill(angles, stats, alpha=0.25)
# Label only the five attribute angles; the repeated closing angle gets no label
ax.set_thetagrids(angles[:-1] * 180/np.pi, labels)
ax.set_title(df.loc[0,"Name"])
ax.grid(True)

plt.show()
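
The text mentions plotting several characters side by side. One possible way to do that, sketched below, is to give each character its own polar subplot with a shared radial scale. The values for Iron Man and Hulk are copied from the printed table above so that no CSV file is needed; the helper name `closed_polygon` and the 0 to 100 radial limit are assumptions made for this illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

labels = ["Attack", "Defense", "Speed", "Range", "Health"]
# Values copied from the table printed above
heroes = {
    "Iron Man": [83, 80, 75, 70, 70],
    "Hulk": [80, 100, 67, 44, 92],
}

def closed_polygon(values):
    """Return (angles, values) with the first point repeated so the outline closes."""
    angles = np.linspace(0, 2 * np.pi, len(values), endpoint=False)
    return np.append(angles, angles[0]), np.append(values, values[0])

# One polar subplot per character
fig, axes = plt.subplots(1, len(heroes), subplot_kw={"projection": "polar"}, figsize=(8, 4))
for ax, (name, values) in zip(axes, heroes.items()):
    angles, closed = closed_polygon(values)
    ax.plot(angles, closed, "o-", linewidth=2)
    ax.fill(angles, closed, alpha=0.25)
    # Label only the original angles, not the repeated closing one
    ax.set_thetagrids(np.degrees(angles[:-1]), labels)
    # Same radial range on every subplot so the shapes are comparable
    ax.set_ylim(0, 100)
    ax.set_title(name)

plt.tight_layout()
plt.show()
```

Keeping the radial range identical across subplots matters: without it, each plot scales to its own maximum and the areas can no longer be compared visually.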

Tree Diagram

We've been using tree diagrams since elementary school! Tree diagrams are natural and intuitive, which makes them easy to interpret. Nodes that are directly connected are closely related, while nodes separated by many connections are less similar. In the visualization below, I've plotted a tree diagram of a small portion of the Pokémon stats dataset from Kaggle (HP, Attack, Defense, Special Attack, Special Defense, Speed).

Thus, the Pokémon whose stats match most closely will be tightly connected. For example, at the top of the graph, Arbok and Fearow are directly connected, and if we look at the data, Arbok has a total score of 438 and Fearow has 442, which is very close! But if we look at Raticate, we can see that its total score is 413, which is quite different from Arbok and Fearow, so it is separated from them in the tree diagram! As we move up the tree, the Pokémon in the green group are more similar to each other than they are to any of the Pokémon in the red group, even though there is no direct green connection here.

For the tree diagram, we actually need to use SciPy to draw it! After reading in the dataset, we drop the string columns. This is just to keep the visualization simple and easy to understand; in practice, converting these strings to categorical variables would give better results and comparisons. We also set the index of the data frame so that it can be used to label each node. Finally, computing the tree and plotting it with SciPy each take just one simple line of code.

# Import libs
import pandas as pd
from matplotlib import pyplot as plt
from scipy.cluster import hierarchy
import numpy as np

# Read in the dataset
# Drop any fields that are strings
# Only get the first 40 because this dataset is big
df = pd.read_csv('Pokemon.csv')
df = df.set_index('Name')
df.index.name = None  # remove the index name
df = df.drop(["Type 1", "Type 2", "Legendary"], axis=1)
df = df.head(n=40)

# Calculate the distance between each sample
Z = hierarchy.linkage(df, 'ward')

# Orient our tree and label each leaf with its name
hierarchy.dendrogram(Z, orientation="left", labels=df.index.tolist())

plt.show()
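
The text notes that converting the string columns to categorical variables, rather than dropping them, would give better comparisons. A minimal sketch of one way to do that is one-hot encoding with `pandas.get_dummies`. The four rows below are hypothetical toy values, not taken from Pokemon.csv, and the index labels `a` to `d` are placeholders.

```python
import pandas as pd
from scipy.cluster import hierarchy

# Hypothetical toy rows for illustration only (not from Pokemon.csv)
df = pd.DataFrame(
    {
        "Type 1": ["Grass", "Fire", "Grass", "Water"],
        "Attack": [49, 52, 62, 48],
        "Defense": [49, 43, 63, 65],
    },
    index=["a", "b", "c", "d"],
)

# Replace the string column with one 0/1 column per category
encoded = pd.get_dummies(df, columns=["Type 1"], dtype=float)
print(encoded.columns.tolist())

# The encoded frame is fully numeric, so it can be clustered as before
Z = hierarchy.linkage(encoded.to_numpy(), "ward")
print(Z.shape)
```

Note that the 0/1 columns are on a much smaller scale than the stats, so in practice the features would likely need rescaling before the categories have a visible effect on the distances.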

Link to original article: https://towardsdatascience.com/4-more-quick-and-easy-data-visualizations-in-python-with-code-da9030ab3429

