Histograms are charts that use rectangles to represent the frequency of each range (bin) of a continuous attribute. In this folder, we will go over how to create histograms with Python and Plotly.
The following scripts are used in this chapter:
- SimpleHistogram.py
- NormalizedHistogram.py
- OverlaidHistogram.py
- StackedHistogram.py
- CumulativeHistogram.py
The scripts in this chapter require the following packages:
- Plotly
- Pandas
Data is a list of go.Histogram() objects; each go.Histogram() represents one histogram trace (one set of bars).
go.Histogram() has the following parameters:
- x: Values to bin (vertical histogram)
- y: Values to bin in a horizontal histogram
- histnorm: Normalization applied to the bars (no normalization by default)
  - probability: Fraction of occurrences in each bin relative to the total number of samples (bar heights sum to 1)
  - percent: Percentage of occurrences in each bin relative to the total number of samples (bar heights sum to 100)
  - density: Number of occurrences in a bin divided by the size of the bin interval
  - probability density: Probability of a bin divided by the size of the bin interval (bar areas sum to 1)
- xbins: Bin settings along the x-axis
  - size: Interval size of each bin (number)
- ybins: Same as xbins, for horizontal histograms
- cumulative_enabled: Enable the cumulative histogram, True/False
- opacity: Opacity, from 0 to 1
- marker_color: Bar colour (a colour name or an RGB value, given as a string)
- hoverinfo: Information displayed when the user hovers over a bar. Use all, none or skip, or any combination of the following joined with "+":
  - x
  - y
  - text
  - name
Generic layout parameters suggested for use:
- title (Dictionary): Chart title and fonts
  - text: Chart title to be displayed
  - x: Horizontal position of the title, from 0 to 1
  - y: Vertical position of the title, from 0 to 1
- xaxis (Dictionary): X-axis settings
  - tickmode: How ticks are placed (auto, linear or array)
  - tickangle: Angle of the tick labels in degrees (-: Anticlockwise, +: Clockwise)
  - categoryorder: Order in which categories appear on the x-axis
    - category ascending: Sort categories by name in ascending order
    - category descending: Sort categories by name in descending order
    - total ascending: Sort by total value in ascending order
    - total descending: Sort by total value in descending order
    - min ascending/min descending: Sort by minimum value
    - max ascending/max descending: Sort by maximum value
    - sum ascending/sum descending: Sort by summed value
    - mean ascending/mean descending: Sort by average value
    - median ascending/median descending: Sort by median value
    - array: Follow the order defined in categoryarray
  - categoryarray: List defining the category order when categoryorder is array
- yaxis (Dictionary): Y-axis settings
  - tickmode: How ticks are placed (auto, linear or array)
  - tickangle: Angle of the tick labels in degrees (-: Anticlockwise, +: Clockwise)
Histogram-exclusive parameters:
- cumulative_enabled: Enable the cumulative histogram, True/False (set on go.Histogram())
- marker_color: Bar colour (a colour name or an RGB value, given as a string; set on go.Histogram())
- barmode: How multiple histograms are displayed together (set on the layout)
  - stack: Histograms are drawn on top of one another
  - overlay: Histograms are drawn over one another on the same axes (lower the opacity to keep every set visible)
- bargap: Gap between adjacent bars, as a fraction from 0 to 1 (set on the layout)
- histfunc: Binning function, set on go.Histogram() (count: Count occurrences, sum: Sum the values, avg: Average the values, min/max: Display the minimum or maximum value within the bin)
- histnorm: Type of normalization, set on go.Histogram(), none by default (probability: Bar height is the fraction of the total, percent: Bar height is the percentage of the total, density: Bar height is the number of occurrences divided by the bin size, probability density: Bar height is the probability divided by the bin size, so that the bar areas sum to 1)
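The arithmetic behind the histnorm options can be checked by hand. The sketch below uses NumPy instead of Plotly to compute each variant for a small, hypothetical sample with a bin size of 2; Plotly may pick different bin edges automatically, so treat it as an illustration of the definitions only.

```python
import numpy as np

# Hypothetical sample and bin edges, chosen only for illustration.
values = np.array([1, 2, 2, 3, 3, 3, 7, 8])
bin_size = 2
edges = np.arange(0, 10 + bin_size, bin_size)  # 0, 2, 4, 6, 8, 10

counts, _ = np.histogram(values, bins=edges)   # occurrences per bin
total = counts.sum()

probability = counts / total                   # heights sum to 1
percent = probability * 100                    # heights sum to 100
density = counts / bin_size                    # areas sum to the sample count
probability_density = probability / bin_size   # areas sum to 1
```

For this sample the counts are 1, 5, 0, 1 and 1, so the second bin has a probability of 5/8 = 0.625 and a probability density of 0.625/2 = 0.3125.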
# Data
data = []
data.append(go.Histogram(x=df['salary']))
# Layout
layout = {'title':{'text':'Histogram of Salary among Friends', 'x':0.5}}
# Data
data = []
data.append(go.Histogram(x=x, histnorm='probability'))
# Layout
layout = {'title':{'text':'Distribution of 500 Random Numbers', 'x':0.5}}
# Data
data = []
for group in df['group'].unique():
    df_temp = df[df['group']==group]
    data.append(go.Histogram(x=df_temp['salary'],name=group))
# Layout
layout = {'title':{'text':'Everybody\'s Salary', 'x':0.5},
          'barmode':'overlay'}
# Data
data = []
for group in df['group'].unique():
    df_temp = df[df['group']==group]
    data.append(go.Histogram(x=df_temp['salary'],name=group))
# Layout
layout = {'title':{'text':'Everybody\'s Salary', 'x':0.5},
          'barmode':'stack'}
# Data
data = []
data.append(go.Histogram(x=df['salary'], cumulative_enabled=True))
# Layout
layout = {'title':{'text':'Everybody\'s Salary', 'x':0.5}}
# Data
data = []
data.append(go.Histogram(x=df['group'], y=df['salary'], histfunc='sum'))
# Layout
layout = {'title':{'text':'Everybody\'s Salary (Summation by Group)', 'x':0.5}}
Note: histfunc aggregates the y values within each x bin. A numeric column must be provided for the y argument; otherwise, Plotly treats the trace as a simple histogram.
Plotly Documentation Histograms





