Histogram

A histogram is a chart that uses rectangles to represent how frequently the values of a continuous attribute fall within each range (bin). In this folder, we will go over how to create histograms with Python and Plotly.

Files

The following scripts are used in this chapter:

  • SimpleHistogram.py
  • NormalizedHistogram.py
  • OverlaidHistogram.py
  • StackedHistogram.py
  • CumulativeHistogram.py

Packages Needed

This chapter requires the following packages for the scripts used:

  • Plotly
  • Pandas

Syntax

Data

Data is a list of go.Histogram() objects; each go.Histogram() represents one histogram (one data set).

go.Histogram() has the following parameters:

  • x: Sample values to be binned along the x-axis (vertical bars)
  • y: Sample values to be binned along the y-axis (used instead of x for a horizontal histogram)
  • histnorm: Type of normalization, none by default
    • probability: Fraction of all samples that fall in each bin (bar heights sum to 1)
    • percent: Percentage of all samples that fall in each bin (bar heights sum to 100)
    • density: Number of occurrences in a bin divided by the size of the bin interval (bar areas sum to the number of samples)
    • probability density: Probability of falling in a bin divided by the size of the bin interval (bar areas sum to 1)
  • xbins: Bin settings along the x-axis (Dictionary)
    • size: Interval size of each bin, number
  • ybins: Same as xbins, but for a horizontal histogram
  • cumulative_enabled: Enable cumulative histogram, True/False
  • opacity: Opacity of the bars, from 0 to 1
  • marker_color: Bar colour (takes a colour name as a string or an RGB value as a string)
  • hoverinfo: Which information is displayed when the user hovers over a bar. Options can be joined with +, for example x+y:
    • x
    • y
    • text
    • name
    • all, none or skip
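
The four histnorm modes differ only in how the raw bin counts are rescaled. The sketch below reproduces that arithmetic in plain Python so the modes can be compared side by side. The function name normalize, the bin counts and the bin size are made up for illustration and are not part of Plotly.

```python
def normalize(counts, bin_size, histnorm=""):
    """Rescale raw bin counts the way each histnorm mode describes.

    Illustrative helper only; Plotly does this internally.
    """
    total = sum(counts)
    if histnorm == "":
        return [float(c) for c in counts]             # raw counts
    if histnorm == "probability":
        return [c / total for c in counts]             # heights sum to 1
    if histnorm == "percent":
        return [100 * c / total for c in counts]       # heights sum to 100
    if histnorm == "density":
        return [c / bin_size for c in counts]          # areas sum to total
    if histnorm == "probability density":
        return [c / (total * bin_size) for c in counts]  # areas sum to 1
    raise ValueError(f"unknown histnorm: {histnorm!r}")


# Hypothetical histogram: three bins of width 5 holding 2, 5 and 3 samples
counts = [2, 5, 3]
bin_size = 5.0

for mode in ["", "probability", "percent", "density", "probability density"]:
    print(repr(mode), normalize(counts, bin_size, mode))
```

With these numbers, probability and percent depend only on the counts, while density and probability density also change when the bin size changes.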

Layout

Generic Layout parameters suggested for use:

  • title (Dictionary): Chart title and fonts
    • text: Chart title to be displayed
    • x: text location on x-dimension, from 0-1
    • y: text location on y-dimension, from 0-1
  • xaxis (Dictionary): X-axis setting
    • tickmode: Setting of ticks
    • tickangle: Angle, in degrees, by which the tick labels are rotated (-: Anticlockwise, +: Clockwise)
    • categoryorder: Sort order of the categories on the X-axis
      • category ascending: Sort categories by name in ascending order
      • category descending: Sort categories by name in descending order
      • total ascending: Sort categories by total value in ascending order
      • total descending: Sort categories by total value in descending order
      • min ascending/min descending: Sort by minimum value
      • max ascending/max descending: Sort by maximum value
      • sum ascending/sum descending: Sort by summed value
      • mean ascending/mean descending: Sort by average value
      • median ascending/median descending: Sort by median value
      • array: Follow the sorting order defined in categoryarray
    • categoryarray: Defines the sorting order when categoryorder is array
  • yaxis (Dictionary): y-axis setting
    • tickmode: Setting of ticks
    • tickangle: Angle, in degrees, by which the tick labels are rotated (-: Anticlockwise, +: Clockwise)


Histogram-exclusive parameters (cumulative_enabled, marker_color, histfunc and histnorm are set on go.Histogram(); barmode and bargap are set in the layout):

  • cumulative_enabled: Enable cumulative histogram, True/False
  • marker_color: Bar colour (takes a colour name as a string or an RGB value as a string)
  • barmode: How multiple histograms are displayed together
    • stack: Bars of the histograms are stacked on top of one another
    • overlay: Histograms are drawn over one another on the same axes; reduce opacity so that every histogram stays visible
  • bargap: Gap between adjacent bars, as a fraction from 0 to 1
  • histfunc: Specifies the binning function (count: Count occurrences, sum: Sum the values, avg: Average the values, min/max: Display minimum or maximum value within the bin)
  • histnorm: Type of normalization used for the histogram, none by default (percent: Bar heights as a percentage of all samples, probability: Bar heights as a fraction of all samples, density: Occurrences divided by bin size, probability density: Occurrences divided by bin size and by the number of samples, so that the bar areas sum to 1)

Examples

Example 1 - Simple Histogram

# Data
data = []
data.append(go.Histogram(x=df['salary']))
# Layout
layout = {'title':{'text':'Histogram of Salary among Friends', 'x':0.5}}

Example 2 - Normalized Histogram

# Data
data = []
data.append(go.Histogram(x=x, histnorm='probability'))
# Layout
layout = {'title':{'text':'Distribution of 500 Random Numbers', 'x':0.5}}

Example 3 - Overlaid Histogram

# Data: one histogram per group
data = []
for group in df['group'].unique():
    df_temp = df[df['group']==group]
    data.append(go.Histogram(x=df_temp['salary'], name=group))
# Layout
layout = {'title':{'text':'Everybody\'s Salary', 'x':0.5},
          'barmode':'overlay'}

Example 4 - Stacked Histogram

# Data: one histogram per group
data = []
for group in df['group'].unique():
    df_temp = df[df['group']==group]
    data.append(go.Histogram(x=df_temp['salary'], name=group))
# Layout
layout = {'title':{'text':'Everybody\'s Salary', 'x':0.5},
          'barmode':'stack'}

Example 5 - Cumulative Histogram

# Data
data = []
data.append(go.Histogram(x=df['salary'], cumulative_enabled=True))
# Layout
layout = {'title':{'text':'Everybody\'s Salary', 'x':0.5}}
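
To see what the cumulative option changes, it can help to compute a running total by hand. In the sketch below the bin counts are made up. Each cumulative bar is the sum of its own bin and all bins before it, which matches Plotly's default cumulative settings.

```python
from itertools import accumulate

# Hypothetical counts for four consecutive bins
counts = [2, 5, 3, 1]

# Running total: each entry includes the current bin and all earlier bins
cumulative = list(accumulate(counts))

print(cumulative)  # [2, 7, 10, 11]
```

The last cumulative bar always equals the total number of samples.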

Example 6 - Aggregated Histogram

# Data
data = []
data.append(go.Histogram(x=df['group'], y=df['salary'], histfunc='sum'))
# Layout
layout = {'title':{'text':'Everybody\'s Salary (Summation by Group)', 'x':0.5}}

Note: histfunc aggregates the values supplied on the y-axis. A numeric column must be provided for the y argument; otherwise, Plotly treats the chart as a simple histogram that counts occurrences.

Reference

Plotly Documentation Histograms