A Parallel Coordinates Plot is a specialised plot for visualizing a large number of observations across several numerical dimensions. In this folder, we will go over how to create a Parallel Coordinates Plot with Python and Plotly.
The following scripts are used in this chapter:
- simple_pcate.py
- complex_pcate.py
This chapter requires the following packages for the scripts used:
- Pandas
- Plotly
This chapter may use the following data from the Data folder:
- grades.csv
The data passed to a Parallel Coordinates Plot consists of two parts: line and dimensions. line holds the line colour settings and related metadata, whereas dimensions stores the values and attribute-related metadata in a list of dictionaries. The parameter structure of a Parallel Coordinates Plot is very different from that of standard visualization types: it accepts no x or y columns and takes these alternative arguments instead.
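As a rough illustration of this two-part structure, the sketch below builds line and dimensions as plain Python dictionaries. The subject names, scores and IDs are made up for illustration and are not taken from grades.csv; `trace_kwargs` is a hypothetical name.

```python
# Hypothetical scores for three students (illustrative only).
scores = {
    "math": [72, 88, 95],
    "physics": [81, 79, 99],
}
student_ids = [1, 2, 3]

# Part 1: `line` controls how each observation's polyline is coloured.
line = {"color": student_ids, "colorscale": "Viridis", "showscale": True}

# Part 2: `dimensions` holds one dictionary per vertical axis.
dimensions = [
    {"label": name, "values": values, "range": [70, 100]}
    for name, values in scores.items()
]

# Keyword arguments in the shape go.Parcoords expects.
trace_kwargs = {"line": line, "dimensions": dimensions}

# With Plotly installed, the figure could then be built as:
#   import plotly.graph_objects as go
#   fig = go.Figure(data=go.Parcoords(**trace_kwargs))
```

Note that there is no x or y anywhere: each observation is spread across the values lists, one entry per axis.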
go.Parcoords has the following parameters:
- line - dictionary of settings for the lines in the plot
  - color: Accepts a single colour, or an array of numbers (one per line) that is mapped onto the colorscale to decide each line's colour. IDs or primary keys are good columns to use
  - colorscale: A list of [normalized value (0 to 1.0), colour] pairs, or the name of one of Plotly's built-in colorscales
  - showscale: True or False, whether to show the colour scale as a legend
  - cmin: Lower bound of the colour domain, i.e. the value in color that maps to the start of the colorscale
  - cmax: Upper bound of the colour domain, i.e. the value in color that maps to the end of the colorscale
- dimensions - list of attributes, one dictionary per axis; each dictionary may contain:
  - range: The range of this axis, as [lower, upper]
  - constraintrange (Optional): a range within this attribute to select; only lines passing through it are highlighted. No selection is applied if it is not specified
  - label: Name of the attribute shown on the axis, as a string
  - values: Values of the data points for this attribute
  - visible: True (the default) or False, to show or hide this dimension. The 'legendonly' option belongs to the trace-level visible parameter, not to a dimension
  - tickvals: The values on the axis at which ticks are drawn
  - ticktext: The text displayed at each tickvals position in place of the numeric value
- unselected - dictionary of settings for the lines that are not selected by the user or that fall outside a constraintrange. In current Plotly releases the two keys below sit inside a nested line dictionary (unselected.line)
  - color: The colour of the unselected lines
  - opacity: Opacity of the unselected lines, accepts values between 0 and 1
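The dimension keys above can be assembled with a small helper. This is only a sketch: `make_dimension`, its arguments and the sample scores are hypothetical names and values introduced here, not part of Plotly or of the chapter scripts.

```python
def make_dimension(label, values, axis_range, tick_labels=None, constraint=None):
    """Build one entry for the `dimensions` list of a parallel coordinates plot."""
    dimension = {
        "label": label,
        "values": list(values),
        "range": list(axis_range),
    }
    if tick_labels:
        # Sort the tick positions so tickvals and ticktext stay aligned.
        positions = sorted(tick_labels)
        dimension["tickvals"] = positions
        dimension["ticktext"] = [tick_labels[p] for p in positions]
    if constraint is not None:
        dimension["constraintrange"] = list(constraint)
    return dimension


# Hypothetical scores; the attribute name "exam" is made up.
exam = make_dimension(
    "exam",
    [75, 82, 96],
    (70, 100),
    tick_labels={80: "Fair", 90: "Great", 100: "Excellent"},
    constraint=(90, 100),
)
```

Keeping the tick positions and their texts in one mapping avoids the two lists drifting out of step when an entry is added or removed.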
The following example draws a parallel coordinates plot in which every axis ranges from 70 to 100 and a constraint range is applied to the first two dimensions.
import pandas as pd
import plotly.graph_objects as go

# Read data
df = pd.read_csv('../Data/grades.csv')
students = df['name'].unique()
num_student = len(students)

# Prepare colorscale for each line: evenly spaced positions from 0 to 1.0,
# paired with one colour per student (the list below assumes five students)
colours = ['gold', 'green', 'red', 'lightblue', 'pink']
nums = [num / (num_student - 1) for num in range(num_student - 1)] + [1.0]
colourscale_metadata = [[num, colour]
                        for num, colour in zip(nums, colours)]

# Prepare labels: every column after the first two becomes an axis
labels = df.columns.tolist()[2:]

# Prepare the setup of the visualization
fig = go.Figure(data=go.Parcoords(
    line={
        'color': df['student_id'],
        'colorscale': colourscale_metadata,
        'showscale': True
    },
    dimensions=[
        {'range': [70, 100],
         'constraintrange': [90, 101],
         'label': labels[0],
         'values': df[labels[0]]
         },
        {'range': [70, 100],
         'constraintrange': [70, 101],
         'label': labels[1],
         'values': df[labels[1]],
         'tickvals': [80, 90, 100],
         'ticktext': ['Fair', 'Great', 'Excellent']
         },
        {'range': [70, 100],
         'label': labels[2],
         'values': df[labels[2]]
         },
        {'range': [70, 100],
         'label': labels[3],
         'values': df[labels[3]],
         'tickvals': [70, 80, 90, 100],
         'ticktext': ['C', 'B', 'A', 'A+']
         }
    ]
))

# Layout
fig.update_layout(
    plot_bgcolor='white',
    paper_bgcolor='white'
)

# Display the figure
fig.show()
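The colourscale in the example pairs a hand-written list of five colours with positions computed from the number of students, so the two only line up when there are exactly five students. One possible way to avoid that coupling is to derive the positions from the colour list itself, as in the sketch below. `even_colorscale` is a hypothetical helper name, not a Plotly function.

```python
def even_colorscale(colours):
    """Pair each colour with an evenly spaced position between 0.0 and 1.0."""
    if len(colours) < 2:
        raise ValueError("a colorscale needs at least two colours")
    last = len(colours) - 1
    # First colour lands on 0.0, last colour on 1.0.
    return [[index / last, colour] for index, colour in enumerate(colours)]


colourscale_metadata = even_colorscale(
    ['gold', 'green', 'red', 'lightblue', 'pink']
)
```

The result has the same shape as the colourscale_metadata list built in the example and can be passed as the colorscale entry of line.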
Reference: Plotly Documentation, Parallel Coordinates Plot


