Parallel Coordinates Plot

A Parallel Coordinates Plot visualizes a large number of observations across several numerical dimensions at once. In this folder, we will go over how to create a Parallel Coordinates Plot with Python and Plotly.

Files

The following scripts are used in this chapter:

  • simple_pcate.py
  • complex_pcate.py

Packages Needed

This chapter requires the following packages for the scripts used:

  • Pandas
  • Plotly

Data Used

This chapter may use the following data from the Data folder:

  • grades.csv

Syntax

Data

The data consists of two parts: line and dimensions. line sets up the line colours and related metadata, whereas dimensions stores the values and attribute-related metadata in a list of dictionaries. The parameter structure of a Parallel Coordinates Plot is very different from that of standard visualization types: it accepts no x or y columns, only the alternative arguments listed here.

go.Parcoords has the following parameters:

  • line - dictionary of settings for the lines in the plot
    • color: Accepts numbers that determine the colour in which each line is plotted. IDs or primary keys are good columns to use
    • colorscale: An array of normalized values (0 to 1.0) mapped to colours, or the name of a built-in Plotly colorscale
    • showscale: True or False, whether to show the colour scale as a legend
    • cmin: Lower bound of the colour domain, setting the accepted range of the color column in this dictionary
    • cmax: Upper bound of the colour domain, setting the accepted range of the color column in this dictionary
  • dimensions - array of attributes; each attribute is a dictionary that may contain:
    • range: The range of this axis
    • constraintrange (Optional): Selects a range within this attribute to be highlighted on the plot; nothing is selected if it is not specified
    • label: Attribute name, as a string
    • values: Values of the data points
    • visible: Determines whether this dimension is shown. Accepts True or False
    • tickvals: The values at which ticks appear on this axis
    • ticktext: The text displayed at the tick positions set in tickvals, in place of the original values
  • unselected - dictionary of settings for lines that are not selected by the user or that fall outside the constraintrange; the settings are nested under its line key
    • color: The colour of the unselected lines
    • opacity: Opacity of the unselected lines, accepts values between 0 and 1
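
The parameters above can be sketched as plain dictionaries before they are handed to Plotly. This is a minimal sketch, not the chapter's script: the helper name build_parcoords_kwargs, the sample scores, and the 'Viridis' colorscale choice are assumptions made for illustration, and it assumes that unselected nests color and opacity under a line key, as in the Plotly reference.

```python
def build_parcoords_kwargs(columns, color_values):
    """Assemble keyword arguments for go.Parcoords from plain Python lists.

    `columns` maps an axis label to its list of values; `color_values`
    holds one number per observation and drives the line colour.
    (Hypothetical helper, used here for illustration only.)
    """
    line = {
        'color': color_values,
        'colorscale': 'Viridis',      # a named Plotly colorscale
        'showscale': True,
        'cmin': min(color_values),    # lower bound of the colour domain
        'cmax': max(color_values),    # upper bound of the colour domain
    }
    # One dictionary per axis, each spanning the observed values
    dimensions = [
        {'label': label, 'values': values, 'range': [min(values), max(values)]}
        for label, values in columns.items()
    ]
    # Styling for lines that fall outside any constraintrange
    unselected = {'line': {'color': 'lightgrey', 'opacity': 0.3}}
    return {'line': line, 'dimensions': dimensions, 'unselected': unselected}


# Made-up scores for three observations
kwargs = build_parcoords_kwargs(
    {'math': [72, 88, 95], 'physics': [81, 79, 99]},
    color_values=[1, 2, 3],
)
```

With Plotly installed, the result should be usable as go.Figure(data=go.Parcoords(**kwargs)). Keeping the dictionaries separate from the figure call makes it easy to inspect them before plotting.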

Example - Parallel Coordinates Plot


This example draws a parallel coordinates plot in which every axis ranges from 70 to 100 and a constraintrange is applied to the first two dimensions.


# Import packages
import pandas as pd
import plotly.graph_objects as go

# Read data
df = pd.read_csv('../Data/grades.csv')
students = df['name'].unique()
num_student = len(students)

# Prepare colorscale for each line
colours = ['gold','green','red','lightblue','pink']
nums = [num*1.0/(num_student-1) for num in range(0,num_student-1)] + [1.0]

colourscale_metadata = [[num, colour] 
		for num, colour in zip(nums , colours)]

# Prepare labels
labels = df.columns.tolist()[2:]


# Prepare the setup of the visualization
fig = go.Figure(data=go.Parcoords(
		line={
			'color': df['student_id'],
			'colorscale': colourscale_metadata,
			'showscale': True
		},
		dimensions=[
			{'range':[70,100],
			  'constraintrange':[90,101],
			  'label': labels[0],
			  'values': df[labels[0]]
			},
			{'range':[70,100],
			  'constraintrange':[70,101],
			  'label': labels[1],
			  'values': df[labels[1]],
			  'tickvals':[80, 90, 100],
			  'ticktext':['Fair','Great','Excellent']
			},
			{'range':[70,100],
			  'label': labels[2],
			  'values': df[labels[2]]
			},
			{'range':[70,100],
			  'label': labels[3],
			  'values': df[labels[3]],
			  'tickvals':[70, 80, 90, 100],
			  'ticktext':['C','B','A','A+']
			}
		]
	))

# Layout
fig.update_layout(
    plot_bgcolor = 'white',
    paper_bgcolor = 'white'
)

# Display the figure
fig.show()
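
The colorscale in the example ties its positions to the number of students, so the positions only line up with the five colours if grades.csv holds exactly five students. As an alternative, the positions can be derived from the colour list itself. This is a minimal sketch; the helper name even_colourscale is hypothetical and not part of the chapter's scripts.

```python
def even_colourscale(colours):
    """Pair each colour with an evenly spaced position between 0.0 and 1.0."""
    if len(colours) < 2:
        raise ValueError('a colorscale needs at least two colours')
    last = len(colours) - 1
    # The first colour sits at 0.0 and the last at 1.0
    return [[index / last, colour] for index, colour in enumerate(colours)]


colourscale_metadata = even_colourscale(
    ['gold', 'green', 'red', 'lightblue', 'pink']
)
# colourscale_metadata is now:
# [[0.0, 'gold'], [0.25, 'green'], [0.5, 'red'], [0.75, 'lightblue'], [1.0, 'pink']]
```

The resulting list has the same shape as colourscale_metadata in the example, so it can be passed as the colorscale entry of the line dictionary.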

Reference

Plotly Documentation Parallel Coordinates Plot