Data visualisation

Published 2021-05-19 13:33:41

Lecture 6: Data visualisation

-be able to explain the motivation for data visualisation

  • Converting data into a visual format
    • Reveals characteristics of the data, relationships between objects or relationships between features
    • Simplifies the data
  • Humans are very good at analysing information in a visual format
    • Spot trends, patterns, outliers
    • Visualisation can help show data quality
    • Visualisation helps tell a story ….

-be able to draw and interpret a 2-D scatter plot

  • Take two columns from a dataset and plot each object as a point in 2D space (or three columns for 3D), using colours to indicate classes/segments
  • Can reveal clusters of similar objects
  • Can show how two features are related, for example a positive correlation
  • Some information may be lost, because only two (or three) features are shown at a time
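
As a rough illustration of the points above, the sketch below draws a 2-D scatter plot on a character grid, using a different symbol per class in place of colour. The function name `text_scatter`, the grid size and the sample points are all assumptions made up for this example; a real plot would normally use a plotting library.

```python
def text_scatter(points, width=20, height=10):
    """Render (x, y, symbol) points on a character grid; y increases upwards."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y, symbol in points:
        # Scale each coordinate onto the grid; guard against a zero range.
        col = round((x - x_lo) / (x_hi - x_lo) * (width - 1)) if x_hi > x_lo else 0
        row = round((y - y_lo) / (y_hi - y_lo) * (height - 1)) if y_hi > y_lo else 0
        grid[height - 1 - row][col] = symbol
    return ["".join(line) for line in grid]


# Hypothetical data: two columns of a dataset plus a class symbol per object.
points = [
    (1.0, 1.0, "o"), (2.0, 2.0, "o"), (3.0, 2.5, "o"),
    (7.0, 7.0, "x"), (8.0, 7.5, "x"), (9.0, 9.0, "x"),
]

lines = text_scatter(points)
for line in lines:
    print("|" + line + "|")
```

In the printed grid the "o" points sit in the lower left and the "x" points in the upper right, which is the kind of clustering and positive relationship a scatter plot is meant to expose.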

-be able to draw and interpret a histogram

  • Usually shows the distribution of values of a single variable
  • Divide the values into bins and show a bar plot of the number of objects in each bin.
  • The height of each bar indicates the number of objects
  • Shape of histogram depends on the number of bins
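
The binning step can be sketched in a few lines. This is a minimal example assuming equal-width bins, with the maximum value placed in the last bin; `histogram_counts` and the sample values are hypothetical names and data chosen for illustration.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    if width == 0:
        # All values are identical, so they all land in the first bin.
        return [len(values)] + [0] * (num_bins - 1)
    counts = [0] * num_bins
    for v in values:
        # The maximum value is assigned to the last bin, not a new one.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical sample of a single variable.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

for bins in (2, 4):
    print(f"{bins} bins:")
    for i, count in enumerate(histogram_counts(values, bins)):
        # The bar length plays the role of the bar height.
        print(f"  bin {i}: {'#' * count}")
```

With 2 bins the counts are [8, 2]; with 4 bins they are [3, 5, 1, 1]. The same data looks quite different, which is why the number of bins matters.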

-be able to interpret a heat map visualisation of a dataset

  • Plot the data matrix, with each cell coloured according to its value
  • This can be useful when objects are sorted according to class/type
  • Typically, features are normalized to prevent one attribute from dominating the plot
  • Can reveal patterns in the data, for example, the busiest times of a supermarket
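
The preparation steps for a heat map (sort objects by class, normalise each feature) might look like the sketch below. It assumes min-max normalisation per column and prints shade characters instead of colours; `normalise_columns`, `shade`, `SHADES` and the sample rows are hypothetical and exist only for this example.

```python
def normalise_columns(matrix):
    """Min-max scale each column (feature) of a row-major matrix into [0, 1]."""
    scaled_columns = []
    for col in zip(*matrix):
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column carries no contrast, so map it to 0.0.
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]


SHADES = " .:*#"  # light to dark


def shade(value):
    """Map a value in [0, 1] to one of the shade characters."""
    return SHADES[min(int(value * len(SHADES)), len(SHADES) - 1)]


# Hypothetical rows: (class label, feature values on very different scales).
rows = [
    ("b", [5.0, 900.0, 0.1]),
    ("a", [1.0, 100.0, 0.9]),
    ("b", [4.0, 800.0, 0.2]),
    ("a", [2.0, 200.0, 0.8]),
]

rows.sort(key=lambda r: r[0])  # group objects of the same class together
scaled = normalise_columns([features for _, features in rows])
for (label, _), scaled_row in zip(rows, scaled):
    print(label, "".join(shade(v) for v in scaled_row))
```

Without the normalisation step, the second feature (values in the hundreds) would set the colour scale on its own and the other two columns would look uniform.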

-understand the advantages and disadvantages of using parallel coordinates to visualise a dataset

  • Parallel Coordinates
    • Used to plot the feature values of high-dimensional data
    • Instead of using perpendicular axes, use a set of parallel axes
    • The feature values of each object are plotted as a point on each corresponding coordinate axis and the points are connected by a line
    • Thus, each object is represented as a line
    • Often, the lines representing a distinct class of objects group together, at least for some features
    • Ordering of attributes is important in seeing such groupings
  • Issues
    • Scaling of axes affects the visualisation. May choose to scale all features into the range [0,1] via a pre-processing step
    • Ordering of axes influences the relationships that can be seen. Correlations between pairs of features may only be visible in certain orderings
  • Advantages: makes it easier to spot similar and dissimilar objects
  • Disadvantages: the ordering of the axes matters a great deal; a poor ordering can make differences or similarities between objects hard to see (see the lecture example of iris species)
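
To make the construction concrete, the sketch below turns each object into the list of vertices of its line: one vertex per axis, with the feature value scaled into [0, 1]. The helper names `feature_ranges` and `to_polyline` and the three sample objects are assumptions for illustration, not part of the lecture material.

```python
def feature_ranges(objects):
    """Return {feature: (min, max)} computed over all objects."""
    features = objects[0].keys()
    return {
        f: (min(o[f] for o in objects), max(o[f] for o in objects))
        for f in features
    }


def to_polyline(obj, axis_order, ranges):
    """Return the (axis position, scaled value) vertices for one object."""
    points = []
    for x, feature in enumerate(axis_order):
        lo, hi = ranges[feature]
        # Scale onto [0, 1] so every axis has the same height.
        y = (obj[feature] - lo) / (hi - lo) if hi > lo else 0.0
        points.append((x, y))
    return points


# Hypothetical objects with three features each.
objects = [
    {"height": 150.0, "weight": 50.0, "age": 30.0},
    {"height": 200.0, "weight": 100.0, "age": 20.0},
    {"height": 175.0, "weight": 75.0, "age": 40.0},
]
ranges = feature_ranges(objects)

for order in (["height", "weight", "age"], ["height", "age", "weight"]):
    print(order)
    for obj in objects:
        print("  ", to_polyline(obj, order, ranges))
```

In the first ordering the height and weight axes are adjacent, so the segments between them are flat and parallel and the relationship between the two features is easy to see. In the second ordering the age axis sits between them and that relationship is no longer directly visible.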

-be able to interpret a parallel coordinates plot and understand why the ordering of the feature axes is important

  • Ordering of axes
    • Influences the relationships that can be seen. Correlations between pairs of features may only be visible in certain orderings

-understand why it can be useful to normalise each feature into the range [0, 1] before computing Euclidean distance between vectors

  • Scaling features
    • Features measured on larger numeric ranges would otherwise dominate the distance. May choose to scale all features into the range [0,1] via a pre-processing step
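
The effect of normalising each feature into [0, 1] before computing Euclidean distance can be checked with a small sketch. It assumes min-max scaling; the function names and the (age, income) vectors are hypothetical values invented for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_max_scale(vectors):
    """Scale each feature (column) of the vectors into the range [0, 1]."""
    columns = list(zip(*vectors))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [
        [(v - lo) / span if span else 0.0 for v, lo, span in zip(vec, lows, spans)]
        for vec in vectors
    ]


# Hypothetical people described by (age in years, income in dollars).
a = [25.0, 50000.0]
b = [65.0, 51000.0]  # very different age, similar income
c = [27.0, 53000.0]  # similar age, somewhat different income
d = [45.0, 90000.0]  # widens the income range
sa, sb, sc, sd = min_max_scale([a, b, c, d])

print("raw:    d(a,b) =", euclidean(a, b), " d(a,c) =", euclidean(a, c))
print("scaled: d(a,b) =", euclidean(sa, sb), " d(a,c) =", euclidean(sa, sc))
```

On the raw values income dominates, so `b` appears closer to `a` than `c` does despite the 40-year age gap. After scaling, both features contribute on the same footing and `c` becomes the nearer neighbour.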

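Original content statement: this article is published on the Tencent Cloud Developer Community with the author's authorisation and may not be reproduced without permission.

In case of infringement, please contact cloudcommunity@tencent.com to have it removed.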
