Public data release and individual anonymity

403 Forbidden
Published 2021-05-20 15:59:55

Lecture 21: Public data release and individual anonymity

-appreciate that an individual’s privacy may be uncovered by linking to an external dataset or by use of background or external knowledge

-understand the terms sensitive attribute, non-sensitive attribute, quasi identifier

Understanding the terms

  • sensitive attributes
    • information that people don't wish to reveal (e.g. medical condition)
  • non-sensitive attributes
    • information that people do not mind revealing
  • quasi identifier
    • a combination of non-sensitive attributes that can be linked with external data to identify an individual (e.g. gender, age, zip code)
  • Explicit identifier
    • Unique for an individual (e.g. passport number)
  • Explicit identifier = unique for an individual
  • Quasi-identifier = a combination of non-sensitive attributes
  • Sensitive attributes = information that people don’t wish to reveal
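
To make the linking risk concrete, here is a minimal sketch of a linkage attack. The two tables, the names in them, and the `link` helper are hypothetical and exist only for illustration; the point is that a release with explicit identifiers removed can still be joined to an external dataset on the quasi-identifier.

```python
# Hypothetical "anonymised" medical release: explicit identifiers are removed,
# but the quasi-identifier (gender, age, zip_code) and the sensitive
# attribute (condition) are kept.
medical = [
    {"gender": "F", "age": 34, "zip_code": "3053", "condition": "heart disease"},
    {"gender": "M", "age": 34, "zip_code": "3053", "condition": "flu"},
    {"gender": "F", "age": 51, "zip_code": "3010", "condition": "cancer"},
]

# Hypothetical external dataset (for example a public register) with names.
register = [
    {"name": "Alice", "gender": "F", "age": 34, "zip_code": "3053"},
    {"name": "Bob", "gender": "M", "age": 34, "zip_code": "3053"},
    {"name": "Carol", "gender": "F", "age": 51, "zip_code": "3010"},
    {"name": "Dave", "gender": "M", "age": 60, "zip_code": "3010"},
]

QUASI_IDENTIFIER = ("gender", "age", "zip_code")


def link(released, external, quasi_identifier=QUASI_IDENTIFIER):
    """Map each external name to a sensitive value when exactly one released
    record shares that person's quasi-identifier values."""
    index = {}
    for row in released:
        key = tuple(row[a] for a in quasi_identifier)
        index.setdefault(key, []).append(row)

    reidentified = {}
    for person in external:
        key = tuple(person[a] for a in quasi_identifier)
        matches = index.get(key, [])
        if len(matches) == 1:  # a unique match re-identifies the record
            reidentified[person["name"]] = matches[0]["condition"]
    return reidentified


print(link(medical, register))
```

In this toy data every released record is unique on the quasi-identifier, so three of the four people in the register are re-identified; Dave has no matching record and stays unlinked.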

-understand the notions of k-anonymity and l-diversity and how they protect privacy

  1. k-anonymity
    • A table satisfies k-anonymity if every record in the table is indistinguishable from at least k-1 other records with respect to every set of quasi-identifier attributes. Such a table is called a k-anonymous table.
    • k-anonymity is susceptible to two types of privacy attack
      • Homogeneity attack
        • k-anonymity can create groups that leak information due to a lack of diversity in the sensitive attribute: if every record in a group has the same sensitive value, an attacker who knows that a person belongs to that group learns the value.
      • Background attack
        • k-anonymity does not protect against attacks based on background knowledge (e.g. knowing that Japanese people have a very low incidence of heart disease lets an attacker rule that condition out for a Japanese individual)
  2. l-diversity
    • A measure of how diverse the sensitive attribute values are within each group

How do k-anonymity and l-diversity protect privacy

  • The higher k and l are, the harder it is for an attacker to find the sensitive information of an individual.
  • k-anonymous table:
    • Every record is indistinguishable from at least k-1 other records with respect to every set of quasi-identifier attributes
    • For every combination of values of quasi-identifiers, there are at least k records that share those values
  • Achieving k-anonymity
    • Generalization
      • Make the quasi-identifiers less specific
      • E.g. race
      • E.g. zip code
    • Suppression
      • Remove the quasi-identifiers completely
      • Used to moderate the generalization process
      • Applied to a limited number of outliers
  • Worst case: an attacker can only narrow the quasi-identifier down to a group of k individuals
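
The effect of generalization can be sketched as follows. The records, the 10-year age bands, and the 3-digit zip prefix are assumptions chosen for illustration, and `anonymity_level` and `generalize` are hypothetical helper names, not part of any library.

```python
from collections import Counter

# Hypothetical table: (age, zip_code) is the quasi-identifier,
# condition is the sensitive attribute.
records = [
    {"age": 23, "zip_code": "3053", "condition": "flu"},
    {"age": 27, "zip_code": "3052", "condition": "asthma"},
    {"age": 29, "zip_code": "3051", "condition": "flu"},
    {"age": 41, "zip_code": "3121", "condition": "heart disease"},
    {"age": 45, "zip_code": "3122", "condition": "cancer"},
    {"age": 48, "zip_code": "3121", "condition": "flu"},
]
QUASI_IDENTIFIER = ("age", "zip_code")


def anonymity_level(rows, quasi_identifier=QUASI_IDENTIFIER):
    """Largest k for which the rows are k-anonymous, i.e. the size of the
    smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[a] for a in quasi_identifier) for r in rows)
    return min(groups.values())


def generalize(row):
    """Make the quasi-identifier less specific: 10-year age band and
    3-digit zip prefix (both granularities are arbitrary choices here)."""
    low = row["age"] // 10 * 10
    return {**row, "age": f"{low}-{low + 9}", "zip_code": row["zip_code"][:3] + "*"}


generalized = [generalize(r) for r in records]
print(anonymity_level(records))      # 1: every record is unique
print(anonymity_level(generalized))  # 3: two groups of three records
```

Coarser bands raise k but make the released data less useful, which is why suppression of a few outliers is sometimes preferred over generalizing the whole table further.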

l-diversity

  • make the sensitive attribute diverse within each group
  • ensure there are at least l different values of the sensitive attribute in each group
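
As a rough illustration, the check below counts the distinct sensitive values per group. It assumes the simplest reading of l-diversity (at least l distinct values per group); the table and the helper names `diversity_level` and `homogeneous_groups` are hypothetical.

```python
from collections import defaultdict


def sensitive_values_by_group(rows, quasi_identifier, sensitive):
    """Collect the set of sensitive values seen in each quasi-identifier group."""
    groups = defaultdict(set)
    for r in rows:
        groups[tuple(r[a] for a in quasi_identifier)].add(r[sensitive])
    return groups


def diversity_level(rows, quasi_identifier, sensitive):
    """Largest l such that every group has at least l distinct sensitive values."""
    groups = sensitive_values_by_group(rows, quasi_identifier, sensitive)
    return min(len(values) for values in groups.values())


def homogeneous_groups(rows, quasi_identifier, sensitive):
    """Groups with a single sensitive value, i.e. open to a homogeneity attack."""
    groups = sensitive_values_by_group(rows, quasi_identifier, sensitive)
    return [key for key, values in groups.items() if len(values) == 1]


# Hypothetical 3-anonymous table: each group has three records.
table = [
    {"age": "20-29", "zip_code": "305*", "condition": "flu"},
    {"age": "20-29", "zip_code": "305*", "condition": "flu"},
    {"age": "20-29", "zip_code": "305*", "condition": "flu"},
    {"age": "40-49", "zip_code": "312*", "condition": "heart disease"},
    {"age": "40-49", "zip_code": "312*", "condition": "cancer"},
    {"age": "40-49", "zip_code": "312*", "condition": "flu"},
]
QI = ("age", "zip_code")

print(diversity_level(table, QI, "condition"))     # 1
print(homogeneous_groups(table, QI, "condition"))  # [('20-29', '305*')]
```

The table is 3-anonymous but only 1-diverse: anyone known to be in the 20-29 / 305* group must have flu, even though their exact record cannot be singled out.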

-understand how k-anonymity is susceptible to two types of privacy attacks (background attack and homogeneity attack)

Attacks on k-anonymity

  • Homogeneity attack
    • k-anonymity can create groups that leak information due to a lack of diversity in the sensitive attribute
  • Background attack
    • k-anonymity does not protect against attacks based on background knowledge

-be able to determine whether an example table satisfies a given level of k-anonymity or l-diversity

-understand the benefits of using and sharing people’s location data

Benefits of using and sharing people’s location data

  • Sharing a location in social media posts (e.g. on Facebook)
  • Navigating with GPS
  • Using Google Maps to find the nearest restaurant

-understand the possible privacy concerns of people’s location data being shared

  • Shared location data is vulnerable to inference attacks
    • Home/work location pairs may lead to a small set of potential individuals
    • Regularly visited places can reveal who a user is or sensitive facts about them (e.g. Alice is Japanese and regularly visits a heart hospital, so the user can be guessed)
    • An attacker can learn about individuals' travelling habits

-appreciate the tradeoff between privacy and utility when using location-based services or when analysing location-based data

  • Anonymity: cloaking
    • K-anonymity
      • individuals are k-anonymous if their location information cannot be distinguished from that of k-1 other individuals
    • Spatial cloaking
      • Gruteser & Grunwald use quadtrees
      • adapt the spatial precision of location information about a person according to the number of other people in the same quadrant
    • Temporal cloaking
      • Reduce the frequency of temporal information
  • Obfuscation
    • Idea
      • Mask an individual's precise location
      • Deliberately degrade the quality of information about an individual's location (imperfect information)
      • The individual's identity can be revealed; only the location is degraded
    • Assumption
      • Spatial imperfection is approximately equivalent to privacy
      • the greater the imperfect knowledge about a user's location, the greater the user's privacy
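
The quadtree idea behind spatial cloaking can be sketched as below. This is a simplified illustration, not Gruteser & Grunwald's exact algorithm: it assumes locations are points in the unit square [0, 1) x [0, 1), and the function name `cloak`, its parameters, and the sample points are hypothetical.

```python
def cloak(user, others, k, region=(0.0, 0.0, 1.0, 1.0), max_depth=16):
    """Return the smallest quadtree cell (x_min, y_min, x_max, y_max) that
    contains `user` and at least k people in total (user included), or None
    if even the whole region holds fewer than k people."""

    def inside(point, cell):
        x0, y0, x1, y1 = cell
        return x0 <= point[0] < x1 and y0 <= point[1] < y1

    people = [user] + list(others)
    if sum(inside(p, region) for p in people) < k:
        return None

    for _ in range(max_depth):
        x0, y0, x1, y1 = region
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        # Pick the quadrant of the current cell that contains the user.
        quadrant = (
            x0 if user[0] < xm else xm,
            y0 if user[1] < ym else ym,
            xm if user[0] < xm else x1,
            ym if user[1] < ym else y1,
        )
        # Stop subdividing once the quadrant would hold fewer than k people.
        if sum(inside(p, quadrant) for p in people) < k:
            break
        region = quadrant
    return region


user = (0.1, 0.1)
others = [(0.2, 0.2), (0.3, 0.4), (0.8, 0.8), (0.6, 0.1)]
print(cloak(user, others, k=3))  # (0.0, 0.0, 0.5, 0.5)
print(cloak(user, others, k=2))  # (0.0, 0.0, 0.25, 0.25)
```

The reported cell shrinks where people are dense and grows where they are sparse, which is the privacy/utility tradeoff in miniature: a larger k gives more anonymity but a less precise location for the service.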

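Original content notice: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, please contact cloudcommunity@tencent.com to request removal.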
