Unimelb COMP20008 Note 2019 SM1 - Data formats

403 Forbidden
Published 2021-05-17 22:08:11

Lectures 2 and 3: Data formats

Relational Databases

-Appreciate the role that relational databases play in data wrangling.

  • Structured data follows a fixed, regular layout, as in a relational database
  • Structured: relational databases, CSV
  • Unstructured: text
  • Semi-structured: HTML, XML, JSON
  • Advantages of structured data:
    • Easier to analyse and to query
    • Easier to store
    • Easier to clean, and to maintain consistency and security, especially with multiple users
    • Regular, predictable layout
  • Relational databases are the classic method of storing structured data (banking, sales, airlines, …)
    • Data is stored in tables; each row is a data item and each column describes an attribute of the item
    • The data can be queried using a high-level language such as SQL
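
The table-and-query idea above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `sales` table, its columns and its rows are invented, not taken from the lecture.

```python
import sqlite3

# In-memory database; the table and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (customer, amount) VALUES (?, ?)",
    [("Alice", 120.0), ("Bob", 35.5), ("Alice", 80.0)],
)

# Each row is a data item; each column is an attribute of that item.
# SQL states what we want (a total per customer), not how to compute it.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

print(totals)
```

The query returns one row per customer, which is the kind of question that is awkward to answer over unstructured text.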

Regular Expression

-be able to understand a regular expression using the operators

. ^ $ * + | [ ]

-be able to formulate a regular expression using the above operators, based on an English description
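
As a rough illustration of going from an English description to a pattern, the sketch below uses Python's re module with only the operators listed above. The descriptions and test strings are invented examples, not exam questions.

```python
import re

# Operators used: . (any character), ^ (start), $ (end), * (zero or more),
# + (one or more), | (alternation), [ ] (character class).

# "A string that consists only of one or more digits"
digits_only = re.compile(r"^[0-9]+$")

# "A string that starts with 'a' and ends with 'b', with anything in between"
a_to_b = re.compile(r"^a.*b$")

# "A string that contains 'cat' or 'dog' anywhere"
cat_or_dog = re.compile(r"cat|dog")

results = [
    bool(digits_only.search("2019")),
    bool(digits_only.search("20a19")),
    bool(a_to_b.search("ab")),
    bool(a_to_b.search("a-x-b")),
    bool(a_to_b.search("abc")),
    bool(cat_or_dog.search("hotdog stand")),
]
print(results)
```

Without the ^ and $ anchors, a pattern matches anywhere inside the string, which is why `cat|dog` accepts "hotdog stand".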

CSV

-be able to explain what a CSV file is, what a spreadsheet is, and how they differ

  • Spreadsheet
    • A spreadsheet is a file made of rows and columns that helps to sort and arrange data easily and to calculate numerical data
  • CSV
    • Tabular, like a spreadsheet
    • Easy to use
    • Structured, but not like a relational DB
  • Differences
    • CSV files are human-readable
    • CSV files lack formatting information
    • CSV is a plain-text format in which values are separated by commas, while a spreadsheet file is typically a binary format that holds both content and formatting.
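
A minimal sketch of reading CSV with Python's csv module, assuming a small made-up file with a header row (the names and marks are hypothetical). It shows the point above: the file holds only comma-separated text, so types and formatting must be supplied by the reader.

```python
import csv
import io

# Hypothetical CSV content; io.StringIO stands in for an open file.
raw = "name,campus,mark\nAlice,Parkville,85\nBob,Southbank,72\n"

# DictReader uses the first line as column names.
rows = list(csv.DictReader(io.StringIO(raw)))

# Every value arrives as a string: CSV carries no types or formatting.
marks = [int(row["mark"]) for row in rows]
average = sum(marks) / len(marks)

print(rows[0], average)
```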

XML

-be able to explain the motivation for XML and XML namespaces

  • A 'meta' mark-up language
  • Extensible: users define their own tags
  • Facilitates better encoding of semantics
  • It is beneficial to reuse parts of existing, well-designed schemas
  • This allows search engines or other tools to operate over a range of documents that vary in many respects but use common names for common element types
  • XML namespaces are based on the use of qualified names, which contain a single colon separating the name into a namespace prefix and a local name. The prefix, which is mapped to a URI, selects a namespace.
  • The combination of the universally managed URI namespace and the local schema namespace produces names that are guaranteed to be universally unique

-be able to explain the differences between XML and HTML

  • HTML tags are predefined, whereas XML tags are user-defined.
  • HTML has a limited set of tags, whereas XML tags are extensible.
  • HTML tags are case-insensitive, whereas XML tags are case-sensitive.
  • HTML tags are meant for displaying data, not describing it, whereas XML tags are meant for describing data.
  • HTML focuses on how data looks, whereas XML focuses on what data is.

-be able to explain the difference between XML attributes and elements and describe situations in which the use of one is preferred over the other

  • XML element
    • An XML element is everything from (including) the element's start tag to (including) the element's end tag
    • <element></element>
    • An element can contain:
      • Text
      • Attributes
      • Other elements
      • Or a mix of the above
  • XML Attribute
    • Attributes are part of XML elements
    • Attributes define properties of elements
    • An attribute is always a name-value pair
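
To make the element/attribute distinction concrete, here is a small sketch using Python's xml.etree.ElementTree. The document is invented for illustration; it stores the subject code as an attribute and the title as a child element.

```python
import xml.etree.ElementTree as ET

# Made-up document: "code" and "location" are attributes,
# <title> and <campus> are child elements of <subject>.
doc = """<subject code="COMP20008">
  <title>Elements of Data Processing</title>
  <campus location="Parkville"/>
</subject>"""

root = ET.fromstring(doc)

root_tag = root.tag                          # name of the root element
attributes = root.attrib                     # name-value pairs on the start tag
title_text = root.find("title").text         # text content of a child element
location = root.find("campus").get("location")  # attribute on an empty element

print(root_tag, attributes, title_text, location)
```

An attribute holds a single string value, while an element can hold text, attributes and further elements, so nested or repeated data generally has to go in elements.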

-be able to create XML documents, based on a natural language specification

-be able to both create and understand XML documents that use XML namespace syntax

  • Default namespace
    • Using a namespace, you can define the context in which names are defined. In essence, a namespace defines a scope.
    • Syntax: xmlns="namespaceURI"
    • Example: xmlns="http://info.gov.uk"
    • Defining a default namespace for an element also saves us from using prefixes in all of its child elements
  • Using a prefix
    • Name conflicts in XML can be resolved by using a prefix bound to a namespace.
    • In XML, element names are defined by the developer. This often results in conflicts when mixing XML documents from different XML applications.
    • Without namespaces, a user or an XML application will not know how to handle these conflicts.
  • xmlns attributes
    • When using prefixes in XML, a namespace for the prefix must be defined.
    • The namespace can be defined by an xmlns attribute in the start tag of an element.
    • The namespace declaration has the following syntax:
    • xmlns:prefix="URI"
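
The default-namespace and prefix syntax above can be tried out with Python's xml.etree.ElementTree. In this sketch the document is invented; http://info.gov.uk is the URI from the notes, while http://example.com/furniture and the prefixes `f` and `gov` are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# Two elements share the local name "table" but live in different namespaces:
# the first uses the default namespace, the second the "f" prefix.
doc = """<root xmlns="http://info.gov.uk"
      xmlns:f="http://example.com/furniture">
  <table>rows and columns</table>
  <f:table>four legs</f:table>
</root>"""

root = ET.fromstring(doc)

# ElementTree expands every qualified name to {namespaceURI}localname.
tags = [child.tag for child in root]

# Prefixes used for searching are local to this mapping; only the URIs matter.
ns = {"gov": "http://info.gov.uk", "f": "http://example.com/furniture"}
gov_table = root.find("gov:table", ns).text
furniture_table = root.find("f:table", ns).text

print(tags, gov_table, furniture_table)
```

The parser identifies a name by its URI plus local name, so the prefix itself can differ between documents without changing the meaning.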

-be able to explain the purpose of XML namespaces and list reasons for why it is useful

  • XML namespaces provide a method to avoid element name conflicts.
  • They group together elements relating to a common idea

-understand what is meant by well-formed XML and valid XML

  • Syntax: well-formed XML
    • XML files should begin with an XML declaration (recommended, though optional in XML 1.0)
      • <?xml version="1.0"?>
    • XML elements
      • Start/end tags or empty tags
      • Attribute values in quotes
        • <campus>Parkville</campus>
        • <campus location="Parkville"/>
      • Appropriately nested
      • One root element
    • Comments
      • <!-- Comments do not affect the document; they are not part of the data being represented -->
    • Some characters have special meaning
      • '<' and '&' are illegal as literal characters in element content; write them as &lt; and &amp;
    • A CDATA (character data) section may be used inside an XML element to include large blocks of text that contain special characters such as &, < and >
    • The XML standard states that a document conforming to its syntax rules is "well-formed". The standard has many syntax, grammar and structure rules: a document must have a single root element, elements must be properly nested, tag names cannot begin with a digit or contain certain characters, and so on.
  • XML schema & validation
    • An XML file can be well-formed and NOT valid; it is valid if it is consistent with a particular schema.
    • XML Schema languages, examples:
      • XSD (XML Schema Definitions): a W3C standard
      • DTD (Document Type Definitions)
    • HTML5 schema for Web browsers <!DOCTYPE html>
    • Validation Tools (schema checking software)
    • Validity is distinct from well-formedness. An XML document is said to be valid if it is associated with a document type definition (DTD) or an XML schema, and complies with the constraints specified in that DTD or schema.
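
The well-formedness rules above can be checked mechanically, because any XML parser must reject a document that breaks them. The sketch below uses Python's xml.etree.ElementTree; the helper name `is_well_formed` and the sample strings are made up. It tests well-formedness only: checking validity against a DTD or XSD needs a schema-aware validator, which the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "declaration + empty tag": '<?xml version="1.0"?><campus location="Parkville"/>',
    "bad nesting": "<a><b></a></b>",
    "two root elements": "<a/><b/>",
    "unescaped ampersand": "<a>fish & chips</a>",
    "escaped ampersand": "<a>fish &amp; chips</a>",
    "CDATA section": "<a><![CDATA[fish & chips < 5]]></a>",
}

report = {name: is_well_formed(text) for name, text in samples.items()}
print(report)
```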

-be able to explain the difference between XML and JSON and applications where each is suited

  • JSON is simpler and more compact/lightweight than XML, and easy to parse.
  • A common JSON application: reading and displaying data from a web server using JavaScript
  • XML comes with a large family of related standards for querying and transforming (XQuery, XML Schema, XPath, XSLT, namespaces, …)
  • XML stands for "Extensible Markup Language" and is written in a similar way to HTML, whereas JSON stands for "JavaScript Object Notation"; it is based on a subset of JavaScript syntax but is language-independent.
  • XML allows complex schema definitions (via regular expressions)
    • allows formal validation
    • makes you consider the data design more closely
  • JSON is more streamlined, lightweight and compact
    • which appeals to programmers looking for speed and efficiency
    • widely used for storing data in NoSQL databases

JSON

-be able to read and create documents using JSON

  • Syntax
    • Object data is in name/value pairs
      • "firstName":"John"
    • JSON values
      • A number (integer or floating point)
      • A string (in double quotes)
      • A Boolean (true or false)
      • An array (in square brackets)
      • An object (in curly braces)
      • null
    • JSON Objects
      • {"firstName":"John", "lastName":"Doe"}
    • JSON Arrays
      • "employees":[
          {"firstName":"John", "lastName":"Doe"},
          {"firstName":"Anna", "lastName":"Smith"},
          {"firstName":"Peter", "lastName":"Jones"}
        ]

  • These objects repeat recursively down a hierarchy as needed.
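
A short sketch of reading and creating JSON with Python's json module. The employees array is the one from the notes, wrapped in an enclosing object so that it forms a complete document; the `count`, `active` and `manager` members are added here only to show the other value types.

```python
import json

text = """
{
  "employees": [
    {"firstName": "John", "lastName": "Doe"},
    {"firstName": "Anna", "lastName": "Smith"},
    {"firstName": "Peter", "lastName": "Jones"}
  ],
  "count": 3,
  "active": true,
  "manager": null
}
"""

# Reading: objects become dicts, arrays become lists,
# true/false become True/False, and null becomes None.
data = json.loads(text)
last_names = [employee["lastName"] for employee in data["employees"]]

# Creating: serialise a Python structure back to JSON text.
round_trip = json.dumps(data["employees"][0])

print(last_names, data["active"], data["manager"], round_trip)
```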

-be able to convert an XML document to JSON and vice versa

  • For the JSON syntax, see the previous objective.
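
There is no single standard mapping between XML and JSON, so any conversion has to pick a convention. The sketch below shows one ad hoc convention, not a prescribed method: attributes become "@name" keys, mixed text becomes "#text", and repeated child elements become an array. The function name `element_to_dict` and the sample document are invented.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element to a dict or string using an ad hoc convention."""
    # Attributes become keys prefixed with "@".
    node = {f"@{name}": value for name, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # A repeated child name turns into a JSON array.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        if not node:
            return text  # a leaf element with only text becomes a string
        node["#text"] = text
    return node


xml_text = """<employees>
  <employee id="1"><firstName>John</firstName><lastName>Doe</lastName></employee>
  <employee id="2"><firstName>Anna</firstName><lastName>Smith</lastName></employee>
</employees>"""

root = ET.fromstring(xml_text)
as_dict = {root.tag: element_to_dict(root)}
as_json = json.dumps(as_dict, indent=2)
print(as_json)
```

Going the other way applies the same convention in reverse: "@" keys become attributes, arrays become repeated elements, and a single root element must be chosen for the document.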

-be able to explain the purpose of using schemas for XML and JSON data

  • JSON Schema
    • Written in JSON itself
    • Describes the structure of other data
    • Easy to validate a JSON document against its schema using a schema validator
  • XML Schema
    • Written in XML itself
    • The schema is extensible
    • It is easier to describe allowable document content
    • It is easier to validate the correctness of data
    • It is easier to define data facets (restrictions on data)
    • It is easier to define data patterns (data formats)
    • It is easier to convert data between different data types
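
To illustrate what a schema buys you, the sketch below embeds a small JSON Schema and checks documents against it with a toy checker. The schema uses real JSON Schema keywords ("type", "required", "properties"), but the `check` function is a hypothetical, hand-written helper that understands only those three keywords; real work would use a full validator such as the third-party jsonschema package.

```python
import json

# A JSON Schema is itself JSON; the property names here are illustrative.
schema = json.loads("""
{
  "type": "object",
  "required": ["firstName", "lastName"],
  "properties": {
    "firstName": {"type": "string"},
    "lastName": {"type": "string"},
    "age": {"type": "integer"}
  }
}
""")

# Toy mapping from schema type names to Python types (a small subset only).
TYPES = {"object": dict, "array": list, "string": str,
         "integer": int, "boolean": bool}


def check(instance, schema):
    """Return a list of error messages; an empty list means no errors found."""
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(instance, TYPES[expected])
        # In Python, bool is a subclass of int, so exclude it explicitly.
        if expected == "integer" and isinstance(instance, bool):
            ok = False
        if not ok:
            return [f"expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"missing required property {name!r}")
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors += [f"{name}: {e}" for e in check(instance[name], subschema)]
    return errors


good = check({"firstName": "John", "lastName": "Doe", "age": 30}, schema)
bad = check({"firstName": "John", "age": "thirty"}, schema)
print(good, bad)
```

The schema states the expected structure once, so every producer and consumer of the data can test documents against the same rules instead of re-implementing them.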

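Original-work statement: this article is published on the Tencent Cloud Developer Community with the author's authorisation and may not be reproduced without permission.

In case of infringement, please contact cloudcommunity@tencent.com for removal.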
