How to Build a Data Storage Model in NoSQL

ImportSource

Published 2018-05-04 10:46:28

Included in the column: ImportSource

Translated from: NoSQL Distilled, Chapter 3 (More Details on Data Models), Section 3.5 Modeling for Data Access

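In this section we use a typical e-commerce scenario to show how to build the data storage model in different kinds of NoSQL databases so that the data is easy for our applications to read.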

As mentioned earlier, when modeling data aggregates we need to consider how the data is going to be read as well as what are the side effects on data related to those aggregates.

Let’s start with the model where all the data for the customer is embedded using a key-value store (see Figure 3.2).

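As mentioned earlier, when we model with aggregates we have to consider how the data will be read and the side effects that follow (a given model always serves one access pattern well rather than adapting to many kinds of queries).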

Let's start with the following example, in which all the data related to one customer is stored in a single key-value structure:

Figure 3.2. Embed all the objects for customer and their orders.

Figure 3.2. All the objects for the customer and the customer's orders are embedded in a single aggregate.

In this scenario, the application can read the customer’s information and all the related data by using the key. If the requirements are to read the orders or the products sold in each order, the whole object has to be read and then parsed on the client side to build the results. When references are needed, we could switch to document stores and then query inside the documents, or even change the data for the key-value store to split the value object into Customer and Order objects and then maintain these objects’ references to each other.

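In this case the application can use that key to read the customer's information and all the related data. If the requirement is to read the orders, or the products contained in each order, the whole object has to be read first and then parsed on the client side to produce the result. If "references" are needed (you can think of them as external links or foreign keys), we can switch to a document database and query inside the documents; or we can stay with the key-value database and simply split the single object (the one big block in the figure above) into two objects, Customer and Order, each holding a reference to the other. (Such a reference plays the role of a foreign key in a relational database.)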
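To make the client-side parsing step concrete, here is a minimal Python sketch. It assumes a plain dict standing in for the key-value store, a hypothetical key format `customer:1`, and a second, made-up order (id 100) added purely for illustration; none of these come from the book.

```python
import json

# A dict stands in for the key-value store (an assumption for illustration).
# The store only sees an opaque string value under each key.
kv_store = {
    "customer:1": json.dumps({
        "customerId": 1,
        "name": "Martin",
        "orders": [
            {"orderId": 99,
             "orderItems": [{"productId": 27, "price": 32.45}]},
            {"orderId": 100,  # hypothetical second order
             "orderItems": [{"productId": 29, "price": 12.00}]},
        ],
    })
}


def product_ids_for_customer(store, key):
    """Fetch the whole aggregate by key, then parse it on the client."""
    customer = json.loads(store[key])  # the entire value has to be read
    return [item["productId"]
            for order in customer["orders"]
            for item in order["orderItems"]]


product_ids = product_ids_for_customer(kv_store, "customer:1")
```

Even though only the product ids are wanted, the client has to pull and decode the complete customer value, which is the cost the paragraph above describes.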

With the references (see Figure 3.3), we can now find the orders independently from the Customer, and with the orderId reference in the Customer we can find all Orders for the Customer. Using aggregates this way allows for read optimization, but we have to push the orderId reference into Customer every time with a new Order.

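With this reference we can look up orders independently of the Customer, and use the orderId references held in the Customer to find all orders that belong to a given Customer. Using aggregates this way optimizes reads, but writes take more work: every time an order is written, an orderId reference must also be added to the Customer.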

# Customer object
{
  "customerId": 1,
  "name": "Martin",
  "billingAddress": [{"city": "Chicago"}],
  "payment": [{"type": "debit", "ccinfo": "1000-1000-1000-1000"}],
  "orders": [{"orderId": 99}]
}

# Order object
{
  "orderId": 99,
  "customerId": 1,
  "orderDate": "Nov-20-2011",
  "orderItems": [{"productId": 27, "price": 32.45}],
  "orderPayment": [{"ccinfo": "1000-1000-1000-1000", "txnId": "abelif879rft"}],
  "shippingAddress": {"city": "Chicago"}
}

Figure 3.3. Customer is stored separately from Order.

Figure 3.3. Storing Customer separately from Order.
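The write-side cost of keeping references can be sketched as follows. This is only an illustration under stated assumptions: two dicts stand in for the key-value store, the `orders` field name inside Customer and the helper names `place_order` and `orders_for_customer` are hypothetical, and order 100 is made-up sample data.

```python
# Two dicts stand in for the key-value store (an assumption for illustration).
customers = {1: {"customerId": 1, "name": "Martin", "orders": []}}
orders = {}


def place_order(customer_id, order):
    """Store the Order on its own and push its id into the Customer."""
    order = dict(order, customerId=customer_id)
    orders[order["orderId"]] = order
    # The extra write: the Customer aggregate changes on every new Order.
    customers[customer_id]["orders"].append({"orderId": order["orderId"]})


def orders_for_customer(customer_id):
    """Follow the orderId references held in the Customer."""
    return [orders[ref["orderId"]] for ref in customers[customer_id]["orders"]]


place_order(1, {"orderId": 99,
                "orderItems": [{"productId": 27, "price": 32.45}]})
place_order(1, {"orderId": 100,  # hypothetical second order
                "orderItems": [{"productId": 29, "price": 12.00}]})

order_ids = [o["orderId"] for o in orders_for_customer(1)]
```

An Order can now be fetched directly by its own id, while each call to `place_order` performs two writes instead of one.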

Aggregates can also be used to obtain analytics; for example, an aggregate update may fill in information on which Orders have a given Product in them. This denormalization of the data allows for fast access to the data we are interested in and is the basis for Real Time BI or Real Time Analytics where enterprises don’t have to rely on end-of-the-day batch runs to populate data warehouse tables and generate analytics; now they can fill in this type of data, for multiple types of requirements, when the order is placed by the customer.

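Aggregates can also be used for data analysis. For example, when an aggregate is updated, summary information about which orders contain a particular product can be filled in at the same time. (Translator's note: this may sound surprising, since it looks like analysis work is being done as part of the write, but for the sake of queries a write can go to any lengths; the design principle behind this is discussed later.) This denormalization gives us fast access to the data we care about, and it is precisely the basis of so-called "real-time BI" or "real-time analytics". Enterprises no longer have to wait for an end-of-day batch run over the data warehouse tables to generate analytics; this kind of data is now filled in, to meet many different kinds of requirements, as soon as the customer places the order. In other words, for the sake of queries we can denormalize on write, storing the data in whatever shape the reads need.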

{ "itemid": 27, "orders": [99, 545, 897, 678] }
{ "itemid": 29, "orders": [199, 545, 704, 819] }
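One way such a per-product aggregate might be kept up to date at write time is sketched below. The in-memory `orders_by_item` index and the `record_order` helper are hypothetical names used only for illustration, and the sample orders are made up.

```python
from collections import defaultdict

# itemid -> list of order ids, maintained when each order is placed
# (a denormalized view kept alongside the orders themselves).
orders_by_item = defaultdict(list)


def record_order(order):
    """Update the per-product aggregate as part of placing an order."""
    for item in order["orderItems"]:
        product_id = item["productId"]
        if order["orderId"] not in orders_by_item[product_id]:
            orders_by_item[product_id].append(order["orderId"])


record_order({"orderId": 99,
              "orderItems": [{"productId": 27, "price": 32.45}]})
record_order({"orderId": 545,
              "orderItems": [{"productId": 27, "price": 32.45},
                             {"productId": 29, "price": 10.00}]})

orders_with_item_27 = orders_by_item[27]
```

Reading "which orders contain product 27" is then a single lookup, with no batch job needed to compute it afterwards.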

In document stores, since we can query inside documents, removing references to Orders from the Customer object is possible. This change allows us to not update the Customer object when new orders are placed by the Customer.

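In a document database, because we can quickly search inside documents, we can remove the references to Orders from the Customer. That way the Customer no longer has to be updated every time an order is added. (Translator's note: this also feels much cleaner, doesn't it?)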

# Customer object
{
  "customerId": 1,
  "name": "Martin",
  "billingAddress": [{"city": "Chicago"}],
  "payment": [{"type": "debit", "ccinfo": "1000-1000-1000-1000"}]
}

# Order object
{
  "orderId": 99,
  "customerId": 1,
  "orderDate": "Nov-20-2011",
  "orderItems": [{"productId": 27, "price": 32.45}],
  "orderPayment": [{"ccinfo": "1000-1000-1000-1000", "txnId": "abelif879rft"}],
  "shippingAddress": {"city": "Chicago"}
}

Since document data stores allow you to query by attributes inside the document, searches such as “find all orders that include the Refactoring Databases product” are possible, but the decision to create an aggregate of items and orders they belong to is not based on the database’s query capability but on the read optimization desired by the application.

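Because a document database lets you query by attributes inside a document, a query such as "find all orders that include the product Refactoring Databases" can be executed. However, the decision to put products and the orders they belong to into one aggregate is not driven by the database's query capability; it is driven by how the application wants to optimize its reads. (Translator's note: in short, it is done to make the application's reads convenient.)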
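Querying by an attribute inside the documents can be imitated in plain Python. The sketch below is not a real document-database API: a list of dicts stands in for an order collection, the `productName` field and both helper functions are assumptions made for illustration, and orders 100 and 101 are made-up sample data.

```python
# A list of dicts stands in for a collection of Order documents.
order_docs = [
    {"orderId": 99, "customerId": 1,
     "orderItems": [{"productId": 27,
                     "productName": "Refactoring Databases",
                     "price": 32.45}]},
    {"orderId": 100, "customerId": 2,
     "orderItems": [{"productId": 29,
                     "productName": "NoSQL Distilled",
                     "price": 25.00}]},
    {"orderId": 101, "customerId": 1,
     "orderItems": [{"productId": 29,
                     "productName": "NoSQL Distilled",
                     "price": 25.00},
                    {"productId": 27,
                     "productName": "Refactoring Databases",
                     "price": 32.45}]},
]


def orders_for_customer(docs, customer_id):
    """Find a customer's orders without any reference stored in Customer."""
    return [d for d in docs if d["customerId"] == customer_id]


def orders_containing(docs, product_name):
    """Match on an attribute nested inside each document."""
    return [d for d in docs
            if any(item["productName"] == product_name
                   for item in d["orderItems"])]


customer_order_ids = [d["orderId"] for d in orders_for_customer(order_docs, 1)]
refactoring_order_ids = [d["orderId"]
                         for d in orders_containing(order_docs,
                                                    "Refactoring Databases")]
```

Because the Order documents carry `customerId`, the Customer object needs no list of order ids; whether to also keep a precomputed item-to-orders aggregate remains a read-optimization decision for the application.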

When modeling for column-family stores, we have the benefit of the columns being ordered, allowing us to name columns that are frequently used so that they are fetched first. When using the column families to model the data, it is important to remember to do it per your query requirements and not for the purpose of writing; the general rule is to make it easy to query and denormalize the data during write.

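If we use a column-family database, we can take advantage of the ordering of columns: give frequently used columns names that sort first, so that they are read first. When modeling with column families, the main consideration should be the query requirements, not the write requirements; the general rule is to make the data easy to query, while writes need not follow a normalized schema, which is formally called "denormalization". (Translator's note: remembering this one sentence is really enough!)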

As you can imagine, there are multiple ways to model the data; one way is to store the Customer and Order in different column families (see Figure 3.4). Here, it is important to note the reference to all the orders placed by the customer are in the Customer column family. Other, similar denormalizations are generally done so that query (read) performance is improved.

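As you can imagine, there are many ways to model the data; one is to store Customer and Order in separate column families (see Figure 3.4). Note that in this figure the references to all of the customer's orders are placed in the Customer column family. Denormalizations like this are done frequently, simply to make our queries convenient.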

Figure 3.4. Conceptual view into a column data store

Figure 3.4. Conceptual view of a column-family data store
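The column-family layout can be approximated with nested dicts, as in the sketch below. It does not use any real column-family database API: the row keys, the family names `profile`, `orders` and `details`, the zero-padded column names, and the second order are all assumptions chosen for illustration.

```python
# row key -> column family -> {column name: value}
customer_rows = {
    "1": {
        "profile": {"name": "Martin", "billing_city": "Chicago"},
        # Denormalized references: one column per order placed.
        "orders": {"order_0100": "", "order_0099": ""},
    }
}
order_rows = {
    "99": {"details": {"customerId": "1", "orderDate": "Nov-20-2011"}},
    "100": {"details": {"customerId": "1", "orderDate": "Nov-21-2011"}},
}


def read_columns(row, family, limit=None):
    """Return columns in column-name order, as a sorted store would."""
    names = sorted(row[family])
    if limit is not None:
        names = names[:limit]
    return [(name, row[family][name]) for name in names]


def order_ids_for_customer(customer_id):
    """Read the order references straight from the Customer row."""
    columns = read_columns(customer_rows[customer_id], "orders")
    return [int(name.split("_")[1]) for name, _ in columns]


order_ids = order_ids_for_customer("1")
first_profile_column = read_columns(customer_rows["1"], "profile", limit=1)
```

Since columns come back in name order, the choice of column names decides which ones a limited read returns first, and the order references in the Customer row make "all orders for this customer" a single-row read.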

When using graph databases to model the same data, we model all objects as nodes and relations within them as relationships; these relationships have types and directional significance.

Each node has independent relationships with other nodes. These relationships have names like PURCHASED, PAID_WITH, or BELONGS_TO (see Figure 3.5); these relationship names let you traverse the graph. Let’s say you want to find all the Customers who PURCHASED a product with the name Refactoring Databases. All we need to do is query for the product node Refactoring Databases and look for all the Customers with the incoming PURCHASED relationship.

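When we use a graph database to model the same data, we treat every object as a node and turn the relations between objects into relationships between nodes; both the type and the direction of these relationships matter.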

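Each node's relationships with other nodes are independent of one another. These relationships also have names, such as PURCHASED, PAID_WITH, or BELONGS_TO (see Figure 3.5), and the names let you traverse the whole graph. Say you want to find all the Customers who PURCHASED the product Refactoring Databases: we first look up the product node named Refactoring Databases, then start from that node and find all the customer nodes that have a PURCHASED relationship to it.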
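The traversal just described can be sketched without a graph database. In the snippet below, a dict of nodes and a list of `(source, type, target)` tuples stand in for the graph; the node ids, the second customer and product, and the helper name `customers_who_purchased` are made up for illustration.

```python
# Nodes keyed by a hypothetical id; each has a label and a name.
nodes = {
    "c1": {"label": "Customer", "name": "Martin"},
    "c2": {"label": "Customer", "name": "Pramod"},
    "p1": {"label": "Product", "name": "Refactoring Databases"},
    "p2": {"label": "Product", "name": "NoSQL Distilled"},
}

# Directed, typed relationships: (source node, type, target node).
relationships = [
    ("c1", "PURCHASED", "p1"),
    ("c2", "PURCHASED", "p1"),
    ("c2", "PURCHASED", "p2"),
]


def customers_who_purchased(product_name):
    """Start at the product node and follow incoming PURCHASED edges."""
    product_ids = {node_id for node_id, node in nodes.items()
                   if node["label"] == "Product"
                   and node["name"] == product_name}
    return sorted(nodes[source]["name"]
                  for source, rel_type, target in relationships
                  if rel_type == "PURCHASED" and target in product_ids)


buyers = customers_who_purchased("Refactoring Databases")
```

The direction matters here: PURCHASED points from Customer to Product, so the query looks at relationships whose target is the product node.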