cool hit counter Introduction to MySQL 5.7 JSON Implementation_Intefrankly

Introduction to MySQL 5.7 JSON Implementation


brief introduction

This article focuses on the features of JSON, an unstructured data type introduced in MySQL 5.7.7, and the specific implementation (including storage). First, we introduce why we introduced native data type support for JSON; then we introduce the JSON manipulation functions that MySQL provides to users, as well as the JSON path expression syntax, which, combined with both, allows users to manipulate arbitrary keys and data in JSON at the database level; after that, we focus on the storage structure of JSON on the server side, which is the root of a deeper understanding of many other JSON features; before finally introducing the comparison and sorting rules for JSON as a new data type, we introduce the principles of indexing JSON type data.

Why native support for JSON

  1. Document legality

Before MySQL 5.7.7 provided native type support for JSON, users could store JSON documents with TEXT or BLOB types. However, for MySQL, the data inserted by the user is just a normal string after serialization, and no checks are done on the syntactic legality of the JSON document itself; the legality of the document needs to be guaranteed by the user. After introducing a new JSON type, inserting a JSON document with a syntax error, MySQL will prompt for the error and do normalization after the insert to ensure that each key corresponds to a value.

  1. More effective access

MySQL 5.7.7+ itself provides many native functions as well as path expressions to facilitate access to JSON data. For example, for the following JSON document.

{ "a": [ [ 3, 2 ], [ { "c" : "d" }, 1 ] ], "b": { "c" : 6 }, "one potato": 7, "b.c" : 8 }

Users can use the

$.a[1][0] gain{ "c" : "d" }$.a[1] gain[ { "c" : "d" }, 1 ]

Wildcards can also be used* harmony** to perform fuzzy matching, as detailed in the next paragraph.

  1. Performance optimization

Before MySQL provided native support for JSON, if a user needed to obtain or modify the key value of a JSON document, the entire string of TEXT or BLOB needed to be read out and deserialized into a JSON object, and then the JSON data was accessed through various library functions. Obviously this is very inefficient, especially for larger documents. And the performance of native JSON, especially the read performance, is very good. According to performance tests done by Oracle for 200K+ data documents show that the same data with TEXT and JSON type query performance difference reaches **two orders of magnitude** more, and users can also index frequently accessed JSON key values to further improve performance. The improved performance of JSON data manipulation is based on the storage structure of the JSON data itself, which is described further below.

JSON manipulation interface and path expressions

  1. JSON's manipulation interface

According to the official MySQL documentation, the following conditions need to be met for the server-side JSON function to be implemented.

Requirements: ` Lets users construct JSON data values from other relational data. Lets users extract relational data from JSON data values. Lets users minimally introspect the structure of JSON values and text (validity, length, depth, keys). Works on strings which are utf8mb4 encoded. Performance should be suitable for read-intensive applications. Non-requirements: May produce surprising results on strings which are not utf8mb4 encoded. There is limited support for decimal values nested inside JSON documents. Performance may not be suitable for write-intensive applications.

The list of functions provided is specified as.

> JSON_APPEND()      JSON_ARRAY_INSERT()    JSON_UNQUOTE()     JSON_ARRAY()  
> JSON_REPLACE()    JSON_CONTAINS()            JSON_DEPTH()           JSON_EXTRACT() 
> JSON_INSERT()    JSON_KEYS()     JSON_LENGTH()        JSON_VALID()
> JSON_MERGE()     JSON_OBJECT()  JSON_QUOTE()          JSON_REMOVE() 
> JSON_CONTAINS_PATH()    JSON_SEARCH()    JSON_SET()    JSON_TYPE()

Most of the rules for calling the above functions are shaped as follows.

JSON_APPEND(json_doc, path, val[, path, val] ...) 
  • the first parameter, json_doc, is a JSON document, or a column inside a table, or it can be a nested sub-document variable inside a JSON document.
  • The second parameter, path, is a path expression used to locate the key to be accessed; path (i.e., the path expression) is described immediately below.
  • The third parameter, val, may not be present in some functions, and if it is, it indicates the value of the operation corresponding to the key.
  • JSON path expressions

To make it easier and faster to access JSON keys, MySQL 5.7.7+ provides new path expression syntax support. aforementioned$.a[1][0] is a concrete example of a path expression. The complete syntax of the path expression is.

pathExpression> ::= scope  [ ( pathLeg )* ]
scope ::= [ columnReference ] dollarSign
columnReference ::= [ [ databaseIdentifier period  ] tableIdentifier period ] columnIdentifie
databaseIdentifier ::= sqlIdentifie
tableIdentifier ::= sqlIdentifie
columnIdentifier ::= sqlIdentifie
pathLeg ::= member | arrayLocation | doubleAsterisk
member ::= period ( keyName | asterisk )
arrayLocation ::= leftBracket ( non-negative-integer | asterisk ) rightBracket
keyName ::= ECMAScript-identifier | double-quoted-string-literal
doubleAsterisk ::= **

still with

{ "a": [ [ 3, 2 ], [ { "c" : "d" }, 1 ] ], "b": { "c" : 6 }, "one potato": 7, "b.c" : 8 }

As an example, a few more examples are given.

$.a[1] The value obtained is [ { "c" : "d" }, 1 ] $.b.c The value obtained is 6 $."b.c" The value obtained is8

Comparing the last two examples above, you can see that an expression surrounded by quotation marks is treated as a string key value.

About Wildcards* harmony** to perform fuzzy matching needs further explanation.

Two asterisks in a row.** Cannot be the end of an expression with three consecutive asterisks*** single asterisk* means match all members of a JSON object [*] means match all elements in a JSON array prefix**suffix Indicates all paths that start with prefix and end with suffix

As a specific example, directly inside the MySQL command line, type.

select json_extract('{ "a": [ [ 3, 2 ], [ { "c" : "d" }, 1 ] ], "b": { "c" : 6 }, "one potato": 7, "b.c" : 8 }','$**.c');  

Get the results displayed.["d", 6]

JSON storage structure and concrete implementation

When dealing with JSON, MySQL uses the utf8mb4 character set. utf8mb4 is a superset of utf8 and ascii. For historical reasons, utf8 here is not the UTF-8 Unicode variant encoding scheme we often refer to, but the utf8 encoding scheme defined by MySQL itself, up to three bytes. The specific difference is not the focus of this article, so please Google it yourself to learn more.

MySQL represents JSON documents in memory as a DOM, and when MySQL parses a specific path expression, it only needs to deserialize and parse the objects on the path, and it's extremely fast. To figure out how MySQL does all this, we need to understand the structure of how JSON is stored on the hard drive. One interesting point is that JSON objects are subclasses of BLOBs, specializing on top of them.

According to the official MySQL documentation.

On a high level, we will store the contents of the JSON document in three sections: * A table of pointers to all the keys and values, in the order in which the keys and values are stored. Each pointer contains information about where the data associated with the key or the value is located, as well as type information about the key or value pointed to. *All the keys. The keys are sorted, so that lookup can use binary search to locate the key quickly. * All the values, in the same order as their corresponding keys. If the document is an array, it has two sections only: the dictionary and the values. If the document is a scalar, it has a single section which contains the scalar value

Let's use a schematic diagram to show its structure more clearly.

MySQL JSON Storage

JSON documents are inherently hierarchical in structure, and thus MySQL is hierarchical in its storage of JSON as well. For each level of objects, the top of the storage is the number of elements that hold the current object, and the size that the whole occupies. Points to note are.

  • JSON object Key index (orange part of the figure) are sorted, first by length, the same length according to code point sorting; Value index (yellow part of the figure) according to the position of the corresponding Key in order, the last real data storage (white part of the figure) is the same
  • The index pair of Key and Value stores the offset and size within the object, and a single index has a fixed size that can be jumped to at a distance of N by simple arithmetic
  • As you can see through the MySQL 5.7.16 source code, when serializing JSON documents, MySQL dynamically detects the size of a single object and uses a two-byte offset if it is less than 64KB, otherwise it uses a four-byte offset to save space. Also, **dynamically checking whether a single object is a large object causes the large object to be parsed twice, which is also noted in the source code as a point for future optimization**
  • Now limited by the four-byte size of the offset and storage size in the index, the size of a single JSON document cannot exceed 4G; the size of a single KEY cannot exceed two bytes, i.e. 64K
  • The offsets within the indexed storage object are for ease of movement, and if a key value is changed, only the offset of the affected object as a whole is modified
  • The size of the index is now redundant information, because the storage size can be simply obtained by the adjacent offset, mainly to cope with variable length JSON object value updates, if the length becomes smaller, the JSON document as a whole does not have to move, only the current object needs to modify the size
  • **Now MySQL has no extra space reserved for variable size values, meaning that if the value becomes longer, all subsequent storage is affected **
  • Combined with JSON's path expressions, you can know that JSON's search operation uses only the elements involved in the deserialization path, which is very fast and achieves high performance for read operations
  • **However, MySQL can be slow for update operations on large documents with variable-length keys, and may not be suitable for write-intensive needs**

Indexing of JSON

Indexing JSON columns is not supported by MySQL now, as the official website documentation explains.

JSON columns cannot be indexed. You can work around this restriction by creating an index on a generated column that extracts a scalar value from the JSON column.

Although indexing directly on JSON columns is not supported, MySQL provides that you can first create virtual columns on scalar values in JSON documents using path expressions, and then create indexes on the virtual columns. This allows users to use expressions to build indexes on the keys they are interested in. To illustrate with a specific example.

CREATE TABLE features (

 id INT NOT NULL AUTO\_INCREMENT,

 feature JSON NOT NULL,

 PRIMARY KEY (id)

);

The format of the JSON data inserted into it is.

{
   "type":"Feature",
   "properties":{
      "TO_ST":"0",
      "BLKLOT":"0001001",
      "STREET":"UNKNOWN",
      "FROM_ST":"0",
      "LOT_NUM":"001",
      "ST_TYPE":null,
      "ODD_EVEN":"E",
      "BLOCK_NUM":"0001",
      "MAPBLKLOT":"0001001"
   }
}

Use.

ALTER TABLE features ADD feature_street VARCHAR(30) AS (JSON_UNQUOTE(feature->"$.properties.STREET"));

ALTER TABLE features ADD INDEX (feature_street);

Two steps that can be performed on the STREET key under the properties key value in the FEATURES column (feature->"$.properties.STREET" ) to create the index.

Where the feature_street column is the newly added virtual column. The reason for the virtual column name is that there is a stored column corresponding to it. The biggest difference between them is that virtual columns only modify the metadata of the database and do not store real data on the hard disk, and the reading process is calculated in real time; whereas stored columns store the columns of the expression on the hard disk. The two are used in different scenarios, and by default the columns generated via expressions are virtual columns.

This makes adding and deleting virtual columns very fast, and building indexes on virtual columns is no different from the traditional way of building indexes, which improves the performance of virtual column reads and slows down the overall insert performance. The virtual column feature combined with JSON's path expressions can easily provide users with efficient key-value indexing capabilities.

JSON comparison and sorting

JSONThe value can be used=, <, <=, >, >=, <>, !=, <=> isoperator,BETWEEN, IN, GREATEST, LEAST etc. operators are not supported right now。JSON Two-level sorting for value use rules and regulations, The first level is based onJSON Types of, The types differ using a sort specific to each type rules and regulations。

JSON types are, in descending order of priority

BLOB
BIT
OPAQUE
DATETIME
TIME
DATE
BOOLEAN
ARRAY
OBJECT
STRING
INTEGER, DOUBLE
NULL

The type with the highest priority is large and no other comparison operations are necessary; if the types are the same, each type is sorted by its own rules. The specific rules are as follows.

  1. BLOB/BIT/OPAQUE: compare the first N bytes of two values, if the first N bytes are the same, the shorter value is smaller
  2. DATETIME/TIME/DATE: Sorted by the point in time indicated
  3. BOOLEAN: false is less than true
  4. ARRAY: two arrays are equal if their lengths and values at each position are the same, if not, take the sorted result of the first element that is not equal, with the smallest empty element
  5. OBJECT: If two objects have the same KEY and the VALUE corresponding to the KEY are both the same, they are equal. Otherwise, the two are of unequal size, but the relative size is not specified.
  6. STRING: take the shorter of the two STRING length N, compare the first N bytes of the two values utf8mb4 encoding, the shorter one is smaller, the null value is the smallest
  7. INTEGER/DOUBLE: Includes comparison of exact and approximate values, slightly more complex and potentially counter-intuitive results, see Official Documents Related notes.

Any JSON value compared to SQL's NULL constant gives a result of UNKNOWN. For the comparison of JSON and non-JSON values, following a certain rules and regulations Convert non-JSON values to JSON values and then compare them according to the rules above.

summary

This article focuses on theMySQL (located) at5.7.7 The native introduced afterwardsJSON Supported features, Illustrates the introduction ofJSON Benefits of the type, and with specific examples ofMySQL (located) atJSON External interfaces on types and the new syntax introduced rules and regulations。 moreover, Also highlighted areJSON Storage structure on the hard disk, A brief analysis of the advantages of this storage architecture harmony deficiencies。 And finally, a presentation on Indexing of JSON principle, as well as comparing harmony sorted rules and regulations。 I believe that after understanding the content presented in this article, with respect toJSON The parts not mentioned in the text are also easier to understand。


Recommended>>
1、This early gastric cancer identification method has an accuracy rate of 934 Let medical big data give you the answers
2、AI bigwigs test the waters of mobility as Driveai where Enda Wu works is about to launch a selfdriving car service
3、Guangzhou Yami Technologys 7th anniversary and 2018 annual meeting was successfully held
4、Blockchain regulatory legislation takes first step Blockchain information service management regulations Highlights and Highlights Explained
5、Distributed Storage Network Design Difficulties How to Ensure DataLevel Isolation and Business QOS

    已推荐到看一看 和朋友分享想法
    最多200字,当前共 发送

    已发送

    朋友将在看一看看到

    确定
    分享你的想法...
    取消

    分享想法到看一看

    确定
    最多200字,当前共

    发送中

    网络异常,请稍后重试

    微信扫一扫
    关注该公众号