Relational algebra in databases: operations, examples

Anonim

Generally, database systems are equipped with a query language that helps users query database instances. There are two such types: relational algebra and relational calculus. The first is a procedural query language that takes relation instances as input and produces relation instances as output. It uses unary or binary operators for this. Relational algebra expressions are evaluated recursively, and intermediate results are also treated as relations.

Relational algebra

Cartesian product (×)

Combines information from two different relations into one.

Notation – r × s, where r and s are relations and their output is defined as

r × s = {q t | q ∈ r and t ∈ s}.

Result. Combined with a selection on the author, the product of a books relation and an articles relation yields a relation that shows all books and articles written by tutorialspoint.
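The definition of r × s can be sketched in a few lines of Python. This is only an illustration under stated assumptions: relations are modeled as sets of tuples, and the relations `books` and `articles`, their contents, and the helper `cartesian_product` are hypothetical names that do not come from the article.

```python
from itertools import product

# Hypothetical relations: each relation is a set of tuples.
books = {("DBMS Basics", "tutorialspoint"), ("SQL Guide", "smith")}
articles = {("Indexing", "tutorialspoint")}


def cartesian_product(r, s):
    """Return r × s: every tuple q of r concatenated with every tuple t of s."""
    return {q + t for q, t in product(r, s)}


pairs = cartesian_product(books, articles)
# The result has |r| * |s| tuples, each carrying the attributes of both relations.
print(len(pairs))  # 2
```

The product alone does not filter anything; a selection is normally applied afterwards to keep only the meaningful combinations.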

Rename operation (ρ)

The results of relational algebra are also relations, but without any name. The rename operation allows you to name the output relation. It is denoted by the small Greek letter ρ (rho).

Notation – ρ x (E), where the result of the expression E is saved with the name x.

Additional operations:

  • set intersection;
  • assignment;
  • natural join.
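The three additional operations can be sketched roughly as follows. The sketch assumes relations are lists of rows (Python dicts mapping attribute names to values); the relations `authors` and `books` and the helper names are hypothetical and chosen only for illustration.

```python
def natural_join(r, s):
    """Join rows of r and s that agree on every attribute name they share."""
    result = []
    for q in r:
        for t in s:
            common = q.keys() & t.keys()
            if all(q[a] == t[a] for a in common):
                result.append({**q, **t})
    return result


def intersection(r, s):
    """Return the rows that appear in both r and s (same schema assumed)."""
    return [q for q in r if q in s]


# Hypothetical relations, each a list of rows.
authors = [{"author": "ann", "city": "Oslo"}, {"author": "bob", "city": "Rome"}]
books = [{"title": "DBMS", "author": "ann"}, {"title": "SQL", "author": "cy"}]

# Assignment: binding an intermediate result to a name for later reuse.
temp = natural_join(books, authors)
print(temp)  # [{'title': 'DBMS', 'author': 'ann', 'city': 'Oslo'}]

both = intersection(authors, [{"author": "bob", "city": "Rome"}])
print(both)  # [{'author': 'bob', 'city': 'Rome'}]
```

When the two relations share no attribute names, this `natural_join` degenerates to the Cartesian product, which matches the usual definition of the natural join.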

Relational calculus

This is a non-procedural query language, meaning it states what to retrieve but does not explain how to retrieve it. The relational calculus exists in two forms:

  • tuple relational calculus (TRC), where the filtering variable ranges over tuples;
  • domain relational calculus (DRC), where the filtering variable ranges over attribute domains.

Notation – {T | Condition}: returns all tuples T that satisfy the condition. For example, a query can return only the name attribute of the qualifying tuples. TRC can be quantified: you can use existential (∃) and universal (∀) quantifiers, and a quantified query can give the same result as an unquantified one.

Domain relational calculus (DRC)

In DRC, the filtering variable ranges over attribute domains instead of entire tuple values (as in the TRC mentioned above).

Notation – {a1, a2, a3, …, an | P(a1, a2, a3, …, an)}, where a1, a2, …, an are attributes and P denotes formulas built from inner attributes.

Result. Yields the article, page, and subject from the relation TutorialsPoint, where the subject is database.

Like TRC, DRC can also be written using existential and universal quantifiers. DRC also involves relational operators. The expressive power of tuple relational calculus and domain relational calculus is equivalent to that of relational algebra.
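Python comprehensions are declarative in much the same spirit, so they can serve as a rough analogy for both calculi. The sketch below is an assumption-laden illustration, not the article's own query: the rows of `tutorials_point` are invented, and only the attribute names article, page and subject come from the text.

```python
from collections import namedtuple

# Hypothetical relation; only the attribute names come from the text.
Row = namedtuple("Row", ["article", "page", "subject"])
tutorials_point = {
    Row("Joins", 12, "database"),
    Row("Sorting", 40, "algorithms"),
    Row("Indexes", 7, "database"),
}

# TRC style: the variable t ranges over whole tuples of the relation.
trc = {
    (t.article, t.page, t.subject)
    for t in tutorials_point
    if t.subject == "database"
}

# DRC style: the variables a, p, s range over attribute values (domains).
drc = {(a, p, s) for (a, p, s) in tutorials_point if s == "database"}

# The quantifiers ∃ and ∀ correspond to any() and all().
exists_db = any(t.subject == "database" for t in tutorials_point)
all_db = all(t.subject == "database" for t in tutorials_point)

print(trc == drc, exists_db, all_db)  # True True False
```

Both comprehensions describe which rows are wanted without prescribing an evaluation order, and they return the same set, in line with the equivalence of expressive power noted above.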


Variations and schemes of relational calculus and algebra

The ER model, when conceptualized in diagrams, gives a good overview of entity relationships that is easy to understand. ER diagrams can be mapped to a relational schema, that is, a relational schema can be created from an ER diagram. It is not possible to import all ER constraints into the relational model, but an approximate schema can be generated. Several processes and algorithms are available for converting ER diagrams to a relational schema. Some of them are automated, while others are manual. ER diagrams mainly consist of the following elements:

  • entity and its attributes;
  • relationship, which is an association between entities.

Entities and relationships are mapped in different ways. An entity is a real-world object with some attributes. The mapping process is as follows:

  • create a table for each entity;
  • attributes should become table fields with corresponding data types;
  • declare a primary key.

A relationship is an association between entities. The mapping process is as follows:

  • create a table for the relationship;
  • add the primary keys of all participating entities as table fields with appropriate data types;
  • if the relationship has any attributes, add each attribute as a table field;
  • declare a primary key composed of the primary keys of all participating entities;
  • declare all foreign key constraints.
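The two mapping procedures above can be sketched with Python's built-in `sqlite3` module. The entities `student` and `course`, the relationship `enrolled` and all column names are hypothetical; the sketch only shows one possible way the steps translate into table definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
-- One table per entity: attributes become typed columns, with a primary key.
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- One table for the relationship: the primary keys of the participating
-- entities, the relationship's own attribute (grade), a composite primary
-- key, and the foreign key constraints.
CREATE TABLE enrolled (
    student_id INTEGER NOT NULL,
    course_id  INTEGER NOT NULL,
    grade      TEXT,
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES student (student_id),
    FOREIGN KEY (course_id)  REFERENCES course (course_id)
);
""")

conn.execute("INSERT INTO student VALUES (1, 'Ann')")
conn.execute("INSERT INTO course VALUES (10, 'Databases')")
conn.execute("INSERT INTO enrolled VALUES (1, 10, 'A')")

try:
    # Student 99 does not exist, so the foreign key constraint rejects the row.
    conn.execute("INSERT INTO enrolled VALUES (99, 10, 'B')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

The composite primary key also prevents the same student from being enrolled in the same course twice.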

Weak entity sets and hierarchical entities are mapped according to their own rules. First of all, the underlying definitions must be understood. A weak entity set is one that does not have a primary key of its own. The mapping process is as follows:

  • create a table for the weak entity set;
  • add all its attributes to the table as fields;
  • add the primary key of the identifying entity set;
  • declare all foreign key constraints.

Hierarchical entities, which arise from specialization or generalization, are mapped level by level. The algorithm is as follows:

  • create tables for all higher-level and lower-level entities;
  • add the primary keys of the higher-level entities to the tables of the lower-level entities;
  • in the lower-level tables, add all other attributes of the lower-level entities;
  • declare the primary keys of the higher-level and lower-level tables;
  • declare foreign key constraints.
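A possible shape for both mappings, again using `sqlite3`, is sketched below. The tables `person`, `employee` and `dependent` and their columns are hypothetical, and the use of a partial key (`dependent_name`) to distinguish weak entities of the same owner is an assumption added for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Higher-level entity.
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Lower-level entity: carries the higher-level primary key plus its own
-- attributes; that key is both its primary key and a foreign key.
CREATE TABLE employee (
    person_id INTEGER PRIMARY KEY,
    salary    REAL,
    FOREIGN KEY (person_id) REFERENCES person (person_id)
);

-- Weak entity set: identified by the key of its identifying entity set
-- together with a partial key, since it has no primary key of its own.
CREATE TABLE dependent (
    person_id      INTEGER NOT NULL,
    dependent_name TEXT NOT NULL,
    PRIMARY KEY (person_id, dependent_name),
    FOREIGN KEY (person_id) REFERENCES employee (person_id)
);
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['dependent', 'employee', 'person']
```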

Existing options for describing, storing, changing information

SQL is a programming language for relational databases. It is designed over relational algebra and tuple relational calculus. SQL comes as a package with all major DBMS distributions. It comprises both a data definition language and a data manipulation language. Using the data definition properties of SQL, you can design and modify the database schema, while the data manipulation properties allow you to store and retrieve data in the database. SQL uses the following set of commands to define the database schema:

  • CREATE creates new databases, tables and views in a DBMS.
  • DROP drops views, tables and databases.
  • ALTER changes the database schema; for example, an ALTER TABLE command can add an attribute of string type to a relation.
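The schema-definition commands can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the table `article` and its columns are assumed for illustration, and SQLite's dialect may differ in detail from other DBMS products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: define a new table (relation).
conn.execute("CREATE TABLE article (title TEXT, page INTEGER)")

# ALTER: change the schema by adding a string-typed attribute.
conn.execute("ALTER TABLE article ADD COLUMN subject VARCHAR(50)")
columns = [row[1] for row in conn.execute("PRAGMA table_info(article)")]
print(columns)  # ['title', 'page', 'subject']

# DROP: remove the table from the schema.
conn.execute("DROP TABLE article")
remaining = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE name = 'article'").fetchone()[0]
print(remaining)  # 0
```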

SQL is equipped with a Data Manipulation Language (DML). It modifies the database instance by inserting, updating, and deleting information. DML is responsible for changing all data. SQL contains the following set of commands in the DML section:

  1. SELECT is one of the basic query commands. It is analogous to the projection operation of relational algebra. It selects attributes based on the condition described in the WHERE clause.
  2. FROM - This clause takes a relation name as an argument, from which the attributes are to be selected/projected. If more than one relation name is given, this clause corresponds to the Cartesian product.
  3. WHERE - This clause specifies the predicate or conditions that must be met for a row's attributes to be projected.

There are also commands for:

  • inserting rows (INSERT);
  • changing values (UPDATE);
  • deleting rows (DELETE).
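The following sketch runs the DML commands against an in-memory SQLite database through Python's `sqlite3` module. The tables `author` and `book` and their contents are hypothetical and serve only to show how SELECT, FROM and WHERE relate to projection, Cartesian product and selection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (name TEXT, age INTEGER);
CREATE TABLE book (title TEXT, author_name TEXT);
""")

# INSERT adds rows.
conn.executemany("INSERT INTO author VALUES (?, ?)", [("ann", 41), ("bob", 29)])
conn.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [("DBMS", "ann"), ("SQL", "bob"), ("Indexes", "ann")],
)

# Two names in FROM give the Cartesian product: 2 authors x 3 books = 6 rows.
product_size = conn.execute("SELECT COUNT(*) FROM author, book").fetchone()[0]

# SELECT lists the projected attributes; WHERE keeps only qualifying rows.
titles = [row[0] for row in conn.execute(
    "SELECT book.title FROM author, book "
    "WHERE author.name = book.author_name AND author.age > 35 "
    "ORDER BY book.title")]

# UPDATE changes values; DELETE removes rows.
conn.execute("UPDATE author SET age = 30 WHERE name = 'bob'")
conn.execute("DELETE FROM book WHERE author_name = 'ann'")
books_left = conn.execute("SELECT COUNT(*) FROM book").fetchone()[0]

print(product_size, titles, books_left)  # 6 ['DBMS', 'Indexes'] 1
```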

Creating Relational Algebra Queries

When constructing a query, the task is to find a structure of operations that leads to the correct output. The basic operations of relational algebra are simple operations with one or two relations as operands. The combined effect of the sequence of operations determines the final result. Since the operations are quite simple, many intermediate results may be produced before reaching the final output; these are in turn used as operands that produce new intermediate results.

For most operators, the order of the operations does not matter, which means that the same output can be achieved by forming and combining intermediate results in different ways. In practice, writing database queries is fairly easy, because the order of operations and intermediate results is determined by the query optimizer. When formulating a query, you first choose which relations are needed to reach the answer, and then specify the operations and intermediate results. The structure of a relational algebra query can be represented as a diagram. Query optimizers try to organize execution as efficiently as possible. In practice, this usually means that they try to reduce the size of intermediate results as early as possible. The following examples illustrate this.

Example 1.

Information need: information on 1996 model year vehicles in which faults were found during the inspection for 1999.

First, information about the cars is needed, that is, the values of all attributes of the car relation. Information about inspections is stored in the "Inspection" table, and any faults detected are recorded in the "Problem" table. Thus, these three tables are needed to get the required information.

Only 1996 cars are of interest. The model year of a vehicle is stored as an attribute value in its row of the vehicle information table. The first intermediate result consists of the tuples representing 1996 models.

Likewise, only the inspection rows that cover 1999 are needed, and a selection extracts them. Now the required cars and inspections are available. Their rows are then combined using a join operation. They must be joined on the registration number; since it is the only common column, a natural join is used.

To find out whether there were problems during the inspections, the problem rows must be associated with the inspections. After joining the inspection rows to the cars, this result can be joined to the "Problem" table. The join must be based on the common registration number and the inspection date. These are the only common columns in the tables, so a natural join is used.
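The sequence of selections and natural joins in Example 1 can be traced with a small Python sketch. Relations are modeled as lists of dicts; the sample rows, the column names `reg_no`, `model_year`, `insp_date` and `fault`, and the helper functions are all hypothetical, since the article does not give the actual table layouts.

```python
def select(relation, predicate):
    """Selection: keep the rows for which the predicate holds."""
    return [row for row in relation if predicate(row)]


def natural_join(r, s):
    """Natural join: combine rows that agree on all shared attribute names."""
    return [{**q, **t} for q in r for t in s
            if all(q[a] == t[a] for a in q.keys() & t.keys())]


# Hypothetical sample data; table and column names are illustrative only.
car = [{"reg_no": "ABC-1", "model_year": 1996},
       {"reg_no": "XYZ-9", "model_year": 1998}]
inspection = [{"reg_no": "ABC-1", "insp_date": "1999-05-02"},
              {"reg_no": "ABC-1", "insp_date": "1998-04-11"},
              {"reg_no": "XYZ-9", "insp_date": "1999-06-20"}]
problem = [{"reg_no": "ABC-1", "insp_date": "1999-05-02", "fault": "brakes"}]

# Intermediate results: 1996 cars and 1999 inspections.
cars_1996 = select(car, lambda c: c["model_year"] == 1996)
insp_1999 = select(inspection, lambda i: i["insp_date"].startswith("1999"))

# Joined on reg_no, the only shared column.
inspected = natural_join(cars_1996, insp_1999)
# Joined on reg_no and insp_date, the only shared columns.
answer = natural_join(inspected, problem)
print(answer)
```

Applying the selections before the joins keeps the intermediate results small, which is the kind of ordering a query optimizer aims for.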


Calculation options without intermediate results

Example 2.

Information need: the driver's name for vehicles of model year 1995 or older that have not been inspected for 2000. The name is in the "Driver" table. Inspections are described in the "Inspection" table and cars in the car table. Thus, these three tables are needed. First, you need to find the cars that were not inspected for the year 2000. This cannot be solved using only the inspections listed in the table, because it contains data about the inspections that were made, not about those that were not. The problem is solved through the complement: find the cars that were inspected for 2000 and remove them from the set of candidate cars. In fact, only their registration numbers are needed.
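The complement idea in Example 2 amounts to a set difference on registration numbers. The sketch below assumes hypothetical sample rows and column names (`reg_no`, `model_year`, `driver_id`, `year`, `name`); in particular, the link from a car to its driver through `driver_id` is an assumption, as the article does not describe how the tables are connected.

```python
# Hypothetical sample data; table and column names are illustrative only.
car = [{"reg_no": "ABC-1", "model_year": 1994, "driver_id": 1},
       {"reg_no": "DEF-2", "model_year": 1995, "driver_id": 2},
       {"reg_no": "XYZ-9", "model_year": 1998, "driver_id": 3}]
inspection = [{"reg_no": "ABC-1", "year": 2000},
              {"reg_no": "DEF-2", "year": 1999}]
driver = [{"driver_id": 1, "name": "Ann"},
          {"driver_id": 2, "name": "Bob"},
          {"driver_id": 3, "name": "Cy"}]

# Selection (model year <= 1995) followed by projection onto reg_no.
old_cars = {c["reg_no"] for c in car if c["model_year"] <= 1995}
# Registration numbers of cars that were inspected for 2000.
inspected_2000 = {i["reg_no"] for i in inspection if i["year"] == 2000}
# Set difference: old cars that were not inspected for 2000.
not_inspected = old_cars - inspected_2000

# Join back to car and driver, then project the driver's name.
owner_ids = {c["driver_id"] for c in car if c["reg_no"] in not_inspected}
names = sorted(d["name"] for d in driver if d["driver_id"] in owner_ids)
print(names)  # ['Bob']
```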

There are other examples besides the ones above that show how to change or find information. Query variants can be optimized using special operations. In fact, to make searching and finding data as easy and simple as possible, there is a relational calculus model.

Where information is secured and protected

Databases are stored in file formats that contain records. At the physical level, the actual data is stored in electromagnetic format on some device. These storage devices can be divided into three categories:

  1. Primary. This category includes memory that is directly accessible to the CPU. Registers, fast memory (cache) and main memory (RAM) are directly accessible to the CPU, since they are all located on the motherboard or the CPU chipset. This storage is typically very small, ultra-fast, and volatile. A constant power supply is required to maintain its state. If the power fails, all of its data is lost.
  2. Secondary. Used to store information for future use or as a backup. Includes memory devices that are not part of the processor chipset or motherboard, such as magnetic disks, optical disks (DVDs, CDs, etc.), hard drives, flash drives, and magnetic tapes.
  3. Tertiary. Used to store huge amounts of data. Since such storage devices are external to the computer system, they are the slowest in terms of speed. These storage devices are mainly used to back up the entire system. Optical discs and magnetic tapes are widely used as tertiary storage.

Special relational algebra operations are important for query efficiency.

Storage structure

The computer system has a well-defined memory hierarchy. The CPU has direct access to main memory as well as to its built-in registers. Access to main memory is obviously slower than the processor speed. To minimize this mismatch, a cache is introduced. The cache provides the fastest access times and contains the data that is most frequently accessed by the CPU.

The memory with the fastest access is the most expensive. Larger storage devices are slower and cheaper, but they can store huge amounts of data compared to a processor register or cache.

Magnetic hard drives are the most common secondary storage devices in today's computer systems. They are called magnetic because they consist of metal disks coated with magnetizable material. These disks are placed vertically on a spindle. A read/write head moves between them and is used to magnetize or demagnetize the spot underneath it. A magnetized spot can be recognized as 0 (zero) or 1 (one).

Hard drives are formatted in a well-defined order for efficient data storage. A disk surface has many concentric circles called tracks. Each track is further divided into sectors, each typically storing 512 bytes of data.
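As a rough worked example, the capacity of such a disk follows from multiplying out its geometry. The numbers of surfaces, tracks and sectors below are invented purely to make the arithmetic concrete, and the sketch assumes every track holds the same number of sectors, which is a simplification.

```python
SECTOR_BYTES = 512  # typical sector size mentioned above

# Hypothetical geometry, chosen only to make the arithmetic concrete.
surfaces = 4
tracks_per_surface = 1000
sectors_per_track = 63

capacity = surfaces * tracks_per_surface * sectors_per_track * SECTOR_BYTES
print(capacity)  # 129024000
```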


File operations

Operations on database files can be broadly classified into two categories:

  • update operations;
  • retrieval operations.

The first category changes data values by inserting, deleting or updating. Retrieval operations, on the other hand, do not alter the data but retrieve it after optional conditional filtering. In both types of operations, selection plays a significant role. In addition to creating and deleting a file, several other operations can be performed on files:

  1. Open - a file can be opened in one of two modes, read or write. In read mode, the operating system does not allow anyone to change the data. In other words, the data is read-only. Files opened in read mode can be shared between multiple entities. Write mode allows the data to be changed. Files opened in write mode can be read but cannot be shared.
  2. Close is the most important operation from the operating system's point of view, as it removes all locks (if in shared mode), saves the data (if modified) to secondary media, and releases all buffers and handlers associated with the file.
  3. Indexing is a data structure technique for efficiently retrieving records from database files based on the attributes on which the index has been built.

Indexing can be of the following types:

  1. A primary index is defined on an ordered data file. The data file is ordered on a key field.
  2. A secondary index may be generated from a field that is a candidate key and has a unique value in every record, or from a non-key field with duplicate values.
  3. A clustering index is defined on an ordered data file. The data file is ordered on a non-key field.
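The idea behind an index can be sketched with an in-memory mapping from attribute values to record positions. This is a simplified illustration, not how a DBMS stores indexes on disk: the records, the field names and the helper `build_index` are hypothetical, and list positions stand in for record addresses.

```python
from collections import defaultdict

# Hypothetical data file: a list of records; list positions stand in for
# record addresses on disk.
records = [
    {"id": 3, "name": "Ann", "dept": "sales"},
    {"id": 1, "name": "Bob", "dept": "it"},
    {"id": 2, "name": "Cy", "dept": "sales"},
]


def build_index(rows, attribute):
    """Map each attribute value to the positions of the records holding it."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[attribute]].append(position)
    return index


# Secondary index on a non-key field with duplicate values.
dept_index = build_index(records, "dept")
# Look up matching records without scanning the whole file.
sales = [records[pos]["name"] for pos in dept_index["sales"]]
print(sales)  # ['Ann', 'Cy']
```

An index on a unique field such as `id` would be built the same way, with each value mapping to exactly one position.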

A database management system or DBMS refers to a technology for storing and retrieving user information with maximum efficiency along with appropriate security measures. A closer examination of this issue leads to the conclusion that relational algebra is a language of operators that take relations as arguments and return them as a result.