In this article, we look at the concept of data stream management: how it works and what the main features of such a system are. We also cover the data stream model and continuous query processing.
Data stream management issues
Today, the number of network applications and the volume and speed of network traffic are all growing very quickly. As a consequence, computer networks face a range of problems in optimizing network traffic, infrastructure, and performance. A fairly common example is a system whose incoming data arrives faster than it can be processed.
If nothing is done to limit the incoming data when this happens, the queues on the network links grow until they exceed the buffer capacity of the communication equipment involved. From that point on, units of data arriving at nodes that have no free buffer space are discarded and must be retransmitted later. The result is that, as the incoming load increases, the actual throughput decreases and transmission delays become extremely high.
The Data Stream Management System is designed precisely to prevent such failures and to facilitate uninterrupted streaming data exchange.
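The overflow behavior described above can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: a single queue, fixed arrival and service rates per tick, and no retransmission of dropped items. The function name `simulate` and its parameters are hypothetical and chosen for illustration.

```python
from collections import deque


def simulate(arrivals_per_tick, service_per_tick, buffer_size, ticks):
    """Toy single-queue model. Returns (processed, dropped) item counts."""
    queue = deque()
    processed = 0
    dropped = 0
    for _tick in range(ticks):
        # Arrivals: an item that finds the buffer full is discarded.
        for _ in range(arrivals_per_tick):
            if len(queue) < buffer_size:
                queue.append(1)
            else:
                dropped += 1
        # Service: the node handles at most service_per_tick items per tick.
        for _ in range(min(service_per_tick, len(queue))):
            queue.popleft()
            processed += 1
    return processed, dropped


# Load below capacity: nothing is lost.
print(simulate(arrivals_per_tick=3, service_per_tick=5, buffer_size=10, ticks=100))
# -> (300, 0)

# Load above capacity: the buffer fills and the excess is discarded.
print(simulate(arrivals_per_tick=8, service_per_tick=5, buffer_size=10, ticks=100))
# -> (500, 295)
```

In the second run the node still processes at its full rate, but 295 items are lost. In a real network those items would be retransmitted and add to the incoming load, which is what drives useful throughput down; that feedback is deliberately left out of this sketch.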
Features of data stream models
The concept of “data streams” is relatively recent, and the field is developing very rapidly.
A data stream is a sequence of ordered items (timestamped or not) that arrive continuously in real time. As a rule, streams arrive at a very high rate, their arrival is hard to regulate, and memory limitations make it impractical to store a stream in its entirety.
The data stream model differs from the conventional relational data model in several ways:
- Data items in a stream arrive in real time
- The system has no control over the ordering of data stream elements
- The streams are not limited in size
- Once a data stream element has been processed, it is discarded or archived; it cannot be retrieved later unless it is explicitly kept in memory, which is usually small relative to the size of the stream
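The properties listed above imply that a stream has to be processed in a single pass with bounded memory. The following is a minimal sketch of that idea, assuming a numeric stream and a fixed-size sliding window; the function name `sliding_average` is hypothetical.

```python
from collections import deque
from itertools import count, islice
from typing import Iterable, Iterator


def sliding_average(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the average of the last `window` items after each arrival."""
    recent = deque(maxlen=window)  # only the newest `window` items are kept
    total = 0.0
    for value in stream:
        if len(recent) == window:
            total -= recent[0]  # this item is about to be evicted
        recent.append(value)
        total += value
        yield total / len(recent)


# count(1) is an unbounded stream 1, 2, 3, ...; only five results are taken.
print(list(islice(sliding_average(count(1), window=3), 5)))
# -> [1.0, 1.5, 2.0, 3.0, 4.0]
```

Each element is seen once, in arrival order, and is forgotten as soon as it leaves the window, so memory use stays constant no matter how long the stream runs.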
Principle of the data stream management system
A data stream management system (DSMS) differs from a traditional database management system in that it runs continuous queries over continuous data streams that enter and leave the system in real time. Data is held in RAM only for the duration of processing, and the input can be of many kinds, such as stock exchange data or network traffic.
The general principle of a DSMS can be described as follows:
- The input monitor regulates the input rate; if the system cannot keep up with large incoming streams, some data is dropped.
- Data is typically stored in three memory sections: temporary working memory, result memory, and static memory, where metadata is stored.
- Continuous queries are registered in the query repository; alongside them, one-time queries over the past state of the stream can also be executed.
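The three components above can be sketched together in a few lines. This is a hypothetical toy model, not the design of any real data stream management system: the class `MiniStreamManager`, its method names, and the choice of a fixed-size window as working memory are all assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Query = Callable[[Deque[float]], float]


class MiniStreamManager:
    """Toy model of an input monitor, three memory areas and a query repository."""

    def __init__(self, max_pending: int, window: int) -> None:
        self.max_pending = max_pending
        self.pending: Deque[float] = deque()               # input buffer
        self.working: Deque[float] = deque(maxlen=window)  # temporary working memory
        self.results: Dict[str, List[float]] = {}          # result memory
        self.static = {"window": window}                   # static memory (metadata)
        self.queries: Dict[str, Query] = {}                # query repository
        self.shed = 0                                      # items dropped so far

    def register(self, name: str, query: Query) -> None:
        """Register a continuous query; it is re-run on every processed item."""
        self.queries[name] = query
        self.results[name] = []

    def offer(self, item: float) -> bool:
        """Input monitor: accept the item, or drop it if the buffer is full."""
        if len(self.pending) >= self.max_pending:
            self.shed += 1
            return False
        self.pending.append(item)
        return True

    def step(self) -> None:
        """Process one pending item and update every continuous query."""
        if not self.pending:
            return
        self.working.append(self.pending.popleft())
        for name, query in self.queries.items():
            self.results[name].append(query(self.working))

    def ask_once(self, query: Query) -> float:
        """One-time query over the state currently retained in working memory."""
        return query(self.working)


manager = MiniStreamManager(max_pending=2, window=3)
manager.register("max", max)
for item in [5, 1, 9]:
    manager.offer(item)       # the third item is dropped: the buffer holds two
manager.step()
manager.step()
print(manager.results["max"], manager.shed, manager.ask_once(sum))
# -> [5, 5] 1 6
```

Dropping the newest item when the buffer is full is only one possible policy; a real system might sample, aggregate, or prioritize instead.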