Would you believe that most modern computers still use an architecture first described in 1945? In this article, we discuss the von Neumann bottleneck and how Oracle Exadata is designed to overcome it, which makes Oracle Exadata a truly unique, disruptive technology in the market. Come, let’s start the journey!

Introduction

In the modern digital era, data volumes are enormous, and storing, securing and accessing that data is tedious. We build supercomputers and scale them vertically with more powerful CPUs, more memory, SSDs, in-memory features and so on. Still, the von Neumann bottleneck can cut effective processing power by as much as 50%. One way to overcome this performance bottleneck is Remote Direct Memory Access (RDMA). Technologies such as InfiniBand (IB), iWARP and RoCE support RDMA, but every machine in the stack must be designed to communicate using RDMA, and that’s exactly how Oracle Exadata is built.

The Von Neumann Bottleneck

John von Neumann was a Hungarian-American mathematician, physicist, computer scientist, engineer and polymath who worked on the Manhattan Project during World War II. The von Neumann architecture is a computer architecture based on a 1945 description by John von Neumann and others in the “First Draft of a Report on the EDVAC”, which became the basis for electronic digital computers.

The von Neumann bottleneck is the idea that computer system throughput is limited because the processor can work through data faster than data can be transferred to and from memory. In this architecture, the processor sits idle for a certain amount of time while memory is accessed. Per Moore’s Law, the number of transistors on a chip doubles roughly every two years. Similarly, storage and memory technology has evolved from magnetic tape to flash memory. But CPUs and memory have not evolved at the same pace. So the bottleneck still occurs when data moves between memory and the CPU, limiting computation performance.
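To make the idle-time argument concrete, here is a minimal sketch of a toy model in Python. The function name and the nanosecond figures are hypothetical, chosen only for illustration, and are not measurements of any real system.

```python
def cpu_utilization(compute_ns: float, memory_wait_ns: float) -> float:
    """Fraction of wall-clock time the processor spends computing."""
    total = compute_ns + memory_wait_ns
    if total <= 0:
        raise ValueError("total time must be positive")
    return compute_ns / total


# Hypothetical workload: equal time spent computing and waiting on memory.
balanced = cpu_utilization(compute_ns=100.0, memory_wait_ns=100.0)

# A processor twice as fast halves the compute time,
# but the memory wait stays the same.
faster_cpu = cpu_utilization(compute_ns=50.0, memory_wait_ns=100.0)

# Overall speedup from doubling processor speed in this toy model.
speedup = (100.0 + 100.0) / (50.0 + 100.0)

print(f"utilization before: {balanced:.0%}, after: {faster_cpu:.0%}")
print(f"overall speedup: {speedup:.2f}x")
```

In this model, doubling the processor speed yields well under a 2x overall gain, because the memory wait does not shrink with it.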

The Bottleneck for Database Performance

The database (DB) runs on top of the operating system (OS) layer, and its performance depends on the underlying OS and hardware. The application submits a query to the DB; the DB parses the query and requests data blocks from the OS and the storage. The OS and the storage layer pull the entire table data, and the processor transfers that data into DB memory, where it is filtered and formatted into the output. While Oracle was tuning database performance through various new features, it was also working in parallel to optimize the underlying OS and hardware for the Oracle Database. The result is Oracle Exadata.

Oracle Exadata

Oracle Exadata is a computing platform optimized for Oracle Databases. Exadata V1 was launched in 2008, and its latest release, X8, was launched in April 2019 (rectified thanks to Gavin’s comment). It can be provisioned with up to 912 CPU cores and 28.5 TB of memory per rack, along with 27 TB of persistent memory and 3.0 PB of disk capacity. But Exadata is not just a piece of massive hardware. Instead, it is an engineering masterpiece in which all layers (storage, compute, network, Clusterware and DB) work together to achieve better performance, cost-effectiveness and availability for Oracle databases. Oracle Exadata overcomes the von Neumann bottleneck through two approaches:

  1. Storage Server Enhancements
  2. RDMA Enhancements

Storage Server Enhancements

In traditional DB systems, database performance depends on the OS and hardware. Because the DB server CPU performs both data processing and data transfer, performance bottlenecks affect the database directly. The secret sauce of Oracle Exadata, by contrast, lies in its high-performance storage server. Many of Oracle Exadata’s features, such as cell offloading, Smart Scan, storage indexes, Hybrid Columnar Compression (HCC) and Exadata flash cache, work at the storage server layer. All these features significantly reduce the workload on the compute server (DB server). Each deserves a detailed discussion of its own; for now, here is a brief look at them:

  • Cell Offload – Offloads resource-intensive workloads to the storage servers. CPU usage of the compute server gets reduced.
  • Smart Scan – Offloads SQL processing to storage servers. The storage server will collect the data from underlying data blocks, filter only the requested data, and send it to the DB instance. The DB instance will then deliver the result to the end-user without much usage of the DB server CPU.
  • Storage Indexes – These avoid unnecessary I/O operations. The storage index, maintained in memory at the storage server, tracks summary information for table columns in each storage region. Regions that cannot contain the requested column values are skipped, so only the rows with the specified column values are returned to the DB instance.
  • HCC – This capability reduces storage for large databases dramatically, by up to 10x. HCC provides substantial cost savings and performance improvements due to reduced I/O, especially for analytic workloads.
  • Flash Cache – Caches database block writes using Exadata Write Back. Write caching eliminates disk bottlenecks in large scale OLTP and batch workloads and helps achieve 6.5 Million 8K flash write I/O operations per second (IOPS).
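The idea behind Smart Scan and storage indexes can be sketched in a few lines of Python. This is a simplified illustration, not Exadata's actual implementation: the `Region` class, the function names and the sample data are all hypothetical, and the per-region minimum and maximum stand in for the "summary information" a storage index tracks.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Region:
    """Hypothetical storage region holding the values of one column."""
    values: List[int]
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self) -> None:
        # Summary kept in memory, standing in for a storage index entry.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def traditional_scan(regions: List[Region], wanted: int) -> Tuple[List[int], int]:
    """Ship every row to the DB server, which then filters."""
    shipped = [v for r in regions for v in r.values]
    result = [v for v in shipped if v == wanted]
    return result, len(shipped)


def offloaded_scan(regions: List[Region], wanted: int) -> Tuple[List[int], int, int]:
    """Filter at the storage layer and skip regions the summary rules out."""
    shipped: List[int] = []
    regions_read = 0
    for r in regions:
        if wanted < r.min_value or wanted > r.max_value:
            continue  # the summary proves there is no match, so no I/O
        regions_read += 1
        shipped.extend(v for v in r.values if v == wanted)
    return shipped, len(shipped), regions_read


regions = [
    Region([1, 2, 3, 4]),
    Region([10, 12, 12, 15]),
    Region([20, 25, 30, 35]),
]
rows_a, shipped_a = traditional_scan(regions, 12)
rows_b, shipped_b, regions_read = offloaded_scan(regions, 12)
print(f"traditional: shipped {shipped_a} rows, kept {len(rows_a)}")
print(f"offloaded: read {regions_read} region(s), shipped {shipped_b} rows")
```

Both scans return the same rows, but the offloaded version sends far less data to the DB server and touches only one of the three regions.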

RDMA Enhancements

The storage server enhancements discussed so far significantly reduce the CPU overhead on the compute server. But a performance bottleneck can still occur during data transfer between storage servers and compute servers. Remote Direct Memory Access (RDMA) resolves this issue by bypassing the network and I/O stack, eliminating expensive CPU interrupts and context switches, and reducing latency by about 10x, from 200µs to less than 19µs.

RDMA over InfiniBand – Exadata versions up to and including X8 used InfiniBand for RDMA between hosts. RDMA is an integral part of the Exadata high-performance architecture. It has been tuned and enhanced over the past decade, underpinning several Exadata-only technologies such as Exafusion Direct-to-Wire Protocol and Smart Fusion Block Transfer.

The various RDMA features are briefly described below, as presented in the Oracle Exadata Product Development Blog post “Introducing Exadata X8M”.

Exafusion Direct-to-Wire Protocol uniquely allows database processes to read and send Oracle Real Application Clusters (Oracle RAC) messages directly over the InfiniBand network using RDMA. This bypasses the OS kernel and the networking software overhead. The Exafusion Direct-to-Wire Protocol improves the response time and scalability of Oracle RAC OLTP configurations on Oracle Exadata Database Machine, especially for workloads with high-contention updates. In some OLTP workloads, more than half of remote reads are for Undo Blocks to satisfy read consistency. Exadata uniquely leverages ultra-fast RDMA to read Undo Blocks from other database instances, further improving OLTP performance.

Smart Fusion Block Transfer capability uniquely improves the performance of a RAC OLTP configuration by eliminating the impact of redo log write latency, especially when hot blocks need to be transferred between sending and receiving nodes. The block is transferred as soon as the I/O to the redo log is issued at the sending node, without waiting for it to complete.
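The effect of not waiting for the redo log write can be illustrated with a toy timing model in Python. This is only a sketch under stated assumptions: the function names and microsecond figures are hypothetical, and the model measures only when the receiving node has the block, ignoring everything else the database does around the transfer.

```python
def block_arrival_waiting_for_redo(redo_write_us: float, block_send_us: float) -> float:
    """Sender waits for the redo log I/O to complete, then ships the block."""
    return redo_write_us + block_send_us


def block_arrival_overlapped(redo_write_us: float, block_send_us: float) -> float:
    """Sender ships the block as soon as the redo log I/O is issued.

    The redo write proceeds in parallel, so it no longer delays the block.
    """
    return block_send_us


# Hypothetical latencies, in microseconds.
redo_write_us = 500.0
block_send_us = 50.0

waiting = block_arrival_waiting_for_redo(redo_write_us, block_send_us)
overlapped = block_arrival_overlapped(redo_write_us, block_send_us)
print(f"block available after {waiting:.0f} us when waiting for redo")
print(f"block available after {overlapped:.0f} us when overlapped")
```

In this model the redo log latency drops out of the block transfer time entirely, which is the behavior the paragraph above describes.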

RDMA Over Converged Ethernet (RoCE) – The Exadata X8M release provides the next generation in ultra-fast cloud-scale networking fabric, RDMA over Converged Ethernet (RoCE). As the RoCE API infrastructure is identical to InfiniBand’s, all existing Exadata performance features are also available on RoCE. Of course, RoCE provides additional features as well.

Easy to Communicate – RoCE’s protocols enable InfiniBand RDMA software to run on top of Ethernet. This allows the same software to be used at the upper levels of the network protocol stack, while the InfiniBand packets are transported across Ethernet as UDP over IP at the lower level.

Congestion Control – Exadata RoCE Network fabric provides transparent prioritization of traffic by type, ensuring the best performance for critical messages requiring the lowest latency. Low latency messages such as cluster heartbeat, transaction commits and cache fusion, are not slowed by higher throughput messages (such as backups, reporting or batch messages).
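Prioritization by traffic type can be sketched with a simple priority queue in Python. This is an illustration only: the real fabric prioritizes traffic in the network itself, not in a software heap, and the `PriorityFabric` class, the priority table and the message names are hypothetical.

```python
import heapq
from itertools import count
from typing import List, Tuple

# Hypothetical traffic classes: lower number means higher priority.
PRIORITY = {
    "heartbeat": 0,
    "commit": 0,
    "cache_fusion": 0,
    "backup": 1,
    "reporting": 1,
    "batch": 1,
}


class PriorityFabric:
    """Toy model of a fabric that delivers latency-critical messages first."""

    def __init__(self) -> None:
        self._queue: List[Tuple[int, int, str, str]] = []
        self._seq = count()  # preserves arrival order within a class

    def send(self, kind: str, payload: str) -> None:
        heapq.heappush(self._queue, (PRIORITY[kind], next(self._seq), kind, payload))

    def deliver_next(self) -> Tuple[str, str]:
        _, _, kind, payload = heapq.heappop(self._queue)
        return kind, payload

    def pending(self) -> int:
        return len(self._queue)


fabric = PriorityFabric()
fabric.send("backup", "b1")
fabric.send("backup", "b2")
fabric.send("commit", "c1")
fabric.send("heartbeat", "h1")

order = []
while fabric.pending():
    order.append(fabric.deliver_next())
print(order)
```

Although the two backup messages were queued first, the commit and the heartbeat are delivered ahead of them.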

Lossless Communication – Exadata RoCE Network also optimizes communications by ensuring that packets are delivered on the first try without costly retransmissions. Exadata RoCE avoids packet drops by utilizing RoCE protocols to manage the traffic flow, requesting the sender to slow down if the receiver’s buffer is full.
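The "slow down when the receiver's buffer is full" behavior can be sketched as follows in Python. This is a toy model of flow control in general, not the RoCE protocol itself; the `Receiver` class, `send_all` function and buffer sizes are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class Receiver:
    """Toy receiver with a bounded buffer."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buffer: Deque[int] = deque()

    def has_room(self) -> bool:
        return len(self.buffer) < self.capacity

    def accept(self, packet: int) -> None:
        if not self.has_room():
            raise OverflowError("packet would be dropped")
        self.buffer.append(packet)

    def drain(self, n: int) -> List[int]:
        """Process up to n buffered packets, freeing space."""
        return [self.buffer.popleft() for _ in range(min(n, len(self.buffer)))]


def send_all(packets: List[int], receiver: Receiver, drain_per_pause: int = 1) -> Tuple[List[int], int]:
    """Send every packet, pausing whenever the receiver reports a full buffer."""
    if drain_per_pause < 1:
        raise ValueError("drain_per_pause must be at least 1")
    delivered: List[int] = []
    pauses = 0
    for packet in packets:
        while not receiver.has_room():
            pauses += 1  # sender is asked to slow down instead of dropping
            delivered.extend(receiver.drain(drain_per_pause))
        receiver.accept(packet)
    delivered.extend(receiver.drain(len(receiver.buffer)))
    return delivered, pauses


delivered, pauses = send_all(list(range(5)), Receiver(capacity=2))
print(f"delivered {delivered} with {pauses} pauses and no drops")
```

Every packet arrives exactly once and in order; the cost of a full buffer is a short pause at the sender rather than a retransmission.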

Instant Failure Detection – Exadata X8M Instant Failure Detection is not affected by OS or CPU response times, as it uses hardware-based RDMA to quickly confirm that a server is responding. Four RDMA reads are sent to the suspect server across all combinations of source/target ports. If all four reads fail, the server is evicted from the cluster. If a port responds, the hardware is available, even if the software is slow.
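The eviction rule described above can be sketched in Python. This is a minimal illustration of the decision logic only: the port names and the `rdma_read` callable are hypothetical stand-ins for the hardware-based reads, not a real API.

```python
from itertools import product
from typing import Callable, Sequence


def should_evict(
    rdma_read: Callable[[str, str], bool],
    source_ports: Sequence[str] = ("src0", "src1"),
    target_ports: Sequence[str] = ("dst0", "dst1"),
) -> bool:
    """Return True only if every source/target port combination fails.

    rdma_read is a hypothetical probe that returns True when the suspect
    server's hardware answers the read on the given port pair.
    """
    results = [rdma_read(src, dst) for src, dst in product(source_ports, target_ports)]
    return not any(results)


def dead_server(src: str, dst: str) -> bool:
    return False  # no port pair answers


def slow_but_alive(src: str, dst: str) -> bool:
    return (src, dst) == ("src1", "dst0")  # a single path still answers


attempts = []


def counting_probe(src: str, dst: str) -> bool:
    attempts.append((src, dst))
    return False


evict_dead = should_evict(dead_server)
evict_alive = should_evict(slow_but_alive)
should_evict(counting_probe)
print(f"evict dead server: {evict_dead}, evict responsive server: {evict_alive}")
print(f"reads sent per check: {len(attempts)}")
```

With two source ports and two target ports, each check issues four reads, and a single successful read is enough to keep the server in the cluster.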

Persistent Memory Acceleration – Persistent memory (PMEM) is a new technology that adds a distinct storage tier between DRAM and Flash in performance, capacity and price. For the Exadata X8M release, 1.5 TB of persistent memory is added to High Capacity and Extreme Flash Storage Servers. Persistent memory enables reads at memory speed and ensures that writes survive any power failures that may occur. In combination with the new RoCE 100Gb/s Network Fabric, smart Exadata System Software can fully leverage persistent memory benefits on remote storage servers via specialized data and commit accelerators. PMEM Data Accelerator uses RDMA instead of I/O to read remote persistent memory, bypassing the latency of the network and I/O software, interrupts and context switches. With PMEM Commit Accelerator, Oracle Database 19c can place the redo log record directly in persistent memory on multiple storage servers using RDMA, which provides up to 8x faster redo log writes.

Oracle Exadata Product Development Blog – Introducing Exadata X8M

Summary

Recall that the von Neumann performance bottleneck necessitates new technologies, including hardware and software enhancements, to meet the vertical scaling demands of high-performance computing. Oracle Exadata was engineered by seamlessly integrating new technologies, leveraging unique co-designed hardware and database-aware software, to further increase the advantages of the flagship platform for running the Oracle Database. Unique algorithms and protocols in Exadata implement database intelligence in storage, compute and networking to deliver higher performance and capacity at lower cost than other platforms. The features of the Exadata Storage Server, together with the RDMA over Converged Ethernet (RoCE) and Persistent Memory (PMEM) implementations, improve performance and make Exadata X8M the best platform to run Oracle Database.

References

https://www.oracle.com/a/ocom/docs/engineered-systems/exadata/exadata-x8m-2-ds.pdf

https://upload.wikimedia.org/wikipedia/commons/0/00/Moore%27s_Law_Transistor_Count_1970-2020.png

https://en.wikipedia.org/wiki/Oracle_Exadata

https://flashdba.com/history-of-exadata/

https://en.wikipedia.org/wiki/Von_Neumann_architecture

https://en.wikipedia.org/wiki/John_von_Neumann

https://www.techopedia.com/definition/14630/von-neumann-bottleneck

https://www.oracle.com/technetwork/server-storage/networking/documentation/o12-020-1653901.pdf

https://blogs.oracle.com/exadata/exadata-x8m