IT & HPC Glossary - Scalable Informatics

A

Ansible

An open-source IT automation tool for configuration management and orchestration, widely used for HPC cluster and data center provisioning.

B

BeeGFS

A high-performance parallel file system with distributed metadata, simple administration, and built-in high availability for HPC and AI workloads.

Burst Buffer

A fast intermediate storage layer, typically built on NVMe SSDs, that absorbs bursty I/O (such as checkpoint writes) between compute nodes and the parallel file system.

C

Ceph

An open-source distributed storage platform providing unified object, block, and file storage with automatic data distribution and self-healing capabilities.

CXL

Compute Express Link: an open interconnect standard enabling high-speed communication and memory sharing between CPUs, accelerators, and memory devices.

D

Data Lake

A centralized repository storing raw data in its native format at scale, supporting diverse analytics workloads from batch to real-time processing.

Deduplication

A data reduction technique that eliminates duplicate data blocks, significantly reducing storage consumption in backup and primary storage systems.
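
A minimal sketch of block-level deduplication, assuming fixed-size 4 KiB chunks identified by SHA-256 content hashes. Real products often use variable-size, content-defined chunking and persistent on-disk indexes; the function names `dedup` and `rebuild` are hypothetical.

```python
import hashlib

def dedup(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep one stored copy of each unique chunk."""
    store = {}    # content hash -> chunk bytes (the unique-chunk store)
    recipe = []   # ordered hashes needed to rebuild the original data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # only the first occurrence consumes space
        recipe.append(digest)
    return store, recipe

def rebuild(store, recipe) -> bytes:
    """Reassemble the original byte stream from the recipe of chunk hashes."""
    return b"".join(store[h] for h in recipe)

data = b"A" * 4096 * 3 + b"B" * 4096   # four chunks, only two distinct
store, recipe = dedup(data)
print(len(recipe), len(store))          # 4 2
```

Three identical chunks collapse into a single stored copy, so this input needs half the space it would without deduplication.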

E

Erasure Coding

A data protection method that encodes data with redundant fragments across multiple locations, offering fault tolerance with less storage overhead than replication.
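
The storage-overhead advantage can be illustrated with the simplest erasure code, a single XOR parity fragment over k data fragments, which tolerates the loss of any one fragment. This is a swapped-in teaching example: production systems such as Ceph typically use Reed-Solomon-style codes that survive several simultaneous losses. All names below are hypothetical.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(fragments: list[bytes]) -> list[bytes]:
    """Append one XOR parity fragment to k equal-length data fragments."""
    return fragments + [reduce(xor_bytes, fragments)]

def recover(stored: list[bytes], lost: int) -> bytes:
    """Rebuild any single missing fragment by XOR-ing all surviving fragments."""
    return reduce(xor_bytes, (f for i, f in enumerate(stored) if i != lost))

data = [b"HPC1", b"HPC2", b"HPC3", b"HPC4"]  # k = 4 data fragments
stored = encode(data)                         # k + 1 = 5 fragments, one per location
print(recover(stored, 2))                     # b'HPC3'
print(len(stored) / len(data))                # 1.25 (raw capacity vs. usable)
```

Here the overhead is 1.25x raw capacity, compared with 3.0x for triple replication of the same data.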

F

Fibre Channel

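A high-speed networking protocol for Storage Area Networks that uses credit-based flow control to keep the fabric lossless; widely deployed generations run at up to 64 Gbps (64GFC), with faster generations on the industry roadmap.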

FLOPS

Floating Point Operations Per Second: the standard measure of computing performance for scientific and HPC workloads, from GFLOPS to EFLOPS.
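
Theoretical peak performance is simple arithmetic over the hardware. The sketch below assumes a hypothetical cluster and a hypothetical per-core rate of 32 double-precision FLOPs per cycle (for example, two 512-bit FMA units); sustained results from benchmarks such as HPL are always lower than this peak.

```python
def peak_flops(nodes: int, cores_per_node: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak = nodes x cores x clock rate x FLOPs each core can retire per cycle."""
    return nodes * cores_per_node * clock_hz * flops_per_cycle

# Hypothetical cluster: 100 nodes x 64 cores at 2.0 GHz, 32 double-precision FLOPs
# per core per cycle (e.g. two 512-bit FMA units: 2 units x 8 doubles x 2 ops per FMA).
peak = peak_flops(100, 64, 2.0e9, 32)
print(f"{peak / 1e12:.1f} TFLOPS")  # 409.6 TFLOPS
```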

G

GPFS

General Parallel File System, now IBM Storage Scale (formerly IBM Spectrum Scale): an enterprise parallel file system with advanced data management features for HPC and AI.

GPU Accelerator

A massively parallel processor with thousands of cores, essential for AI training, scientific simulation, and high-performance computing workloads.

H

HCI

Hyperconverged Infrastructure: a software-defined IT framework that combines compute, storage, and networking in a single system for simplified management.

I

InfiniBand

A high-bandwidth, low-latency networking technology with native RDMA support, widely used in HPC clusters and AI training infrastructure.

IOPS

Input/Output Operations Per Second: a key storage performance metric counting the read and write operations a device or system completes each second, usually quoted for small-block random I/O.
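
An IOPS figure only translates into bandwidth once the block size is known: throughput = IOPS x block size. The ratings below are hypothetical, chosen to show how one device can look very different depending on which figure is quoted.

```python
def throughput_mb_s(iops: int, block_size_bytes: int) -> float:
    """Throughput in decimal MB/s implied by an IOPS figure at a given block size."""
    return iops * block_size_bytes / 1e6

# Hypothetical ratings for the same device at two block sizes
random_4k = throughput_mb_s(200_000, 4096)           # 819.2 MB/s
sequential_1m = throughput_mb_s(2_000, 1024 * 1024)  # about 2097.2 MB/s
print(random_4k, sequential_1m)
```

The small-block workload performs 100x more operations yet moves less than half the data, which is why IOPS and bandwidth are reported separately.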

iSCSI

Internet Small Computer Systems Interface: a protocol enabling block-level SAN storage access over standard TCP/IP Ethernet networks.

K

Kubernetes

An open-source container orchestration platform that automates deployment and scaling of containerized applications across clusters.

L

Latency

The time between issuing an I/O request and receiving its response, typically measured in microseconds for flash storage and a critical metric for real-time applications.
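
Latency, IOPS, and queue depth are linked by Little's law: in steady state, outstanding I/Os = IOPS x average latency. The sketch assumes, unrealistically, that latency stays fixed at a hypothetical 100 microseconds as queue depth grows; real devices eventually saturate and latency rises instead.

```python
def littles_law_iops(queue_depth: int, latency_s: float) -> float:
    """Steady-state IOPS = outstanding I/Os / average latency (Little's law)."""
    return queue_depth / latency_s

# Hypothetical flash device with a constant 100 microsecond average latency
for qd in (1, 8, 32):
    print(qd, round(littles_law_iops(qd, 100e-6)))
# 1 10000
# 8 80000
# 32 320000
```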

Lustre

An open-source parallel file system used by many of the world's largest supercomputers, designed for maximum aggregate throughput at scale.

M

Metadata

Data about data in storage systems: file attributes, directory structure, and permissions. Metadata performance is often the scalability bottleneck, especially for workloads that create or open many small files.

MPI

Message Passing Interface: the standard library interface for exchanging messages between processes in parallel applications on distributed HPC clusters.

N

NAND Flash

The non-volatile memory technology in SSDs, storing data in cells organized into pages (the unit of programming) and blocks (the unit of erasure), with 3D stacking of cell layers for increased density.

NVMe

Non-Volatile Memory Express: a high-performance storage protocol designed for flash SSDs that communicates directly over PCIe for maximum throughput and minimal latency.

NVMe-oF

NVMe over Fabrics: an extension of the NVMe protocol across network fabrics, enabling remote NVMe storage access with near-local performance.

O

Object Storage

A storage architecture managing data as objects with rich metadata, designed for massive scale and typically accessed over HTTP through RESTful APIs such as Amazon S3.

P

Parallel File System

A distributed file system that stripes data across multiple servers, enabling thousands of clients to access storage concurrently, with aggregate bandwidth that scales close to linearly as servers are added.
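
Striping can be sketched as a round-robin mapping from a file offset to a server. This simplified model is similar in spirit to Lustre's stripe size and stripe count layout parameters, but the function `locate` and the 1 MiB by 4-server layout are hypothetical.

```python
def locate(offset: int, stripe_size: int, stripe_count: int) -> tuple[int, int]:
    """Map a file byte offset to (server index, byte offset inside that server's object)
    under simple round-robin striping."""
    stripe_unit = offset // stripe_size      # global index of the stripe unit holding the byte
    server = stripe_unit % stripe_count      # units are dealt round-robin across servers
    rnd = stripe_unit // stripe_count        # full rounds completed before this unit
    return server, rnd * stripe_size + offset % stripe_size

MiB = 1024 * 1024
# File striped in 1 MiB units across 4 servers (hypothetical layout)
print(locate(5 * MiB + 10, MiB, 4))  # (1, 1048586)
```

Because consecutive stripe units land on different servers, a large sequential read keeps all servers busy at once, which is where the aggregate bandwidth comes from.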

PCIe

Peripheral Component Interconnect Express: the high-speed serial bus standard connecting GPUs, NVMe SSDs, and network adapters to the CPU.
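
Link bandwidth follows from the per-lane transfer rate, the line encoding, and the lane count. The sketch covers only PCIe generations 3 to 5, which use 128b/130b encoding (newer generations change the encoding and are left out), and reports raw one-direction bandwidth before packet and protocol overhead, so delivered throughput is somewhat lower.

```python
# Per-lane signalling rates in GT/s for generations that use 128b/130b encoding
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}

def pcie_gb_s(gen: int, lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s, ignoring packet/protocol overhead."""
    encoding_efficiency = 128 / 130            # 128b/130b line code (Gen3 to Gen5)
    bits_per_s = GT_PER_S[gen] * 1e9 * encoding_efficiency * lanes
    return bits_per_s / 8 / 1e9                # bits -> bytes -> GB

for gen in (3, 4, 5):
    print(f"Gen{gen} x16: {pcie_gb_s(gen, 16):.1f} GB/s")
# Gen3 x16: 15.8 GB/s
# Gen4 x16: 31.5 GB/s
# Gen5 x16: 63.0 GB/s
```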

POSIX

Portable Operating System Interface: a family of IEEE standards defining operating system interfaces, including file system semantics, that ensures application portability across compliant systems and storage platforms.

Q

QLC NAND

Quad-Level Cell NAND flash storing four bits per cell, delivering the highest density and lowest cost per gigabyte for capacity-oriented storage.

R

RAID

Redundant Array of Independent Disks: a method of combining multiple drives for improved performance, redundancy, or both.
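
The capacity cost of redundancy differs by RAID level. The sketch below assumes identical drives and a hypothetical helper `usable_tb`; RAID 5 survives one drive failure and RAID 6 survives two, which is what the parity capacity buys.

```python
def usable_tb(level: str, drives: int, drive_tb: float) -> float:
    """Usable capacity of common RAID levels built from identical drives."""
    if level == "0":
        return drives * drive_tb                     # pure striping, no redundancy
    if level == "1":
        if drives < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        return drive_tb                              # every drive holds a full copy
    if level == "5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_tb               # one drive's worth of parity
    if level == "6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * drive_tb               # two drives' worth of parity
    if level == "10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return drives // 2 * drive_tb                # striped mirrored pairs
    raise ValueError(f"unsupported RAID level {level!r}")

# Eight hypothetical 16 TB drives
for level in ("0", "5", "6", "10"):
    print(level, usable_tb(level, 8, 16.0))
# 0 128.0
# 5 112.0
# 6 96.0
# 10 64.0
```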

RDMA

Remote Direct Memory Access: a technology enabling direct memory-to-memory data transfer between computers that bypasses the operating system kernel and the remote CPU, critical for low-latency HPC networking.

RoCE

RDMA over Converged Ethernet: a protocol enabling high-performance RDMA communication over standard Ethernet networks.

S

SAN

Storage Area Network: a dedicated high-speed network providing block-level access to shared storage for databases, virtualization, and enterprise workloads.

SDS

Software-Defined Storage: an approach that separates storage intelligence from hardware, enabling storage services on commodity infrastructure.

Singularity

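A container platform for HPC that runs containers as the invoking user without root privileges, enabling reproducible scientific computing on shared clusters. The open-source project now continues as Apptainer, alongside Sylabs' SingularityCE.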

SLURM

Simple Linux Utility for Resource Management: the dominant open-source job scheduler for HPC clusters and supercomputers.

Snapshot

A space-efficient, point-in-time copy of data, typically implemented with copy-on-write or redirect-on-write techniques, enabling rapid backup and recovery with minimal storage overhead.
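
A toy in-memory model shows why copy-on-write snapshots are cheap: creating one copies nothing, and an old block is preserved only the first time it is overwritten afterwards. The `Volume` class is hypothetical and ignores persistence, snapshot deletion, and concurrency.

```python
class Volume:
    """Toy block volume with copy-on-write snapshots (in-memory sketch)."""

    def __init__(self, nblocks: int):
        self.blocks = [b"\0"] * nblocks
        self.snapshots = []   # each snapshot: {block index: preserved original content}

    def snapshot(self) -> int:
        self.snapshots.append({})          # O(1): no data copied at creation time
        return len(self.snapshots) - 1

    def write(self, index: int, data: bytes) -> None:
        for snap in self.snapshots:
            # Copy-on-write: save the pre-write content only on the first change
            snap.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap_id: int, index: int) -> bytes:
        # Blocks never overwritten since the snapshot are shared with the live volume
        return self.snapshots[snap_id].get(index, self.blocks[index])

vol = Volume(4)
s0 = vol.snapshot()
vol.write(0, b"A")
s1 = vol.snapshot()
vol.write(0, b"B")
print(vol.blocks[0], vol.read_snapshot(s0, 0), vol.read_snapshot(s1, 0))  # b'B' b'\x00' b'A'
```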

T

Thin Provisioning

A storage technique that allocates disk space on demand rather than upfront, improving capacity utilization through overcommitment.
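
A minimal sketch of allocate-on-write behavior, using a hypothetical `ThinPool` class: volumes promise more capacity than physically exists, and physical blocks are consumed only on first write. The same sketch shows the main operational risk, since writes fail once the shared pool is exhausted.

```python
class ThinPool:
    """Toy thin-provisioned pool: volumes get physical blocks only when first written."""

    def __init__(self, physical_blocks: int):
        self.physical_blocks = physical_blocks
        self.free = physical_blocks
        self.volumes = {}   # name -> (virtual size in blocks, set of allocated virtual blocks)

    def create_volume(self, name: str, virtual_blocks: int) -> None:
        self.volumes[name] = (virtual_blocks, set())   # promise capacity, allocate nothing

    def write(self, name: str, vblock: int) -> None:
        size, allocated = self.volumes[name]
        if not 0 <= vblock < size:
            raise IndexError("write beyond the volume's virtual size")
        if vblock not in allocated:                    # first write: allocate on demand
            if self.free == 0:
                raise RuntimeError("thin pool exhausted")
            self.free -= 1
            allocated.add(vblock)

    def overcommit_ratio(self) -> float:
        promised = sum(size for size, _ in self.volumes.values())
        return promised / self.physical_blocks

pool = ThinPool(1000)
pool.create_volume("db", 800)
pool.create_volume("scratch", 800)
for b in range(300):
    pool.write("db", b)
print(pool.overcommit_ratio(), pool.free)  # 1.6 700
```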

Tiered Storage

A strategy that automatically moves data between fast, medium, and archival storage tiers based on access frequency to optimize cost and performance.
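
A tiering policy can be as simple as comparing each file's last-access age against thresholds. The 7-day and 90-day cutoffs and the tier names below are hypothetical; real systems tune such policies per workload and may also weigh file size or access count.

```python
from datetime import datetime, timedelta

# Hypothetical policy thresholds: (maximum age since last access, tier name)
TIERS = [(timedelta(days=7), "fast"), (timedelta(days=90), "medium")]

def choose_tier(last_access: datetime, now: datetime) -> str:
    """Return the tier for a file based on how long ago it was last accessed."""
    age = now - last_access
    for max_age, tier in TIERS:
        if age <= max_age:
            return tier
    return "archive"    # anything older than every threshold goes to archival storage

now = datetime(2024, 6, 1)
files = {
    "run.h5": now - timedelta(days=2),
    "input.dat": now - timedelta(days=30),
    "old.tar": now - timedelta(days=400),
}
plan = {name: choose_tier(ts, now) for name, ts in files.items()}
print(plan)  # {'run.h5': 'fast', 'input.dat': 'medium', 'old.tar': 'archive'}
```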

W

Write Amplification

A flash storage phenomenon where physical writes to NAND exceed the logical writes issued by the host, because flash is erased in whole blocks and garbage collection must relocate still-valid pages; it reduces SSD endurance and performance.
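
The write amplification factor (WAF) is the ratio of NAND writes to host writes, usually derived from drive counters whose names vary by vendor. The figures below are hypothetical, and the endurance estimate is a rough rule of thumb that ignores over-provisioning and other factors.

```python
def write_amplification(nand_bytes_written: float, host_bytes_written: float) -> float:
    """WAF = bytes physically written to NAND / bytes the host asked to write."""
    return nand_bytes_written / host_bytes_written

def host_endurance_tb(capacity_tb: float, pe_cycles: int, waf: float) -> float:
    """Rough host-writable lifetime: total NAND program/erase budget divided by WAF."""
    return capacity_tb * pe_cycles / waf

# Hypothetical counters: the host wrote 10 TB, the drive wrote 25 TB to NAND
waf = write_amplification(25.0, 10.0)
print(waf)                                # 2.5
print(host_endurance_tb(4.0, 3000, waf))  # 4800.0 (TB of host writes)
```

Halving the WAF, for example through more over-provisioning or more sequential write patterns, would double this estimated endurance.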