Exploring Public Storage Traces

Author: Murphy  |  Time: 2025-03-22 23:07:28

Input and output (I/O) operations refer to the transfer of data between a computer's main memory and its peripherals. Storage peripherals such as HDDs and SSDs have particular performance characteristics in terms of latency, throughput, and access rate, which can influence the performance of the computer system they power. By extension, the performance and design of distributed and cloud-based data storage depends on that of the underlying medium. This article is intended to be a bridge between Data Science and Storage Systems: 1/ I am sharing a few datasets of various sources and sizes which I hope will be novel for Data Scientists, and 2/ I am bringing up the potential for advanced analytics in Distributed Systems.

Intro

Storage access traces are "a treasure trove of information for optimizing cloud workloads." They are crucial for capacity planning, data placement, and system design and evaluation suited to modern applications. Diverse and up-to-date datasets are particularly needed in academic research to study novel and unintuitive access patterns, and to inform the design of new hardware architectures, new caching algorithms, or hardware simulations.

Storage traces are notoriously difficult to find. The SNIA website is the best known "repository for storage-related I/O trace files, associated tools, and other related information" but many traces don't comply with their licensing or upload format. Finding traces becomes a tedious process of scanning the academic literature or attempting to generate one's own.

Popular traces which are easier to find tend to be outdated and overused. Traces older than 10 years should not be used in modern research and development due to changes in application workloads and hardware capabilities. Also, overuse of specific traces can bias the understanding of real workloads, so it is recommended to use traces from multiple independent sources when possible.

This post is an organized collection of recent public traces I found and used. In the first part I categorize them by the level of abstraction they represent in the IO stack. In the second part I list and discuss some relevant datasets. The last part is a summary of all with a personal view on the gaps in storage tracing datasets.

Type of traces

I distinguish between three types of traces based on data representation and access model. Let me explain. A user, at the application layer, sees data stored in files or objects, which are accessed through a large range of abstract operations such as open or append. Closer to the media, the data is stored in a contiguous memory address space and accessed as blocks of fixed size, which may only be read or written. At a higher abstraction level still, within the application layer, we may also have a data presentation layer which logs access to data presentation units; these may be, for example, rows composing tables and databases, or articles and paragraphs composing news feeds, and the access may be create table or post article.

While traces can be taken anywhere in the IO stack and contain information from multiple layers, I am choosing to structure the following classification based on the Linux IO stack depicted below.

I/O Stack Diagram (adapted from [1], [2] and [3])

Block storage traces

The data in these traces is representative of the operations at the block layer. In Linux, this data is typically collected with blktrace (and rendered readable with blkparse), iostat, or dtrace. The traces contain information about the operation, the device, CPU, process, and storage location accessed. The first trace listed is an example of blktrace output.

The typical information generated by tracing programs may be too detailed for analysis and publication purposes, so it is often simplified. Typical public traces contain operation, offset, size, and sometimes timing. At this layer the operations are only read and write. Each operation accesses the address starting at offset and is applied to a contiguous range of memory whose size is specified as a number of blocks (e.g., 4 KiB in NTFS). For example, a trace entry for a read operation contains the address where the read starts (offset) and the number of blocks read (size). The timing information may contain the time the request was issued (start time), the time it was completed (end time), the processing time in between (latency), and the time the request waited (queuing time).
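To make these fields concrete, here is a minimal sketch in Python of a simplified block-trace entry. The class and field names are my own for illustration, and the 4 KiB block size is an assumption, not a property of any specific trace:

```python
from dataclasses import dataclass

BLOCK_SIZE = 4096  # assumed block size in bytes, for illustration only

@dataclass
class BlockIO:
    op: str            # "R" (read) or "W" (write)
    offset: int        # starting block address
    size: int          # number of blocks accessed
    start_time: float  # time the request was issued, in seconds
    end_time: float    # time the request completed, in seconds

    @property
    def bytes_moved(self) -> int:
        # Total bytes transferred by this request.
        return self.size * BLOCK_SIZE

    @property
    def latency(self) -> float:
        # Processing time between issue and completion.
        return self.end_time - self.start_time

# A read of 8 blocks starting at block offset 282624, completing after ~13 ms.
entry = BlockIO(op="R", offset=282624, size=8, start_time=0.0, end_time=0.013)
print(entry.bytes_moved)  # 32768
```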

Available traces sport different features, have wildly different sizes, and are the output of a variety of workloads. Selecting the right one depends on what one is looking for. For example, trace replay only needs the order of operations and their sizes; for performance analysis, timing information is needed.

Disk access visualization with iowatcher (source)

Object storage traces

At the application layer, data is located in files and objects which may be created, opened, appended to, or closed, and is then discovered via a tree structure. From a user's point of view, the storage medium is decoupled, hiding fragmentation and allowing random byte access.

I'll group file and object traces together despite a subtle difference between the two. Files follow the file system's naming convention, which is structured (typically hierarchical); often the extension suggests the content type and usage of the file. Objects, on the other hand, are used in large-scale storage systems dealing with vast amounts of diverse data. In object storage systems the structure is not intrinsic; instead it is defined externally, by the user, with specific metadata files managed by their workload.

Being generated within the application space, typically the result of an application logging mechanism, object traces are more diverse in terms of format and content. The information recorded may be more specific, for example, operations can also be delete, copy, or append. Objects typically have variable size and even the same object's size may vary in time after appends and overwrites. The object identifier can be a string of variable size. It may encode extra information, for example, an extension that tells the content type. Other meta-information may come from the range accessed, which may tell us, for example, whether the header, the footer or the body of an image, parquet, or CSV file was accessed.

Object storage traces are better suited for understanding user access patterns. In terms of block access, a video stream and a sequential read of an entire file generate the same pattern: multiple sequential IOs at regular time intervals. But these trace entries should be treated differently if we are to replay them. Video streaming blocks need to be accessed with the same time delta between them, regardless of the latency of each individual block, while reading the entire file should happen as fast as possible.
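The two replay strategies can be sketched as follows. This is illustrative only; `issue` is a hypothetical stand-in for submitting a real I/O request:

```python
import time

def replay(trace, preserve_timing: bool):
    """Replay a list of (timestamp, request) pairs.

    With preserve_timing=True the original inter-arrival gaps are kept
    (streaming-like access); with False, requests are issued back to back
    (bulk sequential read).
    """
    issued = []

    def issue(request):          # stand-in for a real I/O submission
        issued.append(request)

    first_ts = trace[0][0]
    wall_start = time.monotonic()
    for ts, request in trace:
        if preserve_timing:
            # Sleep until this request's original relative issue time.
            delay = (ts - first_ts) - (time.monotonic() - wall_start)
            if delay > 0:
                time.sleep(delay)
        issue(request)
    return issued
```

For a video stream the trace timestamps would be spaced by the playback interval; for a whole-file read, the second mode drains the trace as fast as the device allows.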

Access traces

Specific to each application, data may be abstracted further. Data units may be instances of a class, records in a database, or ranges in a file. A single data access may not even generate a file open or a disk IO if caching is involved. I choose to include such traces because they may be used to understand and optimize storage access, and in particular cloud storage. For example, the access traces from Twitter's Memcache are useful in understanding popularity distributions and therefore may be useful for data formatting and placement decisions. Often these are not storage traces per se, but they can be useful in the context of cache simulation, IO reduction, or data layout (indexing).

Data formats in these traces can be even more diverse due to the new layer of abstraction; data may be keyed, for example, by tweet identifiers in Memcached.

Examples of traces

Let's look at a few traces in each of the categories above. The list details some of the newer traces – no older than 10 years – and it is by no means exhaustive.

Block traces

YCSB RocksDB SSD 2020

These are SSD traces collected on a 28-core, 128 GB host with two 512 GB NVMe SSD Drives, running Ubuntu. The dataset is a result of running the YCSB-0.15.0 benchmark with RocksDB.

The first SSD stores all blktrace output, while the second hosts YCSB and RocksDB. YCSB Workload A consists of 50% reads and 50% updates of 1B operations on 250M records. Runtime is 9.7 hours, which generates over 352M block I/O requests at the file system level writing a total of 6.8 TB to the disk, with a read throughput of 90 MBps and a write throughput of 196 MBps.

The dataset is small compared to all others in the list, and limited in terms of workload, but a great place to start due to its manageable size. Another benefit is reproducibility: it uses open source tracing tools and benchmarking beds atop a relatively inexpensive hardware setup.

Format: These are SSD traces taken with blktrace and have the typical format after parsing with blkparse: [Device Major Number,Device Minor Number] [CPU Core ID] [Record ID] [Timestamp (seconds, with nanosecond resolution)] [Process ID] [Trace Action] [Operation Type] [Sector Number + I/O Size] [Process Name]

259,2    0        1     0.000000000  4020  Q   R 282624 + 8 [java]
259,2    0        2     0.000001581  4020  G   R 282624 + 8 [java]
259,2    0        3     0.000003650  4020  U   N [java] 1
259,2    0        4     0.000003858  4020  I  RS 282624 + 8 [java]
259,2    0        5     0.000005462  4020  D  RS 282624 + 8 [java]
259,2    0        6     0.013163464     0  C  RS 282624 + 8 [0]
259,2    0        7     0.013359202  4020  Q   R 286720 + 128 [java]
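A small parser for lines like the above can be sketched in Python. It assumes the default blkparse output layout shown in this sample; blkparse output is configurable, so field positions may differ in other traces:

```python
def parse_blkparse_line(line: str) -> dict:
    """Parse one line of default blkparse output, as in the sample above."""
    fields = line.split()
    major, minor = fields[0].split(",")
    record = {
        "device": (int(major), int(minor)),
        "cpu": int(fields[1]),
        "record_id": int(fields[2]),
        "time_s": float(fields[3]),  # seconds, nanosecond resolution
        "pid": int(fields[4]),
        "action": fields[5],         # e.g. Q=queued, D=issued, C=completed
        "op": fields[6],             # R, W, RS, ...
    }
    if "+" in fields:                # "sector + size" is present
        plus = fields.index("+")
        record["sector"] = int(fields[plus - 1])
        record["blocks"] = int(fields[plus + 1])
    return record

rec = parse_blkparse_line("259,2    0        1     0.000000000  4020  Q   R 282624 + 8 [java]")
print(rec["sector"], rec["blocks"])  # 282624 8
```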

Where to find it: http://iotta.snia.org/traces/block-io/28568

License: SNIA Trace Data Files Download License

Alibaba Block Traces 2020

The dataset consists of "block-level I/O requests collected from 1,000 volumes, where each has a raw capacity from 40 GiB to 5 TiB. The workloads span diverse types of cloud applications. Each collected I/O request specifies the volume number, request type, request offset, request size, and timestamp."

Limitations (from the academic paper)

  • the traces do not record the response times of the I/O requests, making them unsuitable for latency analysis of I/O requests.
  • the specific applications running atop are not mentioned, so they cannot be used to extract application workloads and their I/O patterns.
  • the traces capture the access to virtual devices, so they are not representative of performance and reliability (e.g., data placement and failure statistics) for physical Block Storage devices.

A drawback of this dataset is its size. When uncompressed it results in a 751GB file which is difficult to store and manage.

Format: device_id,opcode,offset,length,timestamp

  • device_id is the ID of the virtual disk, uint32
  • opcode is either 'R' or 'W', indicating whether this operation is a read or a write
  • offset is the offset of this operation, in bytes, uint64
  • length is the length of this operation, in bytes, uint32
  • timestamp is the timestamp of this operation as received by the server, in microseconds, uint64
419,W,8792731648,16384,1577808144360767
725,R,59110326272,360448,1577808144360813
12,R,350868463616,8192,1577808144360852
725,R,59110686720,466944,1577808144360891
736,R,72323657728,516096,1577808144360996
12,R,348404277248,8192,1577808144361031
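Since the format is plain CSV, the trace can be consumed with Python's standard csv module. The sketch below tallies read and write traffic over a few of the sample rows above:

```python
import csv
import io

# A few rows from the sample above, in the documented
# device_id,opcode,offset,length,timestamp format.
sample = """\
419,W,8792731648,16384,1577808144360767
725,R,59110326272,360448,1577808144360813
12,R,350868463616,8192,1577808144360852
"""

bytes_read = bytes_written = 0
for device_id, opcode, offset, length, timestamp in csv.reader(io.StringIO(sample)):
    if opcode == "R":
        bytes_read += int(length)
    else:
        bytes_written += int(length)

print(bytes_read, bytes_written)  # 368640 16384
```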

Additionally, there is an extra file mapping each virtual device's device_id to its total capacity.

Where to find it: https://github.com/alibaba/block-traces

License: CC-4.0.

Tencent Block Storage 2018

This dataset consists of "216 I/O traces from a warehouse (also called a failure domain) of a production cloud block storage system (CBS). The traces are I/O requests from 5584 cloud virtual volumes (CVVs) for ten days (from Oct. 1st to Oct. 10th, 2018). The I/O requests from the CVVs are mapped and redirected to a storage cluster consisting of 40 storage nodes (i.e., disks)."

Limitations:

  • Timestamps are in seconds, a granularity too coarse to determine the order of operations. As a consequence, many requests appear to have been issued at the same time. This trace is therefore unsuitable for queuing analysis.
  • There is no information about the duration of each operation, making the trace unsuitable for latency analysis.
  • There is no extra information about each volume, such as its total size.

Format: Timestamp,Offset,Size,IOType,VolumeID

  • Timestamp is the Unix time the I/O was issued in seconds.
  • Offset is the starting offset of the I/O in sectors from the start of the logical virtual volume. 1 sector = 512 bytes
  • Size is the transfer size of the I/O request in sectors.
  • IOType is Read (0) or Write (1).
  • VolumeID is the ID number of a CVV.
1538323200,12910952,128,0,1063
1538323200,6338688,8,1,1627
1538323200,1904106400,384,0,1360
1538323200,342884064,256,0,1360
1538323200,15114104,8,0,3607
1538323200,140441472,32,0,1360
1538323200,15361816,520,1,1371
1538323200,23803384,8,0,2363
1538323200,5331600,4,1,3171
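Because offsets and sizes are in 512-byte sectors, a conversion is needed before comparing volumes in bytes. This sketch aggregates per-volume traffic from a few of the rows above:

```python
from collections import Counter

SECTOR = 512  # bytes per sector, per the format description above

# A few rows from the sample, in Timestamp,Offset,Size,IOType,VolumeID format.
rows = [
    "1538323200,12910952,128,0,1063",
    "1538323200,1904106400,384,0,1360",
    "1538323200,342884064,256,0,1360",
    "1538323200,6338688,8,1,1627",
]

traffic = Counter()
for row in rows:
    _ts, _offset, size, io_type, volume = row.split(",")
    kind = "read" if io_type == "0" else "write"
    traffic[(volume, kind)] += int(size) * SECTOR  # sectors -> bytes

print(traffic[("1360", "read")])  # 327680
```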

Where to find it: http://iotta.snia.org/traces/parallel/27917

License: SNIA Trace Data Files Download License

K5cloud Traces 2018

This dataset contains traces from virtual cloud storage from the FUJITSU K5 cloud service. The data was gathered during a week, but not continuously, because "one day's IO access logs often consumed the storage capacity of the capture system." There are 24 billion records from 3,088 virtual storage nodes.

The data was captured on the TCP/IP network between servers running on hypervisors and storage systems in a K5 data center in Japan. It is split into three datasets by virtual storage volume id; each volume id is unique within a dataset, but not across datasets.

Limitations:

  • There is no latency information, so the traces cannot be used for performance analysis.
  • The total node size is missing, but it can be approximated from the maximum offset accessed in the traces.
  • Some applications may require a complete dataset, which makes this one unsuitable due to missing data.

The fields in the IO access log are: ID,Timestamp,Type,Offset,Length

  • ID is the virtual storage volume id.
  • Timestamp is the time elapsed from the first IO request of all IO access logs in seconds, but with a microsecond granularity.
  • Type is R (Read) or W (Write).
  • Offset is the starting offset of the IO access in bytes from the start of the virtual storage.
  • Length is the transfer size of the IO request in bytes.
1157,3.828359000,W,7155568640,4096
1157,3.833921000,W,7132311552,8192
1157,3.841602000,W,15264690176,28672
1157,3.842341000,W,28121042944,4096
1157,3.857702000,W,15264718848,4096
1157,9.752752000,W,7155568640,4096
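As noted in the limitations, a volume's capacity can be approximated from below by the highest byte it is ever observed to access, i.e. max(offset + length). A sketch over a few of the sample rows above:

```python
from collections import defaultdict

# Rows in the ID,Timestamp,Type,Offset,Length format described above.
records = [
    "1157,3.828359000,W,7155568640,4096",
    "1157,3.833921000,W,7132311552,8192",
    "1157,3.841602000,W,15264690176,28672",
]

# Lower bound on each volume's size: the highest byte offset ever touched.
high_water = defaultdict(int)
for rec in records:
    volume, _ts, _op, offset, length = rec.split(",")
    high_water[volume] = max(high_water[volume], int(offset) + int(length))

print(high_water["1157"])  # 15264718848
```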

Where to find it: http://iotta.snia.org/traces/parallel/27917

License: CC-4.0.

Object traces

Server-side I/O request arrival traces 2019

This repository contains two datasets of IO block traces with additional file identifiers: 1/ parallel file systems (PFS) and 2/ I/O nodes.

Notes:

  • The access patterns result from the MPI-IO test benchmark run atop Grid5000, a large-scale test bed for parallel and High Performance Computing (HPC). These traces are not representative of general user or cloud workloads; they are specific to HPC and parallel computing.
  • The setup for the PFS scenario uses OrangeFS as the file system, and the IO nodes use the I/O Forwarding Scalability Layer (IOFSL). In both cases the scheduler was set to the AGIOS I/O scheduling library. This setup is perhaps too specific for most use cases targeted by this article, and it was designed to reflect some proposed solutions.
  • The hardware setup for PFS consists of four server nodes with 600 GB HDDs each and 64 client nodes. For IO nodes, it has four server nodes with a similar disk configuration in a cluster, and 32 clients in a different cluster.

Format: The format is slightly different for the two datasets, an artifact of different file systems. For IO nodes, it consists of multiple files, each with tab-separated values Timestamp FileHandle RequestType Offset Size. A peculiarity is that reads and writes are in separate files named accordingly.

  • Timestamp is a number representing the internal timestamp in nanoseconds.
  • FileHandle is the file handle, a 64-character hexadecimal string.
  • RequestType is the type of the request, inverted, "W" for reads and "R" for writes.
  • Offset is a number giving the request offset in bytes.
  • Size is the size of the request in bytes.
265277355663    00000000fbffffffffffff0f729db77200000000000000000000000000000000        W       2952790016      32768
265277587575    00000000fbffffffffffff0f729db77200000000000000000000000000000000        W       1946157056      32768
265277671107    00000000fbffffffffffff0f729db77200000000000000000000000000000000        W       973078528       32768
265277913090    00000000fbffffffffffff0f729db77200000000000000000000000000000000        W       4026531840      32768
265277985008    00000000fbffffffffffff0f729db77200000000000000000000000000000000        W       805306368       32768
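Parsing these lines is a simple split. The sketch below also un-inverts the request type, per the quirk noted above:

```python
def parse_ionode_line(line: str) -> dict:
    """Parse one tab-separated IO-node trace line, as in the sample above."""
    timestamp, handle, rtype, offset, size = line.split()
    return {
        "timestamp_ns": int(timestamp),
        "handle": handle,
        # The dataset records types inverted: "W" marks a read, "R" a write.
        "op": "read" if rtype == "W" else "write",
        "offset": int(offset),
        "size": int(size),
    }

rec = parse_ionode_line(
    "265277355663\t"
    "00000000fbffffffffffff0f729db77200000000000000000000000000000000\t"
    "W\t2952790016\t32768"
)
print(rec["op"], rec["size"])  # read 32768
```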

The PFS scenario has two concurrent applications, "app1" and "app2", and its traces are inside folders named accordingly. Each row entry has the following format: [<timestamp>] REQ SCHED SCHEDULING, handle: <handle>, queue_element: <queue_element>, type: <type>, offset: <offset>, len: <len>. Different from the above are:

  • RequestType is 0 for reads and 1 for writes
  • QueueElement is never used and I believe it is an artifact of the tracing tool.
[D 01:11:03.153625] REQ SCHED SCHEDULING, handle: 5764607523034233445, queue_element: 0x12986c0, type: 1, offset: 369098752, len: 1048576 
[D 01:11:03.153638] REQ SCHED SCHEDULING, handle: 5764607523034233445, queue_element: 0x1298e30, type: 1, offset: 268435456, len: 1048576 
[D 01:11:03.153651] REQ SCHED SCHEDULING, handle: 5764607523034233445, queue_element: 0x1188b80, type: 1, offset: 0, len: 1048576 
[D 01:11:03.153664] REQ SCHED SCHEDULING, handle: 5764607523034233445, queue_element: 0xf26340, type: 1, offset: 603979776, len: 1048576 
[D 01:11:03.153676] REQ SCHED SCHEDULING, handle: 5764607523034233445, queue_element: 0x102d6e0, type: 1, offset: 637534208, len: 1048576 
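These log lines can be picked apart with a regular expression. The pattern below is tailored to exactly the layout shown above and would need adjusting for any other variant:

```python
import re

# Matches lines like:
# [D 01:11:03.153625] REQ SCHED SCHEDULING, handle: ..., queue_element: 0x...,
# type: 1, offset: ..., len: ...
PATTERN = re.compile(
    r"\[D (?P<time>[\d:.]+)\] REQ SCHED SCHEDULING, "
    r"handle: (?P<handle>\d+), queue_element: (?P<qe>0x[0-9a-f]+), "
    r"type: (?P<type>\d+), offset: (?P<offset>\d+), len: (?P<len>\d+)"
)

line = ("[D 01:11:03.153625] REQ SCHED SCHEDULING, handle: 5764607523034233445, "
        "queue_element: 0x12986c0, type: 1, offset: 369098752, len: 1048576")

m = PATTERN.match(line)
op = "write" if m.group("type") == "1" else "read"  # 0 = read, 1 = write
print(op, m.group("offset"))  # write 369098752
```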

Where to find it: https://zenodo.org/records/3340631

License: CC-4.0.

IBM Cloud Object Store 2019

These are anonymized traces from the IBM Cloud Object Storage service collected with the primary goal to study data flows to the object store.

The dataset is composed of 98 traces containing around 1.6 billion requests for 342 million unique objects. The traces themselves are about 88 GB in size. Each trace contains the REST operations issued against a single bucket in IBM Cloud Object Storage, covering all data access requests issued by a single tenant of the service over a single week in 2019 (the same week for all traces). Each trace contains between 22,000 and 187,000,000 object requests. Object names are anonymized.

Some characteristics of the workload have been published in this paper, although the dataset used was larger:

  • The authors were "able to identify some of the workloads as SQL queries, Deep Learning workloads, Natural Language Processing (NLP), Apache Spark data analytic, and document and media servers. But many of the workloads' types remain unknown."
  • "A vast majority of the objects (85%) in the traces are smaller than a megabyte, yet these objects only account for 3% of the stored capacity." This made the data suitable for cache analysis.

Format: