October 29, 2020

Guide to LTO between Rust and C/C++

I use Rust in one of my research projects. The project was originally developed in C++, and because C++ is bad we decided to add new features primarily in Rust. Calling Rust from C++ is simple and easy (thanks to the excellent cxx project), but we soon found some performance regressions that didn’t appear in the C++ code. Specifically, the flamegraph showed many tiny Rust functions that wouldn’t have surfaced had the code been written in C++. Read more

October 19, 2020

System research frustrations

… we see a thriving software industry that largely ignores research, and a research community that writes papers rather than software. – Rob Pike, Systems Software Research is Irrelevant
I hear or experience these daily in my system research. I write them down not to complain or express my anger; rather, I know many people suffer from the same feeling: you are not alone. These are common cases in system research, and the situation is unlikely to improve due to the nature of research prototyping. Read more

September 9, 2020

Install Perf on WSL2 (with unwind and symbols)

WSL2 doesn’t ship with perf, and we can’t install it from Ubuntu apt because WSL2 has its own modified Linux kernel (and perf must match the kernel version). To install perf on WSL2, we need to clone the modified kernel and compile perf with the proper dependencies.

    sudo apt install flex bison gcc
    # Clone the kernel from MS repo
    git clone https://github.com/microsoft/WSL2-Linux-Kernel --depth 1
    cd WSL2-Linux-Kernel/tools/perf
    # Optional dependencies to unwind stack and resolve symbols
    sudo apt install libnuma-dev libunwind-dev dwarfdump libdw-dev libelf-dev libiberty-dev
    make -j4
    sudo cp perf /usr/local/bin

Bonus: I use the awesome flamegraph to automatically generate interactive flamegraphs. Read more

July 31, 2020

Introducing the DB/Sys reading group

TL;DR: I started an RSS-based Telegram channel that collects news about databases, systems, programming languages and architecture. Check it out here: https://t.me/db_sys_reading
Goal: “Become a better DB/Sys researcher”, which involves the following sub-goals: 1. Actively synchronize with the industry. 2. Be aware of other research problems. 3. Become familiar with the tools, tricks and hidden secrets (if any).
What is this? This channel is managed by an RSS bot that automatically pulls from a curated list of blog sources. Read more

July 1, 2020

Transaction Isolation levels

This post summarizes “A Critique of ANSI SQL Isolation Levels”, which I believe is one of the most important papers in database research.
ANSI SQL Isolation Level Phenomena
Dirty read. A dirty read occurs when a transaction reads data that has not yet been committed.
Non-repeatable read. A non-repeatable read occurs when a transaction reads the same row twice and gets a different value each time. Read more
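The two phenomena above can be illustrated with a toy example. This is only a minimal sketch, assuming a hypothetical in-memory `Store` with no isolation at all (every write is immediately visible to every reader); the class and function names are made up for illustration and do not come from the paper or any real database.

```python
class Store:
    """Hypothetical key-value store with no isolation between transactions."""

    def __init__(self, data):
        self.committed = dict(data)  # values as of the last commit
        self.live = dict(data)       # values every reader sees, committed or not

    def write(self, key, value):
        # The write is visible to other readers before it is committed.
        self.live[key] = value

    def commit(self):
        self.committed = dict(self.live)

    def rollback(self):
        # Discard uncommitted writes.
        self.live = dict(self.committed)

    def read(self, key):
        return self.live[key]


def dirty_read_demo():
    """T2 reads a value that T1 wrote but later rolled back."""
    store = Store({"x": 1})
    store.write("x", 2)        # T1: uncommitted write
    seen_by_t2 = store.read("x")  # T2: reads the uncommitted value
    store.rollback()           # T1: aborts, the value T2 saw never existed
    return seen_by_t2, store.read("x")


def non_repeatable_read_demo():
    """T2 reads the same row twice and gets two different committed values."""
    store = Store({"x": 1})
    first = store.read("x")    # T2: first read
    store.write("x", 2)        # T1: updates the row...
    store.commit()             # ...and commits
    second = store.read("x")   # T2: second read of the same row
    return first, second


if __name__ == "__main__":
    print(dirty_read_demo())
    print(non_repeatable_read_demo())
```

In the dirty-read case the reader observes a value that is never committed; in the non-repeatable-read case both values are committed, but the reader's two reads disagree.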

June 20, 2020

Resource Disaggregation

This post is less a paper review and more a collection of random thoughts about resource disaggregation. The two papers are “Understanding the Effect of Data Center Resource Disaggregation on Production DBMSs” and “Rethinking Data Management Systems for Disaggregated Data Centers”. The papers are easy to follow and educational; I learned a lot from them. What is resource disaggregation? Hardware resources (CPU, main memory, GPU, SSD, HDD) are split into independently managed pools that are connected by a high-performance network fabric. Read more

May 18, 2020

Paper Review: IR and Compiler

Compilers are hot in this area. In this post I’ll review three representative papers that use compilers/IR to simplify and accelerate the system. In this review, I’ll focus on the story, i.e. the problem they tried to solve, rather than the technical details they employed to tune performance. The goal of this review is to better understand the role of compilers/IR in data-intensive systems. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning TVM is a huge system with multiple essential components. Read more

April 7, 2020

A view of async memory access in Rust

Section 1: Background We have been using asyncio for years to hide I/O latency. Major high-level programming languages (except C++, which is expected to get coroutines in C++20) have proper support for both language syntax (programmability) and user-space scheduling (functionality). It’s commonly believed that coroutines have much smaller overhead than the operating system scheduler, but we don’t yet understand the potential of this “smaller overhead”. The reasons are twofold. Read more

March 30, 2020

Paper Review: tricky DB

This week we will discuss system tools for database systems. The first paper talks about the potential applications that open up when remapping virtual and physical memory is possible, and I’ll also present the topic of coroutines, which deserves a whole separate post (todo). RUMA has it: Rewired User-space Memory Access is Possible! We can divide the paper into two parts: the first part shows how to do user-space remapping, and the second part shows the potential usages. Read more

March 26, 2020

Scientific writing cheat sheet

Scientific writing can be tough, but we can improve it with some tricks and principles. Here is a list of writing tips from the course Writing in the Sciences. I’ll keep adding new tips as I go through the course. Read more

Xiangpeng Hao 2020