January 20, 2020

Paper Review: Concurrency Control

FOEDUS: OLTP Engine for a Thousand Cores and NVRAM. This is a very high-level overview of a complex system, yet it fits into the category of building a new system from existing components. The paper tries to address two subtopics: 1) database systems that scale to 1000 cores, and 2) database systems on NVRAM. Although the paper claims to have solved these issues, I personally did not enjoy reading it because I didn’t see a clear logical flow in how it solves the problems. Read more

January 13, 2020

Paper Review: Database System Architectures

In a nutshell: data is migrating from disk to memory, so how can we design new systems to fit this trend? The End of an Architectural Era (It’s Time for a Complete Rewrite) (2007): In this paper, the authors try to convince readers that the old one-size-fits-all database architecture no longer fits emerging hardware, and that database researchers should start with a blank sheet of paper and focus on tomorrow’s requirements. Read more

December 15, 2019

My Privacy Preserved Smart Home

I’ve been looking for an ultimate smart home solution for a while, yet none of them fit my needs. In an ideal world, a smart home will have the following features: What happens in my room stays in my room. Privacy is my first concern when considering a smart home. I don’t trust big evil companies. Thus any products from Google or Amazon lose the competition. It should be highly customizable. Read more

November 4, 2019

Is CLWB actually implemented?

TL;DR: No. clwb is just an alias of clflushopt on Cascade Lake. What are clwb, clflushopt, and clflush? It sounds crazy, but before clflush there wasn’t an instruction on the Intel x86 platform that could explicitly evict a cache line. In other words, applications had no control over when their data was flushed to memory. So Intel came up with its own solution, namely clflush (cache line flush), which flushes a cache line. Read more
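To make the idea of explicitly flushing cache lines concrete, here is a minimal sketch in C. It is not taken from the post: `flush_range` is a hypothetical helper, the 64-byte line size is an assumption that holds on common x86 parts, and only the `_mm_clflush` intrinsic is used because it is available with plain SSE2. The `_mm_clflushopt` and `_mm_clwb` intrinsics take the same pointer argument but need extra compiler flags and CPU support, so they are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__SSE2__)
#include <emmintrin.h> /* _mm_clflush, _mm_mfence */
#define HAVE_CLFLUSH 1
#else
#define HAVE_CLFLUSH 0 /* non-x86 build: only count the lines */
#endif

#define CACHE_LINE 64 /* assumed cache line size in bytes */

/* Hypothetical helper: flush every cache line that overlaps
 * [addr, addr + len) and return how many lines were touched. */
static size_t flush_range(const void *addr, size_t len)
{
    assert(addr != NULL);
    if (len == 0)
        return 0;

    /* Round the start address down to a cache line boundary. */
    uintptr_t start = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t lines = 0;

    for (uintptr_t p = start; p < end; p += CACHE_LINE) {
#if HAVE_CLFLUSH
        _mm_clflush((const void *)p); /* write back and evict this line */
#endif
        lines++;
    }
#if HAVE_CLFLUSH
    _mm_mfence(); /* conservative fence after the flushes */
#endif
    return lines;
}
```

Calling `flush_range(buf, sizeof buf)` after writing to `buf` asks the CPU to push those lines out of the cache; the data itself is unchanged and can still be read afterwards.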

August 25, 2019

Photos: Sunday Afternoon @SFU

Disclaimer: I’m a novice photography hobbyist who can’t afford expensive lenses/cameras and is too lazy to practice photography skills. Bird @Bay4 Undefined Building Undefined Building Undefined Building Running A peek into Vancouver Wanna try? National Flag

August 24, 2019

[Debugger] Memory Visualizer?

For months I’ve been imagining how beautiful my life could be if lldb (or, less likely, gdb) had something called a memory visualizer. As my core research goal is to design efficient data structures that fit any workload on any device, I frequently need to check what my data structure really looks like in memory, and how it grows/shrinks under certain access patterns. What’s more, a memory visualizer can be extremely helpful when debugging concurrency bugs, because you never know where the bugs are, and a visualizer gives far more insight than breakpoints. Read more

August 14, 2019

How fast is Intel DC Persistent Memory Module?

TL;DR: Slow, SSD-level. For more details, check out this paper. I only measured write performance, using Intel’s pqos-os tool.

System Configuration

Item: Spec
CPU: Intel® Xeon® Gold 6252 CPU @ 2.10GHz * 2
DRAM: 2666 MHz - 6 * 32 GB * 2
Intel DCPMM: 2666 MHz - 4 * 128 GB * 2
Linux Distro/Kernel: Arch Linux - 5. Read more

August 9, 2019

What's it like to program on a $26k computer

Our new server (left bottom) just arrived today. Fun facts: It has ten hard drive slots, but only one is used, holding a single 240 GiB SATA Intel SSD. It has 1,408 GiB of memory (12x32 GiB DRAM + 8x128 GiB Optane), and we plan to reach 1,920 GiB (12x32 GiB DRAM + 12x128 GiB Optane) next month. Two 24-core Intel® Xeon® Gold 6252 CPUs are installed, with hyper-threading enabled. It takes 10 minutes to boot and is loud enough to wake up everyone in the lab. Read more

July 26, 2019

Modern allocator: mimalloc

Memory management, especially memory allocation, has been an important bottleneck of high-performance multi-threaded systems. The following figure shows one of my experiments on high-performance in-memory indexes. The experiment was performed on a four-socket machine with 40 physical cores in total. The yellow line shows the throughput with jemalloc and the grey line shows the throughput with glibc malloc. There are two problems with glibc malloc: It’s slower than jemalloc Read more

July 23, 2019

Efficient (correct) way to check a bit value in C/C++

It’s very common to manipulate data at bit granularity in high-performance systems, and checking whether a bit equals 1 is one of the primary operations. The way I usually do it is:

#define CHECK_BIT(var, pos) ((((var) & (1 << pos)) > 0) ? (1) : (0))

It basically creates a mask and performs an AND operation against the variable; it’s simple and intuitive enough that I never thought it could be a bottleneck. Read more
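Independent of whatever the full post concludes, the macro as quoted has a few pitfalls worth spelling out: `1 << pos` is computed as an `int`, so positions of 31 and above overflow; `pos` is not parenthesized; and the `> 0` comparison gives the wrong answer when the masked value is negative (for example, bit 31 of a signed 32-bit variable). A minimal sketch of one alternative follows. `check_bit` is a hypothetical name, and this is one possible shape rather than the version the post arrives at.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return 1 if bit `pos` of `var` is set, else 0.
 * Shifting the value right, instead of shifting a mask left, keeps the
 * whole computation in an unsigned 64-bit type, so there is no int
 * overflow and no signed comparison. */
static inline int check_bit(uint64_t var, unsigned pos)
{
    assert(pos < 64); /* shifting by the full width or more is undefined */
    return (int)((var >> pos) & 1u);
}
```

For example, `check_bit(0x5, 2)` yields 1 and `check_bit(0x5, 1)` yields 0, and bit 63 of a 64-bit value is handled the same way as bit 0.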

Xiangpeng Hao 2020