Recent Posts

DataFusion Donated to Apache Arrow

February 05, 2019

I’m excited to announce that DataFusion has now been donated to the Apache Software Foundation as a Rust-native in-memory query engine for the Apache Arrow project.

DataFusion 2019

November 04, 2018

Earlier this year I put a lot of time and energy into DataFusion with the goal of creating a platform somewhat like Apache Spark, but implemented in Rust, without all the inefficiencies of the JVM. This was quite a journey, and I learned a lot from the effort.

Refactoring Apache Arrow to use traits and generics

May 04, 2018

I am currently working on a refactor of the Rust implementation of Apache Arrow to change the way that arrays are represented. This is a relatively large change, even though the codebase is still tiny, so I thought it would be good to write up this blog post to explain why I think it is needed. I think this information will also be interesting for any Rust developer who is struggling to choose between (or find the right combination of) enums, structs, generics, and traits. I was inspired to write this up after reading a blog post that was posted to Reddit just a few days ago.
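To make the enum-versus-traits trade-off concrete, here is a minimal sketch of the two styles. All names (`ArrayData`, `Array`, `PrimitiveArray`, `enum_len`, `total_len`) are hypothetical and chosen for illustration; this is not the actual Arrow API, only one way the choice might look.

```rust
use std::any::Any;

// Enum style (hypothetical): one variant per element type.
// Every function that touches the data has to match on every variant.
enum ArrayData {
    Int32(Vec<i32>),
    Float64(Vec<f64>),
}

fn enum_len(a: &ArrayData) -> usize {
    match a {
        ArrayData::Int32(v) => v.len(),
        ArrayData::Float64(v) => v.len(),
    }
}

// Trait + generics style (hypothetical): one generic struct holds the
// values, and a trait exposes the type-independent operations.
trait Array {
    fn len(&self) -> usize;
    // Lets callers downcast back to the concrete array type when needed.
    fn as_any(&self) -> &dyn Any;
}

struct PrimitiveArray<T> {
    values: Vec<T>,
}

impl<T: 'static> Array for PrimitiveArray<T> {
    fn len(&self) -> usize {
        self.values.len()
    }

    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Code that only needs the shared operations works on trait objects
// and never has to enumerate the element types.
fn total_len(columns: &[Box<dyn Array>]) -> usize {
    columns.iter().map(|c| c.len()).sum()
}
```

In this sketch, supporting a new element type in the enum style means touching every `match`, while in the trait style `PrimitiveArray<T>` already covers it. The cost is a downcast via `as_any` whenever the concrete values are needed.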

DataFusion now uses Apache Arrow

April 05, 2018

I’m excited to announce that DataFusion is now using Apache Arrow for its internal memory representation of data. It was already using columnar data structures based on Vec<T>, so moving to Arrow was not a big leap.

DataFusion 0.2.1 Benchmark

March 17, 2018

Over the past week or so I have been refactoring the core of DataFusion to convert it from a row-based execution engine to a column-based one. This was a pretty large refactoring effort, but I am now back to roughly the same level of functionality as before (which is definitely still a proof of concept, but capable of running some real queries).
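As a rough illustration of the difference between the two layouts, here is a minimal sketch. The types and functions (`Row`, `ColumnBatch`, `sum_prices_rows`, `sum_prices_columns`, `to_columns`) are hypothetical and are not taken from the DataFusion code base.

```rust
// Row-based layout (hypothetical): one struct per record, so an
// aggregate over one field still walks past every other field.
struct Row {
    id: i32,
    price: f64,
}

fn sum_prices_rows(rows: &[Row]) -> f64 {
    rows.iter().map(|r| r.price).sum()
}

// Column-based layout (hypothetical): one contiguous Vec per field,
// so an aggregate only reads the column it needs.
struct ColumnBatch {
    ids: Vec<i32>,
    prices: Vec<f64>,
}

fn sum_prices_columns(batch: &ColumnBatch) -> f64 {
    batch.prices.iter().sum()
}

// Converts a slice of rows into a single column batch.
fn to_columns(rows: &[Row]) -> ColumnBatch {
    ColumnBatch {
        ids: rows.iter().map(|r| r.id).collect(),
        prices: rows.iter().map(|r| r.price).collect(),
    }
}
```

Both functions compute the same result; the column version simply scans one tightly packed `Vec<f64>`, which is the access pattern a columnar engine is built around.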

This Weekend in DataFusion (2/18/18)

February 18, 2018

I had limited time to work on DataFusion this weekend, but I have started refactoring the code base in response to feedback I received on Reddit last week, and I have also been working on some benchmarks.

DataFusion update 2/11/18

February 11, 2018

Following on from my blog post Rust is for Big Data, I announced my open source distributed data processing project, DataFusion, on Reddit last week. I was probably a bit premature in announcing the project, since it was at such an early stage, but I was excited that I had some simple queries working with quite decent performance (roughly 2x the performance of Apache Spark) and wanted to start generating some interest in the project.

Rust is for Big Data (#rust2018)

January 28, 2018

This blog post isn’t so much about what I want from the Rust language in 2018, but more about where I see an opportunity for Rust to gain more widespread use in 2018.

My Goals for 2018

January 25, 2018

Instead of making new year’s resolutions, I’ve set myself some fairly specific goals for 2018 relating to family, health, finances, hobbies, and career. Many of these goals are broken down into monthly and quarterly objectives and some even have specific objectives and measurable results in true OKR style.