Resume

I’m a software engineer with 30 years of experience. I have worked across multiple industries, tech stacks, and paradigms.

I have extensive experience with distributed computing. Over the past decade, I have worked on several projects where I built SQL parsers, query planners, and optimizers, as well as distributed query execution capabilities.

I have experience with various Hadoop-related technologies, including Apache Spark, Apache Parquet, Apache Arrow, Apache Drill, HDFS, and Thrift. I’m a PMC member and committer on the Apache Arrow project, where I donated the initial Rust implementation of Arrow and later donated DataFusion, an in-memory SQL query engine optimized for analytical queries.

I have also worked at a Founder / Executive level within early-stage startups.

I am the author of the book “How Query Engines Work”.

Technical Skills

Rust
JVM (Java/Kotlin/Scala, Maven/Gradle)
Python
C++ (early in my career)
Apache Arrow (Rust implementation, DataFusion, Ballista)
Apache Spark
Kubernetes
Distributed computing
Cloud (AWS mostly)
Dependency Management

Recent Certifications

Jun 2019: Kotlin for Java Developers
Jan 2018: Functional Programming Principles in Scala
Nov 2017: Neural Networks and Deep Learning

Patents

Scalable relational database replication - U.S. Pat. No. 8,626,709

Professional Experience

Principal Distributed Database Engineer @ Apple (since Apr 2024)

Contributing to Apache DataFusion Comet

Principal Distributed Systems Engineer @ NVIDIA (Mar 2020 - Apr 2024)

Contributing to the RAPIDS Accelerator for Apache Spark, which is an open-source plugin that GPU-accelerates Spark SQL and ETL jobs.

Principal Engineer / Senior Principal Engineer @ RMS (Sep 2017 - Feb 2020)

Promoted to Senior Principal Engineer during this tenure

Here are some notable achievements from my time in this role:

I led the development of the Data Store Query Service to provide low-latency query execution against low-cardinality data stored as Parquet files in HDFS, while also routing larger or more complex queries to the Spark SQL Thrift Server. The native execution path consisted of a SQL parser, query planner, and query execution engine implemented in Scala, using Apache Arrow for the type system. This solution improved performance by two orders of magnitude for many interactive queries and reduced load on our Spark clusters, leading to increased reliability and reduced costs for the platform.
I led the development of the Analytics Gateway component, which extended the Query Service with a gateway implementing the Apache Hive protocol, allowing end users to query their data directly from BI tools such as Qlik Sense, Power BI, and Tableau through widely available Hive ODBC/JDBC drivers. The Analytics Gateway provides query parsing, translation, and routing to various backend data sources, including Snowflake. This added a major new product capability to the platform.
I helped build a new Core Services team, took ownership of a legacy Workflow Service, and led the effort to overhaul it and add first-class support for Kubernetes and Apache Spark. Other teams use this service to schedule Docker and Spark tasks as part of workflows, each of which is a DAG of tasks. The Workflow Service deploys and monitors workflow tasks using the Kubernetes API.

Co-Founder & CTO @ Raven Data Security (Jan 2017 - Sep 2017)

As technical co-founder I led the development of an MVP of a data security platform based on Apache Spark.

Chief Architect @ AgilData (Dec 2014 - Jan 2017)

AgilData’s mission is to make developers around the world happier and more productive by simplifying how they work with data. CodeFutures pivoted in Dec 2014 to become AgilData, with a new CEO and a new strategic investor.

Led the design and development of a distributed streaming SQL-based relational database to serve as the platform for our investor’s core product
Implemented parser, query planner, query optimizer, and native query execution using Kafka-like replicated logs combined with RocksDB indexes, supporting full relational SQL queries, including joins
Later, transitioned the execution engine to Apache Spark (translating AgilData query plans to Spark DataFrame operations)
Implemented IPC in Spark, allowing Spark jobs to use UDFs written in C, MATLAB, and other languages
Provided consulting services to our strategic investor
Worked on numerous internal R&D projects to validate ideas for future products, including a zero-knowledge encryption gateway for MySQL
Managed a small engineering team through multiple language transitions (Java -> Scala -> Rust)
Produced several popular blog posts for the company’s website
Technologies: Scala, Java, Rust, C++, Apache Spark, Apache Kafka, Apache Zookeeper, RocksDB, Google Protocol Buffers, MySQL

Chief Architect @ CodeFutures (Aug 2007 - Dec 2014)

CodeFutures provided hosted solutions for scaling MySQL databases based on database sharding.

Chief Architect and Lead Developer for dbShards, a leading commercial “NewSQL” relational database sharding solution which supports 5 of the top 50 Facebook applications
Developed high-performance multi-threaded Java agents to deliver reliable database replication based on a patent which I co-invented
Developed distributed query agents for performing distributed queries against shards
Developed a very high-performance SQL tokenizer and parser
Developed custom JDBC, ODBC, and native MySQL database drivers
Developed high-performance messaging libraries in Java and C
Implemented High Availability (HA) features such as failover and fail-down with Apache Zookeeper
Provided consultancy and support to help customers scale their applications
Technologies: Java, JDBC, NIO, Concurrency, JNI, C, C++, Ruby, AWS, RightScale, MySQL, ODBC, Apache Tribes, XSocket, Google Protocol Buffers, TCP/UDP Sockets, Named Pipes, Unix Sockets, PHP, Python, Apache httpd, Tomcat, Jetty, Apache Zookeeper

Software Architect/Developer @ Rogue Wave Software (Oct 2005 - Aug 2007)

Rogue Wave Software was historically a C++ tools company but was in the process of building out a new enterprise data processing product based on Java technologies.

Architect and Developer on Hydra SDO project - an implementation of the Service Data Objects (SDO) specification
Implemented an efficient XML parser in C and wrapped it in JNI to allow Java and C to share data through an in-memory DOM
Represented Rogue Wave within the Open SOA (OSOA) collaboration
Co-authored the official SDO 2.1 specifications for Java and C++
Took an active role in developing the Apache Tuscany SDO Community Test Suite (CTS)
Gained Apache committer status
Technologies: Java, C, C++, JNI, XML, XSD, SDO

Founder and CTO @ Code Futures Software (Jan 2003 - Oct 2005)

This was my consulting company, through which I also developed and marketed my own product, FireStorm/DAO, a code generator now used by more than 300 companies worldwide.

As a contractor, I worked in the following roles:

Java Development Team Leader at British Sky Broadcasting (Sky TV)
Development Manager at Diamond Trading Center (marketing arm of De Beers)
Software Developer at Volantis Systems

Engineering Manager / Product Manager @ Cape Clear Software (Nov 2000 - Dec 2002)

Cape Clear was a “web services” company, with a product that made it easy to expose existing J2EE and CORBA services via SOAP and WSDL.

Joined Cape Clear through the acquisition of Orbware
Hired and managed a small development team in London
Assisted sales team in presenting the product to potential customers
Worked with CTO to create product roadmap
Technologies: Java, J2EE, CORBA, XML, SOAP, WSDL, UDDI

Co-Founder & CTO @ Orbware Technologies (Dec 1999 - Nov 2000)

This was my first startup. We built one of the earliest commercial J2EE/EJB servers and sold the company within 12 months.

Co-founded this startup
Developed a complete Enterprise JavaBeans (EJB) application server (OrCAS) with one other developer in 9 months
First UK licensee of the J2EE specification from Sun Microsystems
Launched the product at JavaOne in June 2000
Orbware was acquired by Cape Clear Software in November 2000
Technologies: Java, EJB, RMI, TCP Sockets, Tomcat, XML

Earlier Roles

Java Architect @ British Sky Broadcasting (1997-1999)
C++ Developer @ Mitsubishi Trust & Banking Corporation (1994-1997)
C++ Developer @ NatWest Markets (1990-1994)
dBase Developer @ Burton Group Financial Services (1989-1990)

Interests

Outside of work, I enjoy working on hobby projects, often involving some combination of embedded hardware (Arduino/AVR, Raspberry Pi, etc.), digital electronics, 3D printing, woodworking, and whatever other skills I need to complete a particular project.