I’m a software engineer with 30 years of experience across multiple industries, tech stacks, and paradigms.
I have extensive experience with distributed computing, and for the past decade I have worked on several projects building SQL parsers, query planners and optimizers, and distributed query execution engines.
I have experience with a range of Hadoop-related technologies, including Apache Spark, Apache Parquet, Apache Arrow, Apache Drill, HDFS, and Thrift. I’m a PMC member and committer on the Apache Arrow project, where I donated the initial Rust implementation of Arrow and later donated DataFusion, an in-memory SQL query engine optimized for analytics queries.
I have also worked at a Founder / Executive level within early stage startups.
I am the author of the book “How Query Engines Work”.
- JVM (Java/Kotlin/Scala, Maven/Gradle)
- C++ (early in my career)
- Apache Arrow
- Apache Spark
- Distributed computing
- Cloud (AWS mostly)
- Dependency Management
- Jun 2019: Kotlin for Java Developers
- Jan 2018: Functional Programming Principles in Scala
- Nov 2017: Neural Networks and Deep Learning
Principal Distributed Systems Engineer @ NVIDIA (Since Mar 2020)
- Contributing to the RAPIDS Accelerator for Apache Spark, which is an open-source plugin that GPU-accelerates Spark ETL jobs.
Principal Engineer / Senior Principal Engineer @ RMS (Sep 2017 - Feb 2020)
Promoted to Senior Principal Engineer during my time in this role
Here are some notable achievements from my time in this role:
I led the development of the Data Store Query Service, which provided low-latency query execution against low-cardinality data stored as Parquet files in HDFS, while routing larger or more complex queries to the Spark SQL Thrift Server. The native execution path consisted of a SQL parser, query planner, and query executor implemented in Scala, using Apache Arrow for the type system. This solution improved performance by two orders of magnitude for many interactive queries and reduced load on our Spark clusters, increasing reliability and reducing costs for the platform.
I led the development of the Analytics Gateway component, which extended the Query Service with a gateway implementing the Apache Hive protocol, allowing end users to query their data directly from BI tools such as Qlik Sense, Power BI, and Tableau via widely available Hive ODBC/JDBC drivers. The Analytics Gateway provided query parsing, translation, and routing to various backend data sources, including Snowflake, and added a major new product capability to the platform.
I helped build a new Core Services team, took ownership of a legacy Workflow Service, and led the effort to overhaul it, adding first-class support for Kubernetes and Apache Spark. Other teams use this service to schedule Docker and Spark tasks within workflows composed of a DAG of tasks. The Workflow Service deploys and monitors workflow tasks via the Kubernetes API.
I played a key role in addressing many years of accumulated technical debt across a distributed monolith with poor development and testing practices. When I joined RMS, there were no versioned releases and little integration test coverage in CI, resulting in brittle deployments and frequent regressions in the shared test environment. I was a driving force in introducing versioned releases, established Helm charts as the standard deployment mechanism for all services in the RI platform, and mentored teams in building integration tests that ran in their PR branches, catching regressions before code changes were merged to master and deployed to the shared environment.
PMC @ Apache Software Foundation (since Aug 2018)
- PMC (Project Management Committee) member and committer on Apache Arrow
- Donated Rust implementation of Arrow
- Donated DataFusion as a Rust-native in-memory SQL query engine optimized for analytics queries
Co-Founder & CTO @ Raven Data Security (Jan 2017 - Sep 2017)
- As technical co-founder I led the development of an MVP of a data security platform based on Apache Spark.
Chief Architect @ AgilData (Dec 2014 - Jan 2017)
AgilData’s mission was to make developers around the world happier and more productive by simplifying how they work with data. CodeFutures pivoted in Dec 2014 to become AgilData, with a new CEO and a new strategic investor.
- Led the design and development of a distributed streaming SQL-based relational database to serve as the platform for our investor’s core product
- Implemented parser, query planner, query optimizer, and native query execution using Kafka-like replicated logs combined with RocksDB indexes, supporting full relational SQL queries, including joins
- Later, transitioned the execution engine to Apache Spark (translating AgilData query plans to Spark DataFrame operations)
- Implemented IPC in Spark, allowing Spark jobs to use UDFs written in C, MATLAB, and other languages
- Provided consulting services to our strategic investor
- Worked on numerous internal R&D projects to validate ideas for future products, including a zero-knowledge encryption gateway for MySQL
- Managed a small engineering team through multiple language transitions (Java -> Scala -> Rust)
- Produced several popular blog posts for the company’s web site (link)
- Technologies: Scala, Java, Rust, C++, Apache Spark, Apache Kafka, Apache Zookeeper, RocksDB, Google Protocol Buffers, MySQL
Chief Architect @ CodeFutures (Aug 2007 - Dec 2014)
CodeFutures provided hosted solutions for scaling MySQL databases based on database sharding.
- Chief Architect and Lead Developer for dbShards, a leading commercial “NewSQL” relational database sharding solution that supported 5 of the top 50 Facebook applications
- Developed high-performance multi-threaded Java agents to deliver reliable database replication based on a patent which I co-invented
- Developed query agents for executing distributed queries across shards
- Developed a very high-performance SQL tokenizer and parser
- Developed custom JDBC, ODBC, and native MySQL database drivers
- Developed high-performance messaging libraries in Java and C
- Implemented High Availability (HA) features such as failover and fail-down with Apache Zookeeper
- Provided consultancy and support to help customers scale their applications
- Technologies: Java, JDBC, NIO, Concurrency, JNI, C, C++, Ruby, AWS, RightScale, MySQL, ODBC, Apache Tribes, XSocket, Google Protocol Buffers, TCP/UDP Sockets, Named Pipes, Unix Sockets, PHP, Python, Apache httpd, Tomcat, Jetty, Apache Zookeeper
Software Architect/Developer @ Rogue Wave Software (Oct 2005 - Aug 2007)
Rogue Wave Software was historically a C++ tools company but was in the process of building out a new enterprise data processing product based on Java technologies.
- Architect and Developer on Hydra SDO project - an implementation of the Service Data Objects (SDO) specification
- Implemented an efficient XML parser in C and wrapped it in JNI to allow Java and C to share data through an in-memory DOM
- Represented Rogue Wave within the Open SOA (OSOA) collaboration
- Co-authored the official SDO 2.1 specifications for Java and C++
- Took active role in developing the Apache Tuscany SDO Community Test Suite (CTS)
- Gained Apache committer status
- Technologies: Java, C, C++, JNI, XML, XSD, SDO
Founder and CTO @ Code Futures Software (Jan 2003 - Oct 2005)
This was my consulting company, through which I also developed and marketed my own product, FireStorm/DAO, a code generator now used by more than 300 companies worldwide.
As a contractor, I worked in the following roles:
- Java Development Team Leader at British Sky Broadcasting (Sky TV)
- Development Manager at Diamond Trading Center (marketing arm of De Beers)
- Software Developer at Volantis Systems
Engineering Manager / Product Manager @ Cape Clear Software (Nov 2000 - Dec 2002)
Cape Clear was a “web services” company, with a product that made it easy to expose existing J2EE and CORBA services via SOAP and WSDL.
- Joined Cape Clear through the acquisition of Orbware
- Hired and managed a small development team in London
- Assisted sales team in presenting the product to potential customers
- Worked with CTO to create product roadmap
- Technologies: Java, J2EE, CORBA, XML, SOAP, WSDL, UDDI
Co-Founder & CTO @ Orbware Technologies (Dec 1999 - Nov 2000)
This was my first startup. We built one of the earliest commercial J2EE/EJB servers and sold the company within 12 months.
- Co-founded this startup
- Developed a complete Enterprise JavaBeans (EJB) application server (OrCAS) with one other developer in 9 months
- First UK licensee of the J2EE specification from Sun Microsystems
- Launched the product at JavaOne in June 2000
- Orbware was acquired by Cape Clear Software in November 2000
- Technologies: Java, EJB, RMI, TCP Sockets, Tomcat, XML
- Java Architect @ British Sky Broadcasting (1997-1999)
- C++ Developer @ Mitsubishi Trust & Banking Corporation (1994-1997)
- C++ Developer @ Natwest Markets (1990-1994)
- dBase Developer @ Burton Group Financial Services (1989-1990)
Outside of work, I enjoy working on various hobby projects often involving some combination of embedded hardware (Arduino/AVR, Raspberry Pi, etc.), digital electronics, 3D printing, woodworking, and whatever other skills I need to complete a particular project.