Over the past couple of days at work we started noticing Spark 2.3 and 2.4 jobs failing with a permissions error across multiple EKS clusters. Here is an example stack trace:

```
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
	at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216)
	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```
We eventually realized that this was due to Amazon rolling out security patches to their EKS clusters to address CVE-2019-9512 and CVE-2019-9514, causing a regression with the kubernetes client that Spark uses. Two issues have been filed against Spark for this: SPARK-28921 and SPARK-28925.
I have created the following trivial pull requests against Spark to upgrade the kubernetes-client dependency to the 4.4.2 release (the latest version at the time of writing):
- Patch for Spark 2.3.x (rejected because Spark 2.3 is EOL 😢)
- Patch for Spark 2.4.x
- Patch for Spark 3.0.x
If you need to fix this urgently, there are a couple of options:
Option 1: Replace the kubernetes client jars in the Spark distribution with the three 4.4.2 jars available for download from Maven Central, and hope for the best:

- kubernetes-client-4.4.2.jar
- kubernetes-model-4.4.2.jar
- kubernetes-model-common-4.4.2.jar
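The jar swap can be scripted. This is a sketch, not a verified procedure: it assumes a standard Spark layout where the client jars live under `$SPARK_HOME/jars`, and the fabric8 `io.fabric8` coordinates on Maven Central.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Version and repo layout are assumptions -- adjust for your setup.
K8S_CLIENT_VERSION=4.4.2
MAVEN_BASE=https://repo1.maven.org/maven2/io/fabric8

# Maven Central download URL for a fabric8 kubernetes artifact.
jar_url() {
  local artifact=$1
  echo "$MAVEN_BASE/$artifact/$K8S_CLIENT_VERSION/$artifact-$K8S_CLIENT_VERSION.jar"
}

# Only touch the distribution when SPARK_HOME is set.
if [ -n "${SPARK_HOME:-}" ]; then
  for artifact in kubernetes-client kubernetes-model kubernetes-model-common; do
    # Drop the old jar shipped with Spark and fetch the patched one.
    rm -f "$SPARK_HOME"/jars/"$artifact"-*.jar
    curl -fSL -o "$SPARK_HOME/jars/$artifact-$K8S_CLIENT_VERSION.jar" \
      "$(jar_url "$artifact")"
  done
fi
```

Keep a backup of the original jars so you can roll back if the newer client misbehaves with your Spark version.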
Option 2: Build a patched distribution from source:

- Check out the tag for the specific Spark release that you need
- Update the kubernetes-client version in the build
- Follow the instructions for building a Spark distribution
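The steps above roughly amount to the following. The tag, pom path, and build flags shown are examples, not verified commands for every branch -- check them against the release you actually build.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch of the build-from-source route; nothing here runs until you call it.
build_patched_spark() {
  local tag=${1:-v2.4.4}   # example release tag -- pick the one you run

  git clone https://github.com/apache/spark.git
  cd spark
  git checkout "$tag"

  # Bump the kubernetes client version in the Kubernetes module's pom
  # (the exact property name and file path vary by branch).
  "${EDITOR:-vi}" resource-managers/kubernetes/core/pom.xml

  # Build a runnable distribution tarball with Kubernetes support enabled.
  ./dev/make-distribution.sh --name patched-k8s --tgz -Pkubernetes
}
```

Expect the distribution build to take a while; the resulting tarball can then replace your existing Spark install.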
I’ll update this post after the weekend once I’ve had time to verify how well this works.