
0001: Custom proxy

Context

A common question is why we do not simply use NGINX instead of the custom proxy. The reason is the dynamic routing needed for the applications, which are served from URLs like https://jupyterlab-abcde1234.mydomain.com/some/path. Each one comes with a number of fairly complex requirements:

  • It must redirect to SSO if not authenticated, and redirect back to the URL once authenticated.
  • It must perform IP filtering that does not apply to the main application.
  • It must check that the current user is allowed to access the application, and show a forbidden page if not.
  • It must start the application if it’s not started.
  • It must show a starting page with countdown if it’s starting.
  • It must detect whether an application has started, and route requests to it if it has.
  • It must route cookies from all responses back to the user. For JupyterLab, the first response contains cookies used in XSRF protection that are never resent in later requests.
  • It must show an error page if there is an error starting or connecting to the application.
  • It must allow a refresh of the error page to attempt to start the application again.
  • It must support WebSockets, without knowing ahead of time which paths use WebSockets.
  • It must support streaming uploads and downloads.
  • Ideally, there would be no duplicated responsibilities between the proxy and other parts of the system, e.g. the Django application.
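
To illustrate why these requirements amount to custom code rather than static configuration, the per-request decision can be sketched as below. This is a minimal sketch under stated assumptions, not the actual proxy: the names `Action`, `RequestContext`, `application_id` and `decide`, the state strings, and the order of the checks are all hypothetical. It leaves out cookie handling, WebSockets, streaming, and retrying the start when the error page is refreshed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    # Hypothetical names for the outcomes listed in the requirements.
    REDIRECT_TO_SSO = "redirect_to_sso"
    FORBIDDEN = "forbidden"
    START_APPLICATION = "start_application"
    SHOW_STARTING_PAGE = "show_starting_page"
    SHOW_ERROR_PAGE = "show_error_page"
    PROXY_TO_APPLICATION = "proxy_to_application"


@dataclass
class RequestContext:
    authenticated: bool
    ip_allowed: bool
    user_may_access: bool
    # Assumed states: None (not started), "starting", "running" or "error".
    app_state: Optional[str]


def application_id(host: str, root_domain: str) -> Optional[str]:
    """Return the single subdomain label that identifies an application,
    or None if the host is not an application host."""
    suffix = "." + root_domain
    if not host.endswith(suffix):
        return None
    label = host[: -len(suffix)]
    return label if label and "." not in label else None


def decide(ctx: RequestContext) -> Action:
    # The order of these checks is an assumption made for illustration.
    if not ctx.ip_allowed:
        return Action.FORBIDDEN
    if not ctx.authenticated:
        return Action.REDIRECT_TO_SSO
    if not ctx.user_may_access:
        return Action.FORBIDDEN
    if ctx.app_state is None:
        return Action.START_APPLICATION
    if ctx.app_state == "starting":
        return Action.SHOW_STARTING_PAGE
    if ctx.app_state == "error":
        return Action.SHOW_ERROR_PAGE
    return Action.PROXY_TO_APPLICATION
```

For example, `application_id("jupyterlab-abcde1234.mydomain.com", "mydomain.com")` returns `"jupyterlab-abcde1234"`. Every branch of `decide` depends on state (the session, user permissions, application lifecycle) that lives outside a generic reverse proxy.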

While it would not be impossible to use NGINX to take over some of the proxy's code, custom code would still be needed, and NGINX would have to communicate with that code through some mechanism to achieve all of the above: extra HTTP or Redis requests, or perhaps a custom NGINX module. We suspect this would make things more complex rather than less, and increase the burden on the developer.

Decision

We will use a custom proxy for Data Workspace, rather than simply using NGINX.

Consequences

Positive

  • This avoids the extra developer burden of custom NGINX modules or extra HTTP or Redis requests, all of which would still have required custom code.

  • A custom proxy, over which we have absolute control, can satisfy all of the complex requirements and dynamic routing of our applications.

Negative

  • Onboarding new team members is initially harder, as they will need to understand these decisions and requirements.

  • There is an extra network hop compared to not having a proxy.