Skip to main content

Local agents, remote brains: connecting OpenCode to LUMI

Artificial intelligence is rapidly changing how we write software. If you have been following the AI space, you have likely heard of coding agents. But what exactly are they, how do they differ from standard autocomplete tools, and how can you run your own customized agent using LUMI’s compute power?

In this post, we will provide an overview of setting up the OpenCode agent in a local Docker container, connecting it to a Qwen3-Coder-30B-A3B-Instruct instance running on LUMI, and discuss the responsible use of supercomputing resources for this workflow.

The coding assistant landscape

Coding agents are autonomous AI systems powered by Large Language Models (LLMs) that can plan, write, debug, and execute code to complete software development tasks with limited human oversight.

  • What they are good at: Rapidly generating boilerplate, performing sweeping codebase refactors, and potentially ease the development.
  • What to keep in mind: They are not infallible. They can hallucinate non-existent functions, misunderstand complex system architectures, and confidently introduce security vulnerabilities. You are still the lead developer; the agent should be treated as an assistive tool rather than an autonomous developer.

While all coding assistants aim to speed up your workflow, the tool you choose dictates how you interact with the AI:

  • Standard IDE integrations (e.g., GitHub Copilot): Useful for inline autocomplete and generating small snippets, but less autonomous and less customizable.
  • Agentic IDEs (e.g., Cursor): More powerful tools that can operate across multiple files, but often require subscriptions and lock you into specific workflows.
  • Custom local agents (e.g., OpenCode): These decouple the agent from the IDE, giving you full control over configuration, prompts, and execution.

In this blog post, we focus on custom local agents, specifically OpenCode. Regardless of the interface, coding agents operate in a similar way under the hood.

The anatomy of a custom setup

To understand how this works on LUMI, it helps to think of the system as having two parts: the Body (the Agent) and the Brain (the LLM).

In a standard IDE (Integrated Development Environment) plugin, such as those for VSCode or PyCharm, these two are often bundled together in a “black box.” But by using a custom agent like OpenCode, we can decouple them. We keep the “Body” local on our machine (our laptop/PC), where it can see our files and run our terminal while outsourcing the “Brain” to LUMI’s massive GPU nodes.

But how do the brain and body actually talk to each other across a continent? Let’s look under the hood.

Under the hood: how agents, LLMs, and vLLM work together

Unlike traditional coding assistants that simply autocomplete the current line, coding agents operate in an iterative loop of reasoning and execution. But how does a text-generating AI actually do things?

Here is the lifecycle of a prompt in our LUMI setup:

  1. The Request: You provide a prompt (e.g. “Refactor the authentication module to use OAuth2”). The agent (the ‘Body’) wraps your prompt—along with a hidden system prompt defining available tools—and sends it through a secure SSH tunnel to LUMI.
  2. The Engine (vLLM): On the LUMI compute node, the request is handled by vLLM, which feeds the prompt to the model such as Qwen3-Coder (the ‘Brain’).
  3. Tool Use: The model generates structured outputs representing actions, such as searching for files or executing commands. For example:

    {

    "action": "execute_bash",

    "command": "grep -r 'OAuth' ./src"
    }
  4. Execution: The local coding agent (the ‘Body’) interprets these commands and executes them in the Docker container.
  5. The Loop: Results are sent back to the model, forming an iterative loop until the task is complete.

Sandboxing the agent: why use a container?

While you can run coding agents directly on your machine, we recommend using Docker for additional security and isolation.

Coding agents execute commands to build code, navigate directories, and run tests. Running the agent directly on your host machine is risky—a misinterpreted prompt could result in unintended and destructive actions (e.g., deleting files).

Running the agent inside an isolated Docker container acts as a sandbox, protecting your host system.

Note: Running Docker typically requires elevated privileges or adding your user to the Docker group.

Responsible use of LUMI resources

Before diving into the setup, it is important to clarify appropriate usage. LUMI is a shared supercomputing resource, and its usage should reflect that.

While connecting a local agent to your own vLLM instance on LUMI is a useful proof of concept, it is not intended as a deployment setup. While this approach works technically, it highlights a mismatch between interactive AI workflows and traditional HPC usage models.

What LUMI IS meant for in this context:

  • Developing and benchmarking coding agent systems.
  • Experimenting with different models and agent configurations.
  • Validating pipelines, containerization, and SSH tunneling.

What LUMI IS NOT meant for:

  • Continuously hosting an LLM for a personal coding assistant.
  • Using resources for purposes other than for which they have been granted.

So, hosting an LLM on LUMI for a local coding agent is a valid use case of resources as long as the goal is research and development. To consult the full LUMI terms of service, follow this link.

How to set up OpenCode with vLLM on LUMI

A full implementation with detailed instructions is available in the companion repository. Below is a simplified overview:

  1. Connect to LUMI and run a batch job that books compute resources and starts a vLLM instance.
  2. Setup an SSH tunnel between our machine and LUMI’s compute node where the job is running.
  3. Build and run the container on our machine, edit OpenCode’s configuration file with our API key.
  4. Enter the container and execute OpenCode.

The elephant in the room: vLLM resource waste

While it is straightforward to run a local coding agent connected to LUMI, it is important to consider resource efficiency and billing.  Unlike commercial AI providers that bill you per token (meaning you only pay for what the AI generates), LUMI operates on an allocation-based model.

When you start a vLLM instance, you are billed for all the GPU hours while that job is running. It doesn’t matter if the GPU is crunching numbers or sitting idle waiting for you to finish a cup of coffee – the “meter” is running at full speed.

If you are the only person using the instance – occasionally asking it to write a Python script every 10 minutes – you are severely underutilizing hardware designed for high-throughput, concurrent workloads. In this scenario, model weights continiously occupy GPU memory, while requests are infrequent, resulting in low GPU utilization. However, you are “paying” for 100% of the hardware allocation. Furthermore, you are occupying a high-demand compute node that is physically unavailable to others. If every user reserved a dedicated GPU for interactive tasks, the cluster would quickly reach capacity, causing queue times to skyrocket for the entire community.

Be a good supercomputing citizen: share the instance!

If developing and testing coding agents is an approved use case for your project, we highly recommend teaming up with your compute project members.

Because vLLM can handle multiple concurrent requests, a single instance can serve multiple users. By connecting several local agents to the same endpoint, GPU utilization improves and resource waste is reduced, while queue pressure on LUMI decreases.

This aligns better with HPC usage principles and ensures more effective use of shared infrastructure.

Conclusion

Coding agents provide a powerful way to interact with software systems. By combining local agent environments with large models hosted on LUMI, it is possible to experiment with capabilities that exceed local hardware limits.

At the same time, this experiment highlights an important limitation: LUMI is  intended for development purposes, not production deployment of LLMs.

Understanding both the capabilities and constraints of this setup is essential for using LUMI effectively and responsibly.

Written by:

Artur Vojt-Antal, Junior Machine Learning Specialist at the LUMI AI Factory, CSC

Mai Nguyen, Junior Machine Learning Specialist at the LUMI AI Factory, CSC

Anis Rahman, Machine Learning Specialist at the LUMI AI Factory, CSC