Meet J Rosser

I'm a DPhil student in Machine Learning in the FLAIR lab at the University of Oxford, supervised by Jakob Foerster and funded through the AIMS CDT by the EPSRC. My research focuses on scalable methods for reducing risks in advanced AI systems, exploring AI Safety, Interpretability, and Multi-Agent Systems, all through the unifying lens of scale.

Previously, I was a Research Scientist Intern at Spotify and worked with the UK AI Security Institute (AISI) on their Bounty Programme investigating automated design of agentic systems. I was also the founding Research Scientist at Convergence (acquired by Salesforce), contributing to Proxy, a state-of-the-art multimodal web agent with 100k+ users, and held senior engineering roles at Pynea and Artera, leading teams and shipping ML innovations.

I live in London and am a member of LISA (London Initiative for Safe AI)! In my spare time I enjoy playing the trumpet in funk bands, running bouldering socials, and skateboarding.

My creds

  • I’ve written a few cool papers

  • I’ve been in the founding team of a few cool startups

  • I’ve worked with awesome people at AISI, Spotify, UCL, and more

Why work with me?

  • I like to think I’m passionate, friendly, and approachable!

  • I am really excited about mentorship, and I love teaching!

  • We can work at your pace; I understand how busy Michaelmas can be!

What will you get out of this?

  • Each project below would make a really cool paper. If you are considering academia (e.g. a Masters/PhD), this is a great way to find out if you like this kind of work!

  • I’ve taken the time to de-risk each project brief, i.e. the project will likely go well and we’ll see something cool!

  • I’ve chosen the briefs so that they are scalable from a quick 4-page workshop paper to a 9-page main-track conference paper!

LLMs as Automated Architects for Multi-Agent Systems

[ Multi-Agent Systems ] [ Scalable Oversight ]

Estimated project cost: $200 for API tokens

Designing effective multi-agent systems is often a complex, manual process requiring significant architectural foresight. This project investigates the use of Large Language Models (LLMs) as automated architects for these systems, specifically within the demanding domain of cybersecurity. Using the CyberAgentBreeder framework as a testbed, we will prompt LLMs to design and modify multi-agent "scaffolds" (Python programs) to solve Capture-the-Flag (CTF) challenges. The primary focus is on the quality, novelty, and complexity of the agent architectures that different LLMs generate.

Our initial findings show that state-of-the-art (SOTA) models excel at creating fundamentally new agent designs, not just iterating on existing ones. This project aims to formalize that insight: How does an LLM's scale and reasoning ability influence the architectural patterns and effectiveness of the multi-agent systems it designs? Do more capable models produce more sophisticated solutions, such as hierarchical team structures, specialized agent roles, or more robust communication protocols?
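To make the core loop concrete, here is a minimal sketch of the design-and-evaluate cycle described above. The function names (`propose_scaffold`, `evaluate`) and the toy string-reversal "challenges" are illustrative stand-ins, not the real CyberAgentBreeder API: in the actual project, `propose_scaffold` would be an LLM call that writes or mutates a Python scaffold, and evaluation would run that scaffold against real CTF challenges.

```python
# Hypothetical sketch of an LLM-as-architect loop. The LLM call is
# stubbed out, and CTF challenges are replaced by toy (input, flag)
# pairs so the example is self-contained and runnable.

def propose_scaffold(feedback: str) -> str:
    """Stand-in for an LLM call that returns Python source for a
    multi-agent scaffold. A real implementation would prompt a model
    with the current best scaffold plus its evaluation feedback."""
    return (
        "def solve(challenge):\n"
        "    # trivial single-'agent' scaffold: reverse the input\n"
        "    return challenge[::-1]\n"
    )

def evaluate(scaffold_src: str, challenges: list[tuple[str, str]]) -> float:
    """Execute the generated scaffold and measure its success rate."""
    namespace: dict = {}
    exec(scaffold_src, namespace)  # load the generated solve()
    solve = namespace["solve"]
    solved = sum(solve(challenge) == flag for challenge, flag in challenges)
    return solved / len(challenges)

# Toy "CTF" set: each challenge is solved by reversing the string.
challenges = [("abc", "cba"), ("flag{x}", "}x{galf")]
score = evaluate(propose_scaffold(feedback=""), challenges)
print(f"scaffold success rate: {score:.0%}")  # → 100%
```

In the real framework this loop repeats, feeding scores and traces back into the architect model so it can iterate on (or redesign) the scaffold, which is where the questions about hierarchical structures and specialised roles come in.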

This project would be an especially great fit for:

  • Strong Python programmers

  • People excited about prompt engineering

  • [Bonus] People with experience in async programming