Phantom in the Box

— A Large Scale Flock Interacting with Human in Virtual 3D Space —

by Tatsuo Unemi¹, Philippe Kocher² and Daniel Bisig²

¹ Glycan and Life Systems Integration Center, Soka University
² Institute for Computer Music and Sound Technology, Zurich University of the Arts

April 2025

Concept

An individual living at this moment exists as an irreplaceable being, yet its body is also a collection of chemical compounds that arrived from elsewhere, serve for a while as parts of its organs, and will eventually leave as excretions. The individual exists only as a spatiotemporal functionality in relation to its environment, including other entities.

Our previous work, Identity-SA [1,2,3], offered visitors the experience of observing a fragile organization of bodies in a two-dimensional mirror. It presented emergent visuals and sounds produced by a swarm on a two-dimensional surface, which reacted to the visitor's motion as detected by image processing.

This work, Phantom in the Box, offers visitors a similar opportunity. However, the visuals appear not on a flat surface but in a three-dimensional virtual space. The swarm's reaction to a visitor's gesture in three-dimensional space gives a different impression from that of a two-dimensional mirror. Our bodies and the materials they are made of move in a shared space in real time. This representation offers a more direct insight into the relationship between the individual entity and the materials in its environment.

Technical Features

This interactive installation runs on two (or three) small computers, a display monitor (or a screen and a projector), and a depth camera. The main computer simulates a large swarm of several hundred thousand individuals to produce a 3D computer graphics animation. It also calculates a set of statistical indices describing the movements of the individuals and sends them to another computer, which generates the sound. The depth camera captures the 3D shape of the visitor's body surface as a 2D map of distances between the camera and the body. The computer receiving this distance map computes a gradient map from it and generates several thousand random points that uniformly cover the detected surface in 3D space. The positions of these points, together with the corresponding color values taken from the RGB camera attached to the depth camera, are transmitted to the main computer so that the swarm can react to the visitor's gesture.
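The text does not say exactly how the gradient map is used to spread the points uniformly. One plausible reading, which is only an assumption here, is that the gradient indicates how steeply the surface is inclined at each pixel. Steep regions cover more surface area per pixel, so they should receive proportionally more samples. The Python sketch below follows that idea: it weights each valid pixel by its approximate surface area under a pinhole camera model and then draws jittered samples with their colors. The function name `sample_surface_points`, the intrinsics `fx`, `fy`, `cx`, `cy`, the metre units, and the `max_depth` cutoff separating the visitor from the background are all hypothetical.

```python
import math
import random


def sample_surface_points(depth, rgb, n_points, fx, fy, cx, cy,
                          max_depth=3.0, rng=random):
    """Draw points spread roughly uniformly over the surface in a depth map.

    depth: 2D list of distances in metres (0 means no reading).
    rgb:   2D list of (r, g, b) tuples registered to the depth map.
    fx, fy, cx, cy: pinhole intrinsics of the depth camera (hypothetical).
    Returns a list of ((x, y, z), (r, g, b)) pairs.
    """
    h, w = len(depth), len(depth[0])

    def valid(z):
        # Keep only readings in front of the (assumed) background cutoff.
        return 0.0 < z < max_depth

    cells, weights = [], []
    for v in range(1, h - 1):
        for u in range(1, w - 1):
            z = depth[v][u]
            left, right = depth[v][u - 1], depth[v][u + 1]
            up, down = depth[v - 1][u], depth[v + 1][u]
            # Skip invalid pixels and silhouette edges, where the gradient
            # would jump to the background and inflate the weight.
            if not all(valid(d) for d in (z, left, right, up, down)):
                continue
            # Central-difference depth gradient (the "gradient map").
            gu = (right - left) / 2.0
            gv = (down - up) / 2.0
            # Metric size of one pixel at distance z.
            su, sv = z / fx, z / fy
            # Approximate surface area seen through this pixel.
            area = su * sv * math.sqrt(1.0 + (gu / su) ** 2 + (gv / sv) ** 2)
            cells.append((u, v))
            weights.append(area)
    if not cells:
        return []

    points = []
    for u, v in rng.choices(cells, weights=weights, k=n_points):
        z = depth[v][u]
        # Jitter inside the pixel so samples do not sit on a lattice.
        uu = u + rng.uniform(-0.5, 0.5)
        vv = v + rng.uniform(-0.5, 0.5)
        # Back-project through the pinhole model.
        x = (uu - cx) * z / fx
        y = (vv - cy) * z / fy
        points.append(((x, y, z), rgb[v][u]))
    return points
```

Sampling by area rather than by pixel keeps surfaces seen at a grazing angle from being under-represented relative to regions that face the camera.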

Flocks of birds, schools of fish, and herds of herbivores are well-known natural phenomena observed in large groups of active individuals interacting with each other. To understand the mechanisms underlying these phenomena, extensive research has been conducted in fields such as biology, robotics, computer science, and mathematics. Among these, mathematical models of the dynamics and efficient simulation algorithms are well researched, and they are useful for producing natural, unpredictable scenes in computer graphics for both films and games. In pioneering work, Craig Reynolds proposed the BOIDS model [4], which was applied to the 1987 short 3D computer graphics film Stanley and Stella in: Breaking the Ice.

Each individual agent has its own position and velocity. In each simulation step, the force applied to an individual is calculated as a combination of three factors, separation, cohesion, and alignment, derived from the other individuals within its sight volume, which is limited in both distance and angle. The separation force prevents collisions between individuals. The cohesion force draws individuals together into a group. The alignment force makes an individual move together with the others in its group. The resulting force vector updates the individual's velocity and position through the Euler method, a simple numerical approximation of the differential equations of Newtonian mechanics.
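The rule set above can be sketched in Python as follows. This is a brute-force O(N²) version meant only to show the structure of one simulation step, not the authors' GPU implementation. The weights, sight radius, field of view, time step, speed limit, unit mass, and inverse-distance weighting of the separation term are illustrative assumptions.

```python
import math


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def scale(a, s):
    return [x * s for x in a]


def norm(a):
    return math.sqrt(sum(x * x for x in a))


def boids_step(pos, vel, dt=0.05, radius=1.0, fov_deg=270.0,
               w_sep=1.5, w_coh=1.0, w_ali=1.0, max_speed=2.0):
    """Advance every agent by one explicit Euler step (unit mass assumed)."""
    cos_half_fov = math.cos(math.radians(fov_deg) / 2.0)
    new_pos, new_vel = [], []
    for i, (p, v) in enumerate(zip(pos, vel)):
        sep, centre, mean_vel = [0.0] * 3, [0.0] * 3, [0.0] * 3
        count = 0
        speed = norm(v)
        for j, (q, w) in enumerate(zip(pos, vel)):
            d = sub(q, p)
            dist = norm(d)
            if j == i or dist == 0.0 or dist > radius:
                continue  # self, coincident, or beyond the sight distance
            if speed > 0.0:
                cos_angle = sum(x * y for x, y in zip(d, v)) / (dist * speed)
                if cos_angle < cos_half_fov:
                    continue  # outside the sight angle (behind the agent)
            sep = sub(sep, scale(d, 1.0 / (dist * dist)))  # away, ~1/dist
            centre = add(centre, q)
            mean_vel = add(mean_vel, w)
            count += 1
        force = scale(sep, w_sep)
        if count:
            cohesion = sub(scale(centre, 1.0 / count), p)     # toward local centre
            alignment = sub(scale(mean_vel, 1.0 / count), v)  # match neighbours
            force = add(force, add(scale(cohesion, w_coh), scale(alignment, w_ali)))
        # Explicit Euler: position advances with the old velocity,
        # velocity advances with the force.
        new_pos.append(add(p, scale(v, dt)))
        nv = add(v, scale(force, dt))
        s = norm(nv)
        if s > max_speed:
            nv = scale(nv, max_speed / s)  # clamp to a maximum speed
        new_vel.append(nv)
    return new_pos, new_vel
```

Because the position update uses the velocity from the start of the step, this is the plain explicit Euler method. Updating the position with the new velocity instead would give the semi-implicit variant.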

The points transmitted from the computer connected to the depth camera act as additional entities in the 3D space that attract individuals. To summon distant individuals, a coarse force field pointing toward the nearest point is also constructed, so that an individual is affected even when it cannot find any points within its view.
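One simple way to build such a coarse field, assumed here rather than taken from the installation, is to divide the space into a small voxel grid and store in each cell a unit vector pointing from the cell centre to the nearest visitor point. Each agent then reads the vector of the cell it occupies, so it is attracted even when no point lies inside its sight volume. The grid resolution `res`, the bounding box `lo`/`hi`, the `strength` parameter, and both function names are hypothetical.

```python
import math
from itertools import product


def build_attraction_field(points, lo, hi, res):
    """Coarse voxel grid storing a unit vector toward the nearest visitor point.

    points: list of (x, y, z) visitor points; lo, hi: opposite corners of the
    simulated box; res: cells per axis. All names are hypothetical.
    """
    size = [(hi[a] - lo[a]) / res for a in range(3)]
    field = {}
    if not points:
        return field, size
    for idx in product(range(res), repeat=3):
        c = [lo[a] + (idx[a] + 0.5) * size[a] for a in range(3)]  # cell centre
        nearest = min(points,
                      key=lambda p, c=c: sum((p[a] - c[a]) ** 2 for a in range(3)))
        d = [nearest[a] - c[a] for a in range(3)]
        n = math.sqrt(sum(x * x for x in d))
        field[idx] = [x / n for x in d] if n > 0.0 else [0.0, 0.0, 0.0]
    return field, size


def field_force(field, size, lo, res, pos, strength=1.0):
    """Attraction for an agent at pos; positions outside the box use the edge cell."""
    idx = tuple(min(res - 1, max(0, int((pos[a] - lo[a]) // size[a])))
                for a in range(3))
    return [strength * x for x in field.get(idx, [0.0, 0.0, 0.0])]
```

Because the grid is coarse, the brute-force search over cells and points stays cheap. Points that fall inside an agent's sight volume can still be handled as ordinary attracting entities.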

Thanks to advances in parallel computing for graphics, simulation, numerical analysis, and large-scale artificial neural networks, driven mainly by the development of graphics processing unit (GPU) hardware, large-scale BOIDS simulation has also become feasible in real time on a personal computer [5,6]. The authors developed software that runs on a small computer, a Mac mini, and can simulate the flocking behavior of up to one million individuals in real time.
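Simulating a million agents in real time rules out comparing every pair of agents. A widely used remedy in GPU flocking is a uniform grid whose cell edge is at least the sight radius, so that each agent only needs to inspect the 27 surrounding cells. The Python sketch below builds such a grid with a counting sort (count, prefix sum, scatter), the same three passes a GPU version would run in parallel. This is a generic illustration, not the algorithm used in the authors' software. The domain starting at the origin, the cubic grid with `dims` cells per axis, and all function names are assumptions.

```python
import math
from itertools import product


def cell_coords(p, cell, dims):
    # Integer cell coordinates, clamped so agents outside the box use edge cells.
    return [min(dims - 1, max(0, int(p[a] // cell))) for a in range(3)]


def flat_index(q, dims):
    return (q[0] * dims + q[1]) * dims + q[2]


def build_cell_list(pos, cell, dims):
    """Counting sort of agents by cell: count, exclusive prefix sum, scatter.

    Agents in cell c are order[start[c]:start[c + 1]].
    """
    keys = [flat_index(cell_coords(p, cell, dims), dims) for p in pos]
    start = [0] * (dims ** 3 + 1)
    for k in keys:
        start[k + 1] += 1                 # count agents per cell
    for c in range(dims ** 3):
        start[c + 1] += start[c]          # prefix sum -> first slot of each cell
    fill = start[:-1]                     # next free slot per cell (a copy)
    order = [0] * len(pos)
    for i, k in enumerate(keys):
        order[fill[k]] = i                # scatter agent ids into their cell
        fill[k] += 1
    return start, order


def neighbours(i, pos, start, order, cell, dims, radius):
    """Agents within radius of agent i, scanning only the 27 surrounding cells.

    Correct as long as cell >= radius.
    """
    base = cell_coords(pos[i], cell, dims)
    found = []
    for off in product((-1, 0, 1), repeat=3):
        q = [base[a] + off[a] for a in range(3)]
        if any(x < 0 or x >= dims for x in q):
            continue
        c = flat_index(q, dims)
        for j in order[start[c]:start[c + 1]]:
            if j != i and math.dist(pos[i], pos[j]) <= radius:
                found.append(j)
    return found
```

Two points closer than `radius` lie in cells that differ by at most one along each axis, and clamping never widens that gap. As a result, the 27-cell scan finds every neighbour whenever `cell >= radius`.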

Each individual is rendered as a small paper plane made of two triangles, one horizontal and one vertical. Its color is determined from its velocity vector by mapping the pan angle to hue, the tilt angle to brightness, and the speed to saturation. This color mapping produces a dynamic alternation of colorful images that follows the movement of the flock.
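The color mapping can be sketched with Python's standard `colorsys` module. The text does not give the exact ranges, so this version makes several assumptions: the y axis points up, the full 360 degrees of pan span the hue circle, and tilt maps linearly from straight down (value 0) to straight up (value 1). Saturation is also assumed to reach its maximum at a hypothetical `max_speed`.

```python
import colorsys
import math


def velocity_to_rgb(v, max_speed=2.0):
    """Map a velocity (vx, vy, vz) to RGB in [0, 1], assuming y points up.

    pan  (heading in the horizontal x-z plane) -> hue
    tilt (elevation above the horizontal)      -> brightness (HSV value)
    speed                                       -> saturation
    """
    vx, vy, vz = v
    speed = math.sqrt(vx * vx + vy * vy + vz * vz)
    if speed == 0.0:
        return colorsys.hsv_to_rgb(0.0, 0.0, 0.5)  # no heading: mid grey
    pan = math.atan2(vz, vx)                           # -pi .. pi
    tilt = math.asin(max(-1.0, min(1.0, vy / speed)))  # -pi/2 .. pi/2
    hue = (pan / (2.0 * math.pi)) % 1.0
    value = 0.5 + tilt / math.pi                       # down 0, level 0.5, up 1
    saturation = min(speed / max_speed, 1.0)
    return colorsys.hsv_to_rgb(hue, saturation, value)
```

For example, an agent flying horizontally along +x at full speed gets pure red at half brightness, `(0.5, 0.0, 0.0)`, while one turning toward +z at the same speed shifts toward yellow-green, `(0.25, 0.5, 0.0)`.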

References

  1. T. Unemi and D. Bisig, Identity-SA — an interactive swarm-based animation with a deformed reflection, Proceedings of the Tenth Generative Art Conference, Milan, Italy, pp. 269–279, 2007.
  2. T. Unemi, Y. Matsui, and D. Bisig, Identity SA 1.6 — An artistic software that produces a deformed audio-visual reflection based on a visually interactive swarm, Proceedings of the ACE 2008 International Conference on Advances in Computer Entertainment Technology, Yokohama, Japan, pp. 297–300, 2008.
  3. T. Unemi and D. Bisig, Identity SA, SIGGRAPH 2009 Computer Animation Festival, Real-Time Rendering Live Demonstration, Ernest N. Morial Convention Center, New Orleans, LA, USA, Aug. 3–6, 2009.
  4. C. W. Reynolds, Flocks, herds, and schools: A distributed behavioral model, Computer Graphics (SIGGRAPH '87 Conference Proceedings), Vol. 21, No. 4, pp. 25–34, 1987.
  5. U. Erra, B. Frola, V. Scarano, and I. Couzin, An efficient GPU implementation for large scale individual-based simulation of collective behavior, 2009 International Workshop on High Performance Computational Systems Biology, pp. 51–58, 2009.
  6. P. Richmond, S. Coakley, and D. M. Romano, A high performance agent based modelling framework on graphics card hardware with CUDA, Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS '09), Vol. 2, pp. 1125–1126, Richland, SC, 2009.

Publications

  1. T. Unemi, P. Kocher and D. Bisig, Phantom in the Box — an Interactive Installation with Large-Scale Flocking Agents, 28th Generative Art Conference, 125–133, Rome, Italy, Dec. 16–18, 2025.
  2. T. Unemi, An Efficient Algorithm without Data Cache to Simulate a Large Scale Flock in 3D Space, 31st International Symposium on Artificial Life and Robotics, 983–988, Beppu, Japan, Jan. 21–23, 2026.

Exhibitions

  1. Renzan-sai, Campus Festival in Higashi-Nippon International University, Iwaki, Japan, Oct. 18–19, 2025.
  2. 28th Generative Art Conference, Biblioteca Casanatense, Rome, Italy, Dec. 18, 2025.

© T. Unemi, P. Kocher and D. Bisig, 2025. Revised on Feb. 10, 2026.