How Wide-Area Motion Imagery Actually Works: Persistent Surveillance at City Scale
A conventional surveillance camera sees a narrow slice of the world. A high-end PTZ (pan-tilt-zoom) camera mounted on a police helicopter might cover a few city blocks at useful resolution, and only if the operator is pointing it in the right direction at the right moment. If an explosion occurs on one street and the camera is trained on another, you have nothing. The operator can slew the camera to the blast site within seconds, but the critical moments before the event, the vehicle that delivered the device, the route it took to get there, the location it departed from, all of that is gone.
Wide-Area Motion Imagery (WAMI) eliminates this problem by reimagining what a camera should be. Instead of a single sensor pointed at a narrow field of view, a WAMI system uses an array of dozens or hundreds of sensors, all mounted behind a common optical system on an aircraft, collectively imaging an entire city at once. The coverage area is typically 40 to 100 square kilometres, captured continuously at frame rates of 1 to 2 Hz, with enough resolution to detect and track individual vehicles and, in some systems, individual pedestrians. Every moving object within that area is recorded, all of them, all the time.
The result is something no prior surveillance technology could deliver: the ability to rewind the tape. After an incident at a known location and time, an analyst can identify the relevant vehicle or person, then trace that entity backward in time, frame by frame, through the entire city, all the way back to its origin. This retroactive analysis capability transforms intelligence gathering from a reactive, luck-dependent discipline into something closer to deterministic forensics.
This article covers the engineering behind WAMI, from the optics and sensor arrays through the image processing pipeline, moving target detection, multi-object tracking, data management, and the operational systems that have been built, tested, and deployed.
1. The Fundamental Design Problem
The challenge that WAMI addresses is rooted in a basic trade-off in imaging system design: field of view versus resolution.
The instantaneous field of view (IFOV) of a single pixel on a detector is determined by the pixel pitch divided by the focal length:
IFOV = p / f
where p is the detector pixel pitch (typically 1.5 to 6 micrometres for modern CMOS sensors) and f is the focal length. The ground sample distance (GSD), the size of the ground area represented by one pixel, is then:
GSD = IFOV × H = (p × H) / f
where H is the altitude of the aircraft above ground. To image a vehicle (roughly 4.5 metres long) with enough pixels for reliable detection, you need a GSD of about 0.5 metres or smaller, which means at least 9 pixels along the vehicle's length. For more reliable classification (distinguishing a saloon car from a lorry, for example), a GSD of 0.15 to 0.25 metres is preferred.
Now consider the total field of view. A single detector with N pixels across covers a ground swath of:
swath = N × GSD
A 4096-pixel-wide detector at 0.25 metre GSD covers a swath of 1,024 metres, or roughly one square kilometre for a square detector. To cover 100 square kilometres (a circle roughly 5.6 kilometres in radius), you would need a detector with approximately 40,000 pixels across, or about 40,000 by 40,000 pixels: 1.6 gigapixels. No single focal plane array that large exists (the largest monolithic scientific CCDs are in the range of 100 to 200 megapixels). Even if one could be fabricated, the optical system to project a sharp image onto it at the required GSD from operational altitude would be enormous.
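The three relationships above are easy to check numerically. The following is a minimal sketch only: the function names and the example pixel pitch and focal length are illustrative assumptions, not the parameters of any fielded system.

```python
import math

def gsd_m(pixel_pitch_m, altitude_m, focal_length_m):
    """Ground sample distance in metres: GSD = (p * H) / f."""
    return pixel_pitch_m * altitude_m / focal_length_m

def swath_m(pixels_across, gsd):
    """Ground swath in metres for a detector N pixels wide: swath = N * GSD."""
    return pixels_across * gsd

def pixels_per_side(area_km2, gsd):
    """Pixels per side needed to cover a square footprint of the given area."""
    side_m = math.sqrt(area_km2) * 1000.0
    return math.ceil(side_m / gsd)

# Hypothetical optics: 3 micrometre pitch, 73.2 mm focal length, 6,100 m altitude.
example_gsd = gsd_m(3e-6, 6100.0, 0.0732)      # about 0.25 m
single_swath = swath_m(4096, 0.25)             # 1,024 m
side = pixels_per_side(100.0, 0.25)            # 40,000 pixels
total_gigapixels = side * side / 1e9           # 1.6
```

Running the numbers this way makes the trade-off concrete: halving the GSD at a fixed coverage area quadruples the pixel count.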
The solution adopted by every operational WAMI system is to tile many smaller sensors behind a common optical system or a cluster of smaller optical systems. Each sensor images a sub-region of the total field of view. The sub-images are then computationally stitched together into a single seamless mosaic. This is the architectural principle behind ARGUS-IS, Gorgon Stare, and every other WAMI system.
2. ARGUS-IS: The Gigapixel Demonstrator
ARGUS-IS (Autonomous Real-time Ground Ubiquitous Surveillance Imaging System) was developed by BAE Systems under a DARPA programme that began in 2007. It remains the highest-resolution WAMI system publicly documented and represents the upper bound of what current technology can achieve.
Sensor Architecture
The ARGUS-IS camera array contains 368 individual imaging chips, each a 5-megapixel CMOS sensor of the type found in smartphone cameras circa 2010. These 368 sensors are arranged in a tiled mosaic behind a set of four large optical apertures. Each aperture feeds a cluster of 92 sensors. The four aperture clusters are arranged to provide overlapping coverage of the ground area, so that the combined mosaic covers a contiguous region with no gaps.
The total pixel count across all 368 sensors is approximately 1.8 gigapixels. At an operating altitude of 6,100 metres (20,000 feet), this provides a ground sample distance of roughly 15 centimetres across a coverage area of about 40 square kilometres. At that GSD, a standard European saloon car (4.5 metres long, 1.8 metres wide) is represented by approximately 30 by 12 pixels, more than enough for detection, tracking, and basic classification.
Optical Design
The optical challenge in ARGUS-IS is not building a single perfect lens but rather precisely calibrating the geometric relationship between 368 individual sensors and the optical elements they sit behind. Each sensor has a slightly different position and orientation relative to the aperture, which means each one images a slightly different ground patch at a slightly different angle. The calibration process involves determining, for each pixel on each sensor, the exact ground coordinates it maps to at a given aircraft position, altitude, and attitude (roll, pitch, yaw).
This calibration is done by imaging a scene with known ground control points and solving for the interior orientation parameters of each sensor and the exterior orientation of the entire payload. The process is analogous to the bundle adjustment used in photogrammetry, except that it involves 368 cameras simultaneously. Once calibrated, the geometric model allows real-time projection of each sensor's pixels onto a common ground-referenced coordinate system.
Frame Rate and Synchronisation
ARGUS-IS captures imagery at approximately 1.8 frames per second across the full 1.8 gigapixel mosaic. All 368 sensors are synchronised to trigger within a fraction of a millisecond of each other, ensuring temporal consistency across the mosaic. Without tight synchronisation, a fast-moving vehicle could appear in different positions in adjacent sensor tiles, creating artefacts in the stitched image.
The synchronisation is achieved through a common clock signal distributed to all sensor modules over a dedicated timing bus. Each sensor captures a raw frame on the rising edge of the sync pulse, stores it in a local frame buffer, and then transfers it to the central processing unit over a high-speed data bus. The architecture uses multiple parallel data buses to achieve the aggregate bandwidth required: 368 sensors at 5 megapixels each, at 12 bits per pixel and 1.8 fps, produce roughly 5 gigabytes per second of raw data.
Processing Architecture
The raw data from 368 sensors cannot be stitched and analysed by a conventional CPU in real time. ARGUS-IS uses a combination of FPGAs and GPUs, physically located on the aircraft, to perform the first stages of image processing. The FPGAs handle fixed-function tasks like demosaicing (converting the Bayer-pattern raw data from each sensor into colour pixels), geometric warping (projecting each sensor's image onto the common ground grid), and radiometric correction (normalising brightness and contrast across sensors). The GPUs handle more flexible tasks like seam blending and moving target detection.
The stitched mosaic is too large to transmit to the ground in real time over any available datalink. Instead, the full mosaic is stored on board, and the aircraft transmits only selected "chip-outs," small rectangular sub-regions (typically 640 by 480 or similar) that an analyst on the ground has requested. Multiple chip-outs can be transmitted simultaneously, each one following a different area of interest. This architecture allows a single WAMI aircraft to provide video feeds to many analysts at once, each viewing a different part of the city.
DARPA demonstrated ARGUS-IS in flight testing over Quantico, Virginia, between 2009 and 2013. The system was mounted on a Black Hawk helicopter for testing, though it was designed for integration onto high-altitude, long-endurance unmanned aircraft. The programme established the feasibility of gigapixel-scale persistent surveillance, but ARGUS-IS itself was not transitioned to a programme of record for operational deployment. Its technology, however, influenced every subsequent WAMI system.
3. Gorgon Stare: The Operational System
While ARGUS-IS pushed resolution limits, Gorgon Stare took a more pragmatic approach: get a working system into the field with available technology, even if it meant lower resolution.
System Design
Gorgon Stare is produced by Sierra Nevada Corporation (with significant contributions from L3Harris Technologies for the sensor payload) and mounted on the MQ-9 Reaper unmanned aircraft. The system has gone through multiple increments, each improving capability.
The initial Increment 1 system, fielded in 2011, used nine cameras: five electro-optical (EO, visible spectrum) and four infrared (IR, thermal). These cameras were arranged in a downward-looking cluster, with overlapping fields of view that together covered a ground area approximately 4 kilometres in radius from the aircraft's ground track. The resolution was significantly lower than ARGUS-IS, with a GSD of roughly 0.5 to 1.0 metres, depending on altitude and atmospheric conditions.
Increment 2, fielded from 2014, expanded to 12 cameras (five EO and seven IR) and improved the optics and processing. The IR capability is significant: it allows operations at night and through light obscuration (dust, thin cloud, smoke), extending the sensor's utility beyond daylight hours. The IR sensors operate in the mid-wave infrared (MWIR, 3 to 5 micrometre) and long-wave infrared (LWIR, 8 to 12 micrometre) bands, detecting thermal emissions from vehicles, personnel, and structures.
Architectural Differences from ARGUS-IS
Gorgon Stare and ARGUS-IS differ in several important ways beyond raw resolution.
Sensor count and type. ARGUS-IS uses 368 small CMOS sensors; Gorgon Stare uses 12 larger, higher-quality sensors with cooled IR focal plane arrays. Fewer sensors means simpler stitching and calibration but lower total pixel count.
Coverage versus resolution trade-off. Gorgon Stare accepts a coarser GSD (0.5 to 1.0 metres versus 0.15 metres) in exchange for operational robustness and lower system complexity. At 0.5 metre GSD, vehicles are detectable (roughly 9 by 4 pixels for a car) but individual pedestrians are marginal, appearing as only 1 to 3 pixels.
Dual-band capability. ARGUS-IS is visible-spectrum only. Gorgon Stare's IR channels allow 24-hour operation.
Platform integration. Gorgon Stare is integrated onto the MQ-9 Reaper, which has an endurance of over 27 hours. This means a single aircraft can provide persistent surveillance of a city-scale area for an entire day without relief. ARGUS-IS was demonstrated on a helicopter with far shorter endurance.
Datalink. The MQ-9 uses a Ku-band satellite datalink (the AN/APY-8 Lynx radar occupies the nose, and the Gorgon Stare sensor pod is carried under the fuselage or wing). The satellite link has limited bandwidth, typically 50 Mbit/s or less, which is not sufficient for the full-resolution video stream. Like ARGUS-IS, Gorgon Stare stores full-resolution data on board and transmits chip-outs on demand.
Gorgon Stare has been deployed operationally in Afghanistan and, reportedly, in other theatres. It represents the state of the art in fielded WAMI systems as of 2026.
4. The Imaging Pipeline: From Raw Frames to Coherent Mosaic
The raw output of a WAMI sensor array is not a usable image. It is hundreds of individual sensor frames, each with its own geometric distortion, radiometric characteristics, and slight timing offsets. Converting this into a single, geo-referenced, seamless mosaic requires a multi-stage image processing pipeline that must run in real time, on the aircraft, with limited computational resources.
Geometric Correction
Each sensor in the array captures its image from a slightly different viewpoint and through a slightly different portion of the optical system. The first processing step is to project each sensor's pixels onto a common ground-referenced coordinate system, typically a Universal Transverse Mercator (UTM) grid.
This requires knowing, with high precision, the position and attitude of the aircraft at the moment of capture. The aircraft carries an inertial navigation system (INS) coupled with GPS, providing position to within a few centimetres and attitude (roll, pitch, yaw) to within a few hundredths of a degree. These measurements, combined with the pre-calibrated geometric model of each sensor's position and orientation within the payload, allow each pixel to be projected to a ground coordinate.
For flat terrain, this projection is straightforward: each pixel's ground coordinates are computed by ray-tracing from the sensor through the optics to the ground plane at a known elevation. For terrain with significant relief, a digital terrain model (DTM) is required. Without terrain correction, a hilltop might be displaced by tens of metres in the mosaic, depending on the camera viewing angle. The orthorectification process re-projects each pixel onto the DTM surface, removing relief displacement and producing a geometrically correct orthomosaic.
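The flat-terrain case reduces to intersecting a viewing ray with a horizontal plane. The sketch below assumes the ray direction has already been rotated into the ground frame using the INS attitude; `project_to_ground` and the example geometry (a 100 metre hill viewed 20 degrees off nadir) are hypothetical. It also shows how large the relief displacement becomes when terrain height is ignored.

```python
import math

def project_to_ground(cam_pos, ray_dir, ground_elev_m=0.0):
    """Intersect a pixel's line of sight with the plane z = ground_elev_m.

    cam_pos: (x, y, z) of the sensor in a local metric frame
             (for example UTM easting, northing, altitude).
    ray_dir: (dx, dy, dz) line-of-sight direction in the same frame;
             dz must be negative, i.e. pointing downward.
    Returns the (x, y) ground coordinates of the intersection.
    """
    x0, y0, z0 = cam_pos
    dx, dy, dz = ray_dir
    if dz >= 0:
        raise ValueError("ray does not point toward the ground")
    t = (ground_elev_m - z0) / dz      # scale factor along the ray, t > 0
    return (x0 + t * dx, y0 + t * dy)

# A ray 20 degrees off nadir from 6,100 m that actually strikes a 100 m hilltop.
theta = math.radians(20.0)
ray = (math.tan(theta), 0.0, -1.0)
true_xy = project_to_ground((0.0, 0.0, 6100.0), ray, ground_elev_m=100.0)
flat_xy = project_to_ground((0.0, 0.0, 6100.0), ray, ground_elev_m=0.0)
relief_displacement_m = flat_xy[0] - true_xy[0]   # 100 * tan(20 deg), about 36 m
```

A full orthorectifier would replace the single plane with a ray-terrain intersection against the DTM, typically by stepping along the ray until it falls below the terrain surface.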
The computational cost is significant. For a 1.8 gigapixel frame, every pixel requires a coordinate transformation involving at least a matrix multiplication and a ray-terrain intersection. At 1.8 fps, that is over 3 billion coordinate computations per second. FPGAs are well-suited to this task because the operation is the same for every pixel (embarrassingly parallel) and the arithmetic is simple (fixed-point multiplication and addition).
Radiometric Normalisation
Adjacent sensors in the array will not produce identical brightness values for the same ground feature. Differences arise from manufacturing variation in sensor responsivity, non-uniform illumination across the optical aperture (vignetting), and different exposure settings. If these differences are not corrected, the stitched mosaic will show visible seams at the boundaries between sensor tiles.
Radiometric normalisation involves computing a per-sensor gain and offset correction that brings all sensors into a common radiometric scale. The corrections are calibrated by imaging a uniform target (a large, flat, evenly lit surface) and computing the gain and offset required to make all sensors report the same value. In flight, these corrections are applied to every frame. Additional adjustments may be needed for atmospheric scattering variations across the wide field of view: light travelling at a steep angle through the atmosphere traverses more air mass than light travelling straight down, which can cause brightness gradients from the centre to the edges of the mosaic.
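A per-sensor gain and offset can be sketched as a two-point linear fit. This is an illustration under an added assumption: two reference levels are available (a dark frame and a uniform bright target), because a single level cannot separate gain from offset. The function names and digital-number values are made up for the example.

```python
def fit_gain_offset(raw_dark, raw_bright, ref_dark, ref_bright):
    """Solve corrected = gain * raw + offset from two reference levels.

    raw_*: mean value this sensor reported for the dark and bright targets.
    ref_*: value every sensor should report for those targets.
    """
    gain = (ref_bright - ref_dark) / (raw_bright - raw_dark)
    offset = ref_dark - gain * raw_dark
    return gain, offset

def correct(raw, gain, offset):
    """Apply the per-sensor correction to one raw pixel value."""
    return gain * raw + offset

# Two hypothetical sensors that disagree about the same targets.
gain_a, offset_a = fit_gain_offset(100.0, 2100.0, 100.0, 2000.0)
gain_b, offset_b = fit_gain_offset(120.0, 1920.0, 100.0, 2000.0)

# After correction both report the reference value for the bright target.
bright_a = correct(2100.0, gain_a, offset_a)
bright_b = correct(1920.0, gain_b, offset_b)
```

In practice the fit would be done per pixel or per region rather than per sensor if vignetting varies noticeably across a single tile.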
Seam Blending
Even after geometric and radiometric correction, small residual errors at sensor boundaries can produce visible seams. These are mitigated by blending: in the overlap regions between adjacent sensor tiles, pixel values are blended using a weighted average, with the weights decreasing linearly from each sensor's centre toward its edge. This produces a smooth transition. More sophisticated approaches use multi-band blending (separate blending at different spatial frequency scales, similar to the Burt-Adelson pyramid blending technique) to avoid both visible seams and the ghosting artefacts that simple averaging can produce when objects near the sensor boundary move between sensors.
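The simple linear case can be sketched for one row of an overlap strip. This is a minimal illustration, assuming the two sensors' pixels are already co-registered in the overlap. `feather_blend` is a hypothetical name, and multi-band blending is not attempted here.

```python
def feather_blend(row_a, row_b):
    """Blend two co-registered pixel rows covering the same overlap strip.

    Sensor A's centre lies to the left of the strip, so its weight falls
    linearly from 1 at the left edge to 0 at the right edge; sensor B's
    weight is the complement.
    """
    if len(row_a) != len(row_b):
        raise ValueError("rows must cover the same overlap pixels")
    n = len(row_a)
    if n == 1:
        return [(row_a[0] + row_b[0]) / 2.0]
    blended = []
    for i, (a, b) in enumerate(zip(row_a, row_b)):
        w_a = 1.0 - i / (n - 1)
        blended.append(w_a * a + (1.0 - w_a) * b)
    return blended

# A residual 20-count brightness step becomes a smooth ramp across the overlap.
ramp = feather_blend([100.0] * 5, [120.0] * 5)
```

The ramp hides a static brightness step, but a vehicle that appears at slightly different positions in the two tiles will still ghost, which is the case the multi-band methods address.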
Temporal Registration
At a frame rate of 1.8 Hz, the time interval between consecutive mosaic frames is approximately 556 milliseconds. A vehicle moving at 50 km/h (a typical urban speed) travels about 7.7 metres between frames, which is roughly 50 pixels at 0.15 metre GSD. This inter-frame displacement is the basis for moving target detection. But for the stitching pipeline, temporal consistency is what matters: all sensors must capture their frames close enough in time that objects near sensor boundaries do not appear duplicated or split. With sub-millisecond synchronisation (as in ARGUS-IS), a vehicle moving at 50 km/h moves only about 0.01 metres during the sensor-to-sensor timing skew, which is negligible.
5. Moving Target Detection at City Scale
Once a coherent mosaic is produced, the next task is detecting every moving object within it. In a 100 square kilometre urban area, there may be tens of thousands of vehicles in motion simultaneously, plus pedestrians, cyclists, and other movers. Detecting all of them, in every frame, in real time, is a computational challenge of considerable scale.
Background Subtraction
The standard approach to detecting moving objects in video is background subtraction: maintain a model of the static background (buildings, roads, parked cars, vegetation), then flag any pixel that deviates significantly from the model as belonging to a moving object.
In conventional CCTV, background subtraction is straightforward because the camera is stationary. In WAMI, the camera is moving. The aircraft is flying, the payload is subject to vibration and attitude changes, and the entire scene shifts from frame to frame. Before background subtraction can be applied, each frame must be registered (aligned) to a common ground reference. The geometric correction described in the previous section accomplishes this, projecting each frame onto the UTM grid. After projection, a given ground location occupies the same pixel coordinates in every frame, and background subtraction can proceed as if the camera were stationary.
Gaussian Mixture Models
A simple frame-differencing approach (subtract the current frame from the previous frame, threshold the result) fails in practice because of noise, illumination changes, and "slow movers" (vehicles stopped at a traffic light that then begin moving). More robust methods model each background pixel as a mixture of Gaussians, where each Gaussian component represents a frequently observed intensity value. This is the standard Gaussian Mixture Model (GMM) approach, first described by Stauffer and Grimson in 1999.
For each pixel in the ground-referenced mosaic, the GMM maintains K Gaussian distributions (typically K = 3 to 5), each described by a mean, variance, and weight. When a new frame arrives, each pixel's value is compared against its K Gaussians. If the value matches one of the existing distributions (within 2.5 standard deviations, typically), that distribution's parameters are updated, and the pixel is classified as background. If the value does not match any distribution, the least probable distribution is replaced with a new one centred on the observed value, and the pixel is classified as foreground (a potential mover).
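The per-pixel update can be sketched for a single greyscale pixel. This follows the simplified rule in the paragraph above (a match means background, no match means foreground). The full Stauffer and Grimson method additionally ranks components by weight over standard deviation to decide which ones count as background. The learning rate, initial variance and initial weight are illustrative assumptions, and one learning rate is used for weight, mean and variance for brevity.

```python
from dataclasses import dataclass

@dataclass
class Gaussian:
    mean: float
    var: float
    weight: float

def update_pixel(model, value, alpha=0.05, match_sigmas=2.5,
                 init_var=225.0, init_weight=0.05):
    """Update one pixel's mixture in place; return True if value is foreground."""
    matched = None
    # Test the most heavily weighted components first.
    for g in sorted(model, key=lambda g: g.weight, reverse=True):
        if abs(value - g.mean) <= match_sigmas * g.var ** 0.5:
            matched = g
            break

    if matched is None:
        # No component explains the value: replace the least probable one.
        weakest = min(model, key=lambda g: g.weight)
        weakest.mean, weakest.var, weakest.weight = float(value), init_var, init_weight
    else:
        # Reinforce the matched component and decay the others.
        for g in model:
            g.weight = (1.0 - alpha) * g.weight + (alpha if g is matched else 0.0)
        diff = value - matched.mean
        matched.mean += alpha * diff
        matched.var = (1.0 - alpha) * matched.var + alpha * diff * diff

    # Keep the weights a valid probability distribution.
    total = sum(g.weight for g in model)
    for g in model:
        g.weight /= total
    return matched is None

# One pixel with K = 3: mostly road surface, sometimes shadow, rarely glare.
pixel_model = [Gaussian(100.0, 25.0, 0.90),
               Gaussian(50.0, 25.0, 0.08),
               Gaussian(200.0, 25.0, 0.02)]
is_fg_road = update_pixel(pixel_model, 102.0)   # close to the road component
is_fg_car = update_pixel(pixel_model, 160.0)    # matches nothing
```

On a real mosaic this loop runs once per pixel per frame, which is why the operation is normally expressed as a GPU kernel rather than in interpreted code.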
In WAMI, this per-pixel GMM must be maintained for every pixel in the mosaic. For a 1.8 gigapixel mosaic with K = 3 Gaussians per pixel, each requiring a mean (2 bytes), variance (2 bytes), and weight (2 bytes), the background model alone occupies approximately 1.8 billion pixels times 3 components times 6 bytes, which is about 32 gigabytes of memory. This is feasible with modern GPU memory (multiple GPUs with 24 to 80 GB each), but it was a serious constraint in the ARGUS-IS era.
Compensating for Registration Errors
No geometric correction is perfect. Residual registration errors of even 1 to 2 pixels (0.15 to 0.30 metres at ARGUS-IS resolution) can cause the background subtraction to produce false positives along high-contrast edges (building rooflines, road markings, shadows). These false detections can outnumber genuine movers by orders of magnitude.
Mitigation techniques include morphological filtering (eroding the foreground mask to remove single-pixel false positives, then dilating to recover the shapes of genuine movers), spatial coherence checks (requiring that a detected foreground region has a minimum area consistent with a vehicle, typically at least 10 to 20 pixels), and temporal persistence checks (requiring that a detection appear in multiple consecutive frames before it is accepted as a genuine mover).
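Of the three mitigations, the spatial coherence check is the simplest to sketch. The example below uses a connected-component area filter as a stand-in: it keeps only foreground regions of at least `min_area` pixels. The 4-connectivity choice and the function name are assumptions, and morphological erosion and dilation are not implemented here.

```python
from collections import deque

def filter_small_regions(mask, min_area):
    """Return a copy of a binary mask without regions smaller than min_area.

    mask is a list of rows of 0/1 values; regions are 4-connected.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill to collect one connected region.
            region = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) >= min_area:
                for y, x in region:
                    out[y][x] = 1
    return out

# Two isolated false positives and one 2x2 blob; only the blob survives.
noisy = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
]
cleaned = filter_small_regions(noisy, min_area=3)
```

With the 10 to 20 pixel threshold mentioned above, thin edge artefacts along rooflines are mostly rejected because they are long but only a pixel or two wide and tend to break into small fragments.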
Detection Rate and Computational Cost
A well-tuned WAMI moving target detection pipeline can detect vehicles with a probability exceeding 95% and a false positive rate below 1 per 10,000 background pixels per frame. For a 1.8 gigapixel mosaic, 1 per 10,000 still means 180,000 false pixel detections per frame, but after spatial and temporal filtering, this is reduced to a manageable number of false tracks (typically fewer than 100 per frame in practice).
The computational cost for background subtraction across 1.8 gigapixels at 1.8 fps is roughly 3 to 5 TFLOPS, depending on the GMM complexity and filtering pipeline. This is within reach of a pair of high-end GPUs (an NVIDIA A100 at 19.5 TFLOPS FP32, for instance), though the ARGUS-IS era hardware used less capable devices and relied more heavily on FPGAs for fixed-function acceleration.
6. Multi-Object Tracking at Scale
Detecting movers is only the first step. The intelligence value of WAMI comes from tracking: assigning a persistent identity to each detected object and maintaining that identity across frames as the object moves through the city. In a typical urban scene, there may be 5,000 to 15,000 vehicles in motion simultaneously. Tracking all of them, continuously, for hours, is a multi-object tracking (MOT) problem at a scale that few other applications demand.
The Assignment Problem
In each frame, the detection pipeline produces a set of detections (position, size, and possibly colour or thermal signature of each detected mover). The tracking system maintains a set of active tracks, each with a predicted position for the current frame. The assignment problem is: which detection corresponds to which track?
This is a bipartite matching problem. There are M active tracks and N detections. Each track-detection pair has an associated cost, typically the Mahalanobis distance between the track's predicted position and the detection's observed position (the Mahalanobis distance accounts for the uncertainty in both the prediction and the measurement). The optimal assignment, the one that minimises total cost, is found using the Hungarian algorithm (also known as the Kuhn-Munkres algorithm).
The Hungarian algorithm has a time complexity of O(n^3), where n is the larger of M and N. For 10,000 tracks and 10,000 detections, that is 10^12 operations, which is too slow for a 1.8 Hz frame rate. In practice, WAMI tracking systems use gating to reduce the assignment problem: each track only considers detections within a gate (a spatial region around the predicted position, typically sized to account for the maximum plausible velocity). With gating, most tracks have only 1 to 5 candidate detections, and the assignment can be solved much more efficiently using sparse variants of the Hungarian algorithm or greedy nearest-neighbour assignment.
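Gating followed by greedy nearest-neighbour assignment can be sketched as below. Several simplifications are assumed: plain Euclidean distance stands in for the Mahalanobis distance, the gate is a fixed radius, and candidates are found by an all-pairs loop, where a real system would use a spatial grid or tree. The function and parameter names are hypothetical.

```python
import math

def gated_greedy_assign(predicted, detections, gate_radius_m):
    """Assign detections to tracks, closest pairs first.

    predicted: dict mapping track id -> predicted (x, y) in metres.
    detections: list of observed (x, y) positions in metres.
    Returns a dict mapping track id -> index into detections.
    """
    candidates = []
    for track_id, (px, py) in predicted.items():
        for j, (dx, dy) in enumerate(detections):
            dist = math.hypot(dx - px, dy - py)
            if dist <= gate_radius_m:          # outside the gate: never considered
                candidates.append((dist, track_id, j))

    candidates.sort(key=lambda c: c[0])        # cheapest pairings first
    assigned, used = {}, set()
    for dist, track_id, j in candidates:
        if track_id in assigned or j in used:
            continue
        assigned[track_id] = j
        used.add(j)
    return assigned

tracks = {1: (0.0, 0.0), 2: (10.0, 0.0), 3: (100.0, 100.0)}
dets = [(9.0, 0.0), (1.0, 0.0), (500.0, 500.0)]
matches = gated_greedy_assign(tracks, dets, gate_radius_m=15.0)
```

Track 3 has no detection inside its gate and stays unassigned, and detection 2 matches no track, so it becomes a candidate for track initiation. Greedy assignment is not guaranteed to minimise total cost, which is the price paid for avoiding the cubic-time optimal solver.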
State Estimation: Kalman Filters
Each active track maintains a state estimate, typically a position (x, y), velocity (vx, vy), and sometimes acceleration. The Kalman filter is the standard tool for updating this state estimate with each new detection.
The prediction step propagates the state forward in time using a constant-velocity (or constant-acceleration) motion model:
x(k+1|k) = F × x(k|k)
P(k+1|k) = F × P(k|k) × F^T + Q
where F is the state transition matrix (encoding the constant-velocity assumption), P is the state covariance matrix, and Q is the process noise covariance (modelling the uncertainty in the motion model, accounting for acceleration, turning, stopping, and other manoeuvres).
The update step incorporates the new detection:
y = z(k+1) - H × x(k+1|k) [innovation]
S = H × P(k+1|k) × H^T + R [innovation covariance]
K = P(k+1|k) × H^T × S^{-1} [Kalman gain]
x(k+1|k+1) = x(k+1|k) + K × y
P(k+1|k+1) = (I - K × H) × P(k+1|k)
where z is the measurement, H is the observation matrix, and R is the measurement noise covariance. For WAMI tracking, the measurement is simply the (x, y) position of the detection centroid, so H extracts the position components from the state vector.
The Kalman filter provides optimal state estimation for linear systems with Gaussian noise. Vehicle motion in an urban environment is neither linear nor Gaussian (vehicles turn corners, stop at lights, accelerate and brake), but the constant-velocity Kalman filter works well enough for the short inter-frame intervals in WAMI (0.5 to 1 second). More sophisticated variants, such as the Interacting Multiple Model (IMM) filter, which maintains several parallel motion models (constant velocity, coordinated turn, stationary) and blends their outputs based on likelihood, improve tracking performance during manoeuvres.
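The predict and update equations can be written out by hand for a constant-velocity model. To stay dependency-free, this sketch tracks one axis at a time with state [position, velocity], so the innovation covariance S is a scalar and no matrix inverse is needed. Treating x and y independently, the white-acceleration form of Q, and the noise values are all assumptions made for illustration.

```python
def cv_predict(x, P, dt, q):
    """Prediction step with F = [[1, dt], [0, 1]] and white-acceleration Q."""
    pos, vel = x
    x_pred = [pos + dt * vel, vel]
    p00, p01, p10, p11 = P[0][0], P[0][1], P[1][0], P[1][1]
    # F P F^T expanded term by term, plus Q.
    P_pred = [
        [p00 + dt * (p01 + p10) + dt * dt * p11 + q * dt ** 4 / 4.0,
         p01 + dt * p11 + q * dt ** 3 / 2.0],
        [p10 + dt * p11 + q * dt ** 3 / 2.0,
         p11 + q * dt ** 2],
    ]
    return x_pred, P_pred

def cv_update(x_pred, P_pred, z, r):
    """Update step with H = [1, 0]: only position is measured."""
    y = z - x_pred[0]                       # innovation
    S = P_pred[0][0] + r                    # innovation covariance (scalar)
    k0 = P_pred[0][0] / S                   # Kalman gain, position row
    k1 = P_pred[1][0] / S                   # Kalman gain, velocity row
    x = [x_pred[0] + k0 * y, x_pred[1] + k1 * y]
    # (I - K H) P expanded term by term.
    P = [
        [(1.0 - k0) * P_pred[0][0], (1.0 - k0) * P_pred[0][1]],
        [P_pred[1][0] - k1 * P_pred[0][0], P_pred[1][1] - k1 * P_pred[0][1]],
    ]
    return x, P

# A vehicle at 13.9 m/s (50 km/h) observed at 1.8 Hz, starting from a state
# that knows nothing about its velocity.
dt, true_speed = 1.0 / 1.8, 13.9
state, cov = [0.0, 0.0], [[100.0, 0.0], [0.0, 100.0]]
for k in range(1, 21):
    state, cov = cv_predict(state, cov, dt, q=1.0)
    state, cov = cv_update(state, cov, z=true_speed * k * dt, r=0.25)
```

After a handful of frames the velocity estimate settles close to the true speed. Calling `cv_predict` without a following `cv_update` is what coasting through a missed detection amounts to: the state moves forward and the covariance grows.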
Track Initiation and Termination
A new detection that does not match any existing track may be a new mover or a false positive. Track initiation logic requires that a detection appear in a consistent location with consistent motion over several consecutive frames (typically 3 to 5) before a new track is created. This "M-of-N" initiation logic (M detections out of N consecutive frames) filters out spurious detections while still initiating tracks on genuine movers within a few seconds.
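The M-of-N rule itself is a sliding-window count. The sketch below shows only that counting logic: the check that successive detections are consistent in position and motion is assumed to have happened upstream, and the class name and default thresholds are illustrative.

```python
from collections import deque

class TentativeTrack:
    """Confirms a track once M of the last N frames contained a detection."""

    def __init__(self, m=3, n=5):
        self.m = m
        self.history = deque(maxlen=n)     # oldest frames drop out automatically

    def observe(self, detected):
        """Record one frame's outcome; return True once the track is confirmed."""
        self.history.append(bool(detected))
        return sum(self.history) >= self.m

candidate = TentativeTrack(m=3, n=5)
outcomes = [candidate.observe(hit) for hit in (True, False, True, False, True)]
# The third hit arrives on the fifth frame, so only the last outcome is True.
```

At roughly 2 frames per second, a 3-of-5 rule confirms a genuine mover in under three seconds, while an isolated false detection never accumulates enough hits.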
Track termination occurs when a track fails to receive an associated detection for several consecutive frames. The threshold depends on the operational context. In open terrain, a vehicle that disappears for more than 3 to 5 frames has likely stopped (and become part of the background) or left the coverage area. In an urban environment, a vehicle may be occluded by a building, a bridge overpass, or foliage for many frames. Aggressive termination causes loss of track identity; conservative termination risks maintaining stale or incorrect tracks.
Handling Occlusion
Occlusion is the primary challenge in WAMI tracking. A vehicle that drives under a bridge overpass is invisible for however long the bridge is (perhaps 20 to 50 metres, corresponding to 1 to 4 seconds at urban speeds, or 2 to 7 frames). A vehicle that enters a parking garage may be invisible for minutes or hours. A vehicle that enters a dense tree canopy in summer may be intermittently visible.
For short occlusions (a few frames), the Kalman filter's prediction step carries the track forward, and the track is re-associated when the vehicle emerges on the other side. The system maintains a "coasting" track with increasing uncertainty until either a detection matches or the track is terminated.
For longer occlusions, more sophisticated techniques are used. If a vehicle enters a parking garage (detected by the track approaching the garage entrance and then disappearing), the system can flag the track as "occluded at known location" and watch the garage exit for a vehicle with matching visual characteristics (colour, size) emerging at a later time. This is where appearance models (colour histograms, shape features) become important, allowing re-identification after extended occlusion.
Research systems have demonstrated re-identification success rates of 60% to 80% for vehicles after extended occlusions, using deep-learning-based appearance models. Operational systems typically rely on simpler cues (colour, approximate size) and accept that some fraction of tracks will be lost to long occlusions.
7. Data Rates, Compression, and Storage
A WAMI system generates data at rates that would have been considered exotic even for scientific computing a decade ago. Understanding the data management challenge is essential to understanding why WAMI systems are architected the way they are.
Raw Data Rates
The ARGUS-IS system, with 1.8 gigapixels at 12 bits per pixel at 1.8 fps, generates:
1.8 × 10^9 pixels × 12 bits × 1.8 fps = 38.9 Gbit/s ≈ 4.9 GB/s
Over a 12-hour mission, that is approximately 211 terabytes of raw data. No airborne storage system can hold that, and no datalink can transmit it. Compression is mandatory.
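The arithmetic can be reproduced in a few lines. Unrounded, the rate is about 4.86 GB/s and the 12-hour volume about 210 TB; the 211 TB figure above comes from multiplying the rounded 4.9 GB/s. The 20 TB storage budget in the example is a hypothetical value chosen to show how a required compression ratio falls out.

```python
def raw_rate_bytes_per_s(pixels, bits_per_pixel, fps):
    """Uncompressed sensor output in bytes per second."""
    return pixels * bits_per_pixel * fps / 8.0

def mission_volume_tb(rate_bytes_per_s, hours):
    """Total data volume in terabytes (10^12 bytes) over a mission."""
    return rate_bytes_per_s * hours * 3600.0 / 1e12

def required_compression_ratio(raw_tb, storage_budget_tb):
    """Smallest compression ratio that fits the raw volume into the budget."""
    return raw_tb / storage_budget_tb

rate = raw_rate_bytes_per_s(1.8e9, 12, 1.8)          # about 4.86e9 bytes/s
raw_tb = mission_volume_tb(rate, hours=12)           # about 210 TB
ratio = required_compression_ratio(raw_tb, 20.0)     # about 10.5 : 1
```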
Compression Approaches
WAMI imagery has characteristics that differ from conventional video and require adapted compression strategies.
Intra-frame compression (JPEG2000). Each mosaic frame is compressed independently, without reference to previous or future frames. JPEG2000 is preferred over JPEG for WAMI because it supports 12-bit and 16-bit pixel depths, provides better quality at high compression ratios, and supports region-of-interest coding (allowing selected areas to be compressed at higher quality). A typical compression ratio for WAMI using JPEG2000 is 10:1 to 20:1, reducing the data rate from 4.9 GB/s to approximately 250 to 500 MB/s. At this rate, a 12-hour mission produces 10 to 20 terabytes.
Inter-frame compression (H.264/H.265). Because consecutive WAMI frames are highly similar (most of the scene is static background), inter-frame prediction can achieve much higher compression ratios. H.265/HEVC, adapted for the large frame size and low frame rate of WAMI, can achieve compression ratios of 50:1 to 100:1 for the static background, with lower ratios in regions containing movers. This reduces the data rate to roughly 50 to 100 MB/s, and a 12-hour mission to 2 to 4 terabytes.
The challenge with inter-frame compression for WAMI is that analysts need random access to any point in the recording. With inter-frame compression, decoding a single frame may require decoding all frames back to the nearest keyframe (I-frame). WAMI systems address this by inserting keyframes every 10 to 30 frames (every 5 to 15 seconds at 2 fps) and building an index that maps timestamps and ground coordinates to the relevant keyframe positions in the compressed stream.
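The keyframe index can be sketched as a sorted list searched with `bisect`. The assumption here is a simple stream in which each predicted frame depends only on the frame before it, so decoding frame n means decoding everything from the preceding keyframe up to n. The index layout and byte offsets are made up for the example.

```python
import bisect

def locate_keyframe(index, target_frame):
    """Find where decoding must start to display target_frame.

    index: list of (keyframe_number, byte_offset) sorted by keyframe_number.
    Returns (keyframe_number, byte_offset, frames_to_decode).
    """
    keyframes = [frame for frame, _ in index]
    i = bisect.bisect_right(keyframes, target_frame) - 1
    if i < 0:
        raise ValueError("target precedes the first keyframe")
    keyframe, offset = index[i]
    return keyframe, offset, target_frame - keyframe + 1

# Keyframes every 20 frames (every 10 seconds at 2 fps); offsets are invented.
kf_index = [(0, 0), (20, 1_000_000), (40, 2_100_000)]
start = locate_keyframe(kf_index, 47)      # (40, 2100000, 8)
```

The keyframe interval sets the worst case directly: with an interval of 20, no request ever costs more than 20 decoded frames, at the price of a larger stream than one with sparser keyframes.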
Onboard Storage
Modern WAMI systems use arrays of solid-state drives (SSDs) for onboard storage. A typical mission storage unit holds 20 to 40 terabytes, using ruggedised SSDs rated for the vibration, temperature, and altitude environment of a military aircraft. After the aircraft lands, the storage units are physically removed and transported to a ground exploitation facility. Real-time transmission of the full mosaic is not attempted; only the chip-outs are transmitted via datalink during the mission.
Ground Exploitation Infrastructure
The ground facility that ingests, stores, indexes, and serves WAMI data must handle petabyte-scale storage. A single squadron conducting daily WAMI missions generates tens of terabytes per day. Over months of operations, the archive grows to hundreds of terabytes or petabytes.
The data is stored on distributed storage systems (often HDFS, the Hadoop Distributed File System, or similar architectures) that allow parallel access by multiple analysts. Indexing is critical: an analyst needs to query "show me the 500 by 500 metre area centred on these coordinates from 14:23 to 14:47 on this date" and receive the relevant video stream within seconds. This requires a spatio-temporal index that maps (x, y, t) coordinates to the corresponding compressed data segments.
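One simple way to picture a spatio-temporal index is a dictionary keyed by ground tile and time bucket. This is a toy in-memory sketch under stated assumptions (fixed 250 metre tiles, 10 second buckets, string segment identifiers), not a description of how any exploitation system stores its index.

```python
from collections import defaultdict

class SpatioTemporalIndex:
    """Maps (x, y, t) cells to identifiers of compressed data segments."""

    def __init__(self, tile_m=250.0, bucket_s=10.0):
        self.tile_m = tile_m
        self.bucket_s = bucket_s
        self._cells = defaultdict(list)

    def _cell(self, x, y, t):
        return (int(x // self.tile_m), int(y // self.tile_m), int(t // self.bucket_s))

    def add(self, x, y, t, segment_id):
        """Register a segment covering the cell that contains (x, y, t)."""
        self._cells[self._cell(x, y, t)].append(segment_id)

    def query(self, x_min, y_min, x_max, y_max, t_start, t_end):
        """Return segments in every cell touched by the box, in time order."""
        cx0, cy0, ct0 = self._cell(x_min, y_min, t_start)
        cx1, cy1, ct1 = self._cell(x_max, y_max, t_end)
        hits = []
        for ct in range(ct0, ct1 + 1):
            for cy in range(cy0, cy1 + 1):
                for cx in range(cx0, cx1 + 1):
                    hits.extend(self._cells.get((cx, cy, ct), []))
        return hits

idx = SpatioTemporalIndex()
idx.add(100.0, 100.0, 5.0, "seg-a")
idx.add(600.0, 100.0, 5.0, "seg-b")     # different ground tile
idx.add(100.0, 100.0, 25.0, "seg-c")    # same tile, later time bucket
window = idx.query(0.0, 0.0, 499.0, 499.0, 0.0, 29.0)   # 500 m box, 30 s
```

The query returns whole cells, so the caller still has to crop the decoded frames to the exact box and time span. Coordinates are local metres and seconds since mission start in this example.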
Some organisations have built WAMI exploitation tools on top of GIS (Geographic Information System) frameworks, using standard geospatial databases (PostGIS, GeoServer) to manage the metadata and coordinate-based queries. The imagery itself is stored as tiled, pyramidal datasets (similar to how Google Maps stores its satellite imagery at multiple zoom levels), allowing analysts to zoom from city-wide overview to street-level detail without reprocessing.
8. Retroactive Analysis and Pattern of Life
The retroactive analysis capability is what makes WAMI qualitatively different from any other surveillance system. It transforms every mission's recording into a complete, searchable history of all movement within the coverage area.
The IED Scenario
Consider the scenario that drove much of the WAMI development during the wars in Afghanistan and Iraq. An improvised explosive device (IED) detonates at a specific intersection at 11:42 local time, destroying a patrol vehicle. With conventional surveillance, investigators have very little to work with. Witnesses may be unreliable or uncooperative. CCTV cameras, if they exist in the area, cover only their immediate field of view. The device was likely emplaced hours or days before detonation.
With WAMI, an analyst loads the mission recording for that area and that time. The analyst identifies the blast location in the mosaic, then steps backward in time, frame by frame. At some point, typically minutes to hours before the blast, a vehicle stops at the location for a brief period and then drives away. The analyst marks that vehicle and traces it backward: where did it come from? It drove from a compound 4 kilometres away. Who else has visited that compound? The analyst can review the entire recording to build a list of all vehicles that visited the compound in the past 12 hours.
This backward-tracing workflow, which the US military calls "trackback," has been operationally significant. Analysts have used WAMI trackback to identify IED emplacement teams, locate bomb-making facilities, and map the logistics networks of insurgent cells. The process is labour-intensive (manually tracking a vehicle through hours of 2 fps footage is tedious), which has driven significant investment in automated tracking tools.
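Once tracks exist, the core of the trackback query reduces to two steps: find tracks that dwelt at the site before the event, then read off where each one started. The sketch below assumes tracks are already available as lists of (time, x, y) observations in a local metric grid; the function names, the 15 metre radius, and the 30 second dwell threshold are hypothetical. It also simplifies by measuring dwell as the span between the first and last nearby observation, so a vehicle that passed the site twice would be counted as dwelling.

```python
from math import hypot


def stopped_near(track, site, radius_m, before_t, min_dwell_s):
    """True if the track has observations near the site spanning min_dwell_s before before_t."""
    times = [t for t, x, y in track
             if t < before_t and hypot(x - site[0], y - site[1]) <= radius_m]
    return bool(times) and (max(times) - min(times)) >= min_dwell_s


def trackback(tracks, site, event_t, radius_m=15.0, min_dwell_s=30.0):
    """Return {track_id: earliest (t, x, y) observation} for tracks that dwelt at the site."""
    return {track_id: min(points)          # tuples sort by time first
            for track_id, points in tracks.items()
            if stopped_near(points, site, radius_m, event_t, min_dwell_s)}


tracks = {
    "veh-17": [(0, 4000, 0), (600, 2000, 0), (1200, 5, 0), (1260, 5, 0), (1320, 900, 0)],
    "veh-22": [(0, -3000, 40), (1250, 0, 40), (1300, 3000, 40)],   # drives past, never stops
}
print(trackback(tracks, site=(0, 0), event_t=4000))
```

In this toy data only `veh-17` qualifies, and its earliest observation, 4 kilometres from the site, is where the analyst would look next.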
Pattern of Life Analysis
Beyond single-incident trackback, WAMI enables pattern-of-life (PoL) analysis: the systematic observation of activity at a location over extended periods to establish what is normal and identify deviations.
For a specific compound suspected of being an insurgent meeting point, an analyst can catalogue every vehicle that arrives and departs over days or weeks of WAMI coverage. The catalogue includes arrival times, departure times, the routes each vehicle took, and (if the vehicle visited other areas of interest) its connections to other locations. Over time, this builds a network graph: nodes are locations, edges are vehicle movements between them, and the edge weights are the frequency and timing of those movements.
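The graph described above can be sketched with nothing more than a counter over observed trips. The trip records and location labels below are invented for illustration, and this version keeps only frequency as the edge weight; timing would need to be stored alongside it.

```python
from collections import Counter


def build_movement_graph(trips):
    """Count trips per directed (origin, destination) pair; counts act as edge weights."""
    return Counter((origin, dest) for _vehicle, origin, dest, _depart_t in trips)


# (vehicle id, origin, destination, departure time in seconds): hypothetical records.
trips = [
    ("veh-17", "compound", "market", 100),
    ("veh-17", "market", "compound", 900),
    ("veh-22", "compound", "market", 400),
]
graph = build_movement_graph(trips)
for (origin, dest), weight in graph.most_common():
    print(f"{origin} -> {dest}: {weight}")
```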
This is structurally similar to the metadata analysis performed in signals intelligence, where the content of communications may not be available but the pattern of who communicates with whom, when, and for how long reveals the structure of an organisation. WAMI provides the same kind of structural analysis for physical movement.
Pattern-of-life analysis can also detect changes. If a compound that normally receives 2 to 3 vehicles per day suddenly receives 15 vehicles in a single morning, that deviation from the established pattern may indicate a meeting, a preparation for an operation, or a response to some event. Automated anomaly detection algorithms can flag such deviations for analyst review.
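One simple way to flag such a deviation, offered here as an assumption and not as a description of any deployed algorithm, is to model daily arrivals as a Poisson process and ask how improbable today's count is under the historical mean. The function names and the alert threshold are hypothetical.

```python
from math import exp


def poisson_tail(k, mean):
    """P(X >= k) for a Poisson-distributed count with the given mean."""
    term = exp(-mean)          # P(X = 0)
    cdf_below_k = 0.0
    for i in range(k):
        cdf_below_k += term    # add P(X = i)
        term *= mean / (i + 1)
    return max(0.0, 1.0 - cdf_below_k)


def is_anomalous(count, baseline_counts, alpha=1e-3):
    """Flag a daily count whose tail probability under the baseline mean is below alpha."""
    mean = sum(baseline_counts) / len(baseline_counts)
    return poisson_tail(count, mean) < alpha


baseline = [2, 3, 2, 3, 3, 2, 2]       # a week of typical daily arrivals
print(is_anomalous(15, baseline))      # the 15-vehicle morning from the text
print(is_anomalous(3, baseline))       # an ordinary day
```

A real system would also need to account for weekly cycles and for gaps in coverage, both of which distort a naive baseline.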
Multi-Day Correlation
A single WAMI mission provides 12 to 27 hours of coverage (depending on the platform's endurance). For longer-term pattern-of-life analysis, data from multiple missions must be correlated. This is challenging because the aircraft may not cover exactly the same area on each mission, atmospheric conditions differ, and vehicles change (they get washed, parked in different orientations, or replaced entirely).
Vehicle re-identification across missions relies on appearance features (colour, size, distinctive markings) and behavioural features (parking location, route patterns). Automated systems have shown promising results in research, with re-identification accuracy above 70% for distinctive vehicles, but the problem remains difficult for common vehicles (the thousands of silver or white saloon cars in any European city, for instance).
9. Ground-Based and Commercial WAMI
WAMI is not exclusively a military technology. Several companies have developed or proposed WAMI systems for law enforcement and civilian applications.
Persistent Surveillance Systems (PSS)
Persistent Surveillance Systems, founded by Ross McNutt (a former US Air Force engineer who worked on early WAMI programmes), has offered a system called the Hawkeye series. Mounted on a light aircraft (typically a Cessna 207), the system uses an array of high-resolution cameras to image a coverage area of approximately 60 to 80 square kilometres at a GSD of about 0.3 metres. The company has conducted operational deployments with police departments in several US cities, including Baltimore and Dayton, and at least one trial deployment controversially operated without public knowledge.
The business model is similar to the military use case: continuous recording of a city area, with trackback analysis conducted after reported crimes. The Baltimore deployment, which ran intermittently between 2016 and 2020, generated significant public debate about the trade-off between crime reduction and mass surveillance.
European Regulatory Context
In Europe, deploying WAMI over civilian areas faces substantial regulatory obstacles. The General Data Protection Regulation (GDPR) treats location data and movement patterns that can be linked to an individual as personal data, and the systematic, large-scale monitoring of publicly accessible areas triggers a requirement for a Data Protection Impact Assessment (DPIA) under Article 35. The European Court of Human Rights held in Uzun v. Germany (2010) that systematic GPS tracking of an individual constitutes an interference with the right to respect for private life under Article 8 of the European Convention on Human Rights, and in S. and Marper v. United Kingdom (2008) that the blanket, indiscriminate retention of personal data (in that case DNA profiles and fingerprints) violates the same article. WAMI, which provides equivalent or superior tracking capability without requiring a physical device on the target, would almost certainly face the same analysis.
The proportionality principle under European law requires that surveillance measures be proportionate to the legitimate aim pursued. Blanket, persistent surveillance of an entire city, recording the movements of every person, would be extremely difficult to justify as proportionate under current European legal frameworks. Even where national security is invoked, the Court of Justice of the European Union has set strict limits on general and indiscriminate data collection by state authorities, notably in its Schrems II and La Quadrature du Net rulings (both 2020).
This does not mean WAMI could never be deployed in Europe. Targeted, time-limited deployments for specific serious crimes, with appropriate judicial authorisation and data minimisation measures, might survive legal challenge. But the always-on, city-scale model used by PSS in Baltimore would be extremely unlikely to be permitted under current European law.
London's CCTV Network as a Comparison
London has approximately 942,000 CCTV cameras (a 2022 estimate by Clarion Security Systems), making it one of the most surveilled cities in the world by camera density. But London's cameras are individually pointed and do not provide unified tracking. Following a vehicle across the city requires manually collecting footage from dozens or hundreds of separate camera systems, each with different operators, different formats, different time synchronisation, and different retention policies. The Metropolitan Police can and does perform this kind of multi-camera tracking for serious crimes, but it takes days or weeks of manual work.
WAMI would replace this fragmented coverage with a single, unified, time-synchronised record. The technical capability gap between London's current CCTV network and a WAMI system is enormous. The former provides fragmentary, keyhole views; the latter provides complete, continuous, city-wide coverage. The regulatory and public-acceptance gap is equally enormous.
10. Limitations and Countermeasures
WAMI systems have significant operational limitations that constrain their effectiveness.
Weather
Optical WAMI (visible-spectrum sensors) is defeated by cloud cover. Overcast skies block the view completely. Even partial cloud cover creates gaps in the mosaic where tracking is lost. This is a severe constraint in northern Europe, where persistent cloud cover is common: London averages only about 1,480 hours of sunshine per year (out of roughly 4,383 daylight hours), meaning direct sun is obscured for roughly two-thirds of the daylight hours.
Infrared sensors (as on Gorgon Stare Increment 2) penetrate thin cloud, mist, and light fog, but thick cloud blocks them as well. Only synthetic aperture radar, which operates at microwave frequencies, can image through all weather conditions, but radar does not provide the video-like imagery that WAMI tracking algorithms depend on.
Night Operations
Visible-spectrum WAMI is useless at night. Thermal IR sensors can operate in darkness, detecting vehicles by their engine heat, exhaust plumes, and tyre friction warmth. Pedestrians are also detectable in thermal imagery, as the human body (core temperature about 37 degrees Celsius, with exposed skin a few degrees cooler) stands out against an ambient background that is typically 10 to 25 degrees Celsius in most operational environments.
The resolution of thermal IR sensors is lower than that of visible-spectrum sensors because the diffraction limit scales with wavelength, and LWIR wavelengths (8 to 12 micrometres) are roughly 20 times longer than visible light wavelengths (0.4 to 0.7 micrometres). For a given aperture diameter, the diffraction-limited angular resolution is therefore roughly 20 times coarser. This means that at the same altitude, a thermal WAMI sensor provides substantially coarser GSD than its visible-spectrum counterpart. The practical gap is smaller than the diffraction ratio alone suggests, most likely because visible-band GSD is usually set by pixel sampling and not by diffraction: Gorgon Stare's IR channels have a GSD roughly 3 to 5 times coarser than the EO channels.
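The wavelength scaling follows directly from the Rayleigh criterion, in which the smallest resolvable angle is about 1.22 times the wavelength divided by the aperture diameter. The sketch below applies it at nadir; the 0.2 metre aperture is an arbitrary assumption chosen for illustration, not a published figure for any WAMI sensor.

```python
def diffraction_ground_resolution(wavelength_m, aperture_m, altitude_m):
    """Rayleigh-criterion ground resolution at nadir: 1.22 * wavelength / aperture * altitude."""
    return 1.22 * wavelength_m / aperture_m * altitude_m


APERTURE_M = 0.2      # assumed aperture diameter
ALTITUDE_M = 6_100    # altitude used elsewhere in this section

visible = diffraction_ground_resolution(0.55e-6, APERTURE_M, ALTITUDE_M)   # mid-visible
lwir = diffraction_ground_resolution(10e-6, APERTURE_M, ALTITUDE_M)        # mid-LWIR
print(f"visible: {visible:.3f} m, LWIR: {lwir:.3f} m, ratio: {lwir / visible:.1f}")
```

The ratio depends only on the two wavelengths, so it holds for any aperture and altitude; the absolute values do not.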
Altitude and Resolution Trade-off
Higher altitude means wider coverage but coarser GSD. At 6,100 metres, ARGUS-IS achieves 0.15 metre GSD. At 12,200 metres (a more typical operational altitude for a Reaper in a contested environment), the GSD doubles to 0.30 metres, and the coverage area quadruples. But at 0.30 metre GSD, small vehicles become harder to detect and pedestrians become undetectable. The system designer must choose an altitude that balances coverage area against the minimum resolution required for the detection and tracking tasks.
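The scaling in this trade-off is simple geometry: with fixed optics, GSD grows linearly with altitude and footprint area grows with its square. The sketch below reproduces the figures above; the 40 square kilometre reference footprint is an assumed value used only to show the area scaling.

```python
def scale_with_altitude(gsd_ref_m, area_ref_km2, alt_ref_m, alt_new_m):
    """Fixed optics: GSD scales linearly with altitude, footprint area with its square."""
    k = alt_new_m / alt_ref_m
    return gsd_ref_m * k, area_ref_km2 * k ** 2


gsd, area = scale_with_altitude(gsd_ref_m=0.15, area_ref_km2=40.0,
                                alt_ref_m=6_100, alt_new_m=12_200)
print(f"GSD {gsd:.2f} m, footprint {area:.0f} km^2")
```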
Urban Canyons
In cities with tall buildings, the oblique viewing angle of a WAMI sensor means that ground areas near tall structures are occluded by the buildings themselves. A nadir-looking sensor (pointing straight down) minimises this effect, but even a nadir view has obliquity at the edges of the coverage area. In a city like Frankfurt or Milan, with clusters of high-rise buildings, significant ground areas may be invisible from the sensor's perspective.
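The size of the hidden strip can be estimated with similar triangles: a building of height h viewed at off-nadir angle theta hides a strip of ground about h times tan(theta) long, and tan(theta) is the ground offset from the nadir point divided by the aircraft altitude. The sketch assumes flat terrain and a point sensor; the 150 metre tower and 4 kilometre offset are illustrative values.

```python
def hidden_strip_m(building_height_m, ground_offset_m, altitude_m):
    """Length of ground hidden behind a building: h * tan(theta), tan(theta) = offset / altitude."""
    return building_height_m * ground_offset_m / altitude_m


# A 150 m tower, 4 km from the point directly beneath an aircraft at 6,100 m.
print(f"{hidden_strip_m(150, 4_000, 6_100):.1f} m of ground hidden")
# Directly beneath the aircraft the strip vanishes.
print(hidden_strip_m(150, 0, 6_100))
```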
Simple Countermeasures
WAMI tracks objects on the surface. Any countermeasure that breaks the visual continuity of a track defeats the system.
Staying indoors. WAMI cannot see through roofs. A person who enters a building and exits through a different door, possibly in different clothing, breaks the visual track completely. The analyst knows someone entered the building and someone exited it, but without additional intelligence, they cannot confirm it is the same person.
Vehicle switching. A target drives a vehicle into a large parking garage, leaves it on foot inside the structure (out of the sensor's view), and departs in a different vehicle. The original vehicle's track ends at the garage; the new vehicle's track begins at the garage. Without facial recognition or another identifying feature (which WAMI's resolution generally cannot support), the connection is lost.
Covered areas. Markets with overhead awnings, covered walkways, dense tree canopy in summer: all create areas where tracking is interrupted. In Mediterranean cities, narrow streets with overhanging buildings create natural concealment corridors.
Timing. Operating during known gaps in WAMI coverage (weather, aircraft transit, maintenance windows) avoids observation entirely. Adversaries who understand the operational pattern of WAMI flights can time their activities accordingly.
These countermeasures are simple but effective. They highlight that WAMI is a powerful but not omniscient system, most effective against targets who are unaware of the surveillance or unable to alter their behaviour patterns.
11. The AI Frontier: Deep Learning Applied to WAMI
The bottleneck in WAMI exploitation has always been the analyst. Automated detection and tracking can identify every mover and maintain track identities, but interpreting what those tracks mean, which movements are significant, which patterns indicate a threat, has traditionally required human judgement. Deep learning is changing this.
Automated Activity Recognition
Early WAMI analysis tools provided basic capabilities: vehicle detection, track generation, and visualisation. The analyst did all the interpretation. Current research and emerging operational systems apply convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to classify not just what an object is (vehicle, pedestrian, motorcycle) but what it is doing (stopping, accelerating, making a U-turn, stopping at the same location as another vehicle).
More advanced systems attempt to recognise higher-level activities. A vehicle that stops at a known cache site, where another vehicle has previously dropped something, then drives to a known meeting location, exhibits a behavioural pattern associated with a specific threat activity. Activity recognition models trained on labelled examples of such patterns can flag them for analyst review.
The architecture typically uses a two-stage approach. First, a detection/tracking stage produces trajectories (sequences of positions and timestamps). Second, a trajectory classification stage processes each trajectory (or group of interacting trajectories) through a recurrent or transformer-based model that classifies the activity. The trajectory representation abstracts away the raw imagery, reducing the input to a time series of positions, velocities, and interaction features (proximity to other tracks, proximity to points of interest).
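The hand-off between the two stages is a feature time series. A minimal sketch of that conversion is shown below, assuming a track is a list of (time, x, y) observations in metres with strictly increasing timestamps; the function name and the choice of features (speed and distance to the nearest point of interest) are illustrative, not taken from any specific system.

```python
from math import hypot


def trajectory_features(track, points_of_interest):
    """Turn [(t, x, y), ...] into per-step rows of (t, speed_m_s, nearest_poi_distance_m)."""
    rows = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        speed = hypot(x1 - x0, y1 - y0) / (t1 - t0)
        nearest = min(hypot(x1 - px, y1 - py) for px, py in points_of_interest)
        rows.append((t1, speed, nearest))
    return rows


track = [(0.0, 0.0, 0.0), (0.5, 5.0, 0.0), (1.0, 5.0, 0.0)]   # 2 fps observations
print(trajectory_features(track, points_of_interest=[(5.0, 0.0)]))
```

Rows like these, and not pixels, are what the second-stage recurrent or transformer model would consume.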
The Training Data Challenge
Deep learning models require large, labelled training datasets. For WAMI activity recognition, this means thousands of examples of each activity type, labelled in real WAMI footage. Creating these labels is expensive and time-consuming: a human analyst must watch hours of WAMI footage and annotate each relevant event.
The problem is compounded by class imbalance. The overwhelming majority of vehicle movements in a city are routine, law-abiding traffic. The activities of interest (IED emplacement, weapons transfer, surveillance detection routes) are extremely rare. A model trained on data where 99.99% of trajectories are benign may learn to classify everything as benign and still achieve high accuracy. Techniques like synthetic data generation (creating artificial WAMI-like imagery with simulated threat activities), data augmentation (transforming existing examples to create variations), and hard-negative mining (focusing training on difficult examples where the model makes errors) are used to address this, with varying success.
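The accuracy trap is easy to demonstrate with a toy dataset at the 99.99% imbalance mentioned above. The labels below are synthetic, and `evaluate` is a hypothetical helper written for this illustration.

```python
def evaluate(labels, predictions):
    """Return (accuracy, recall on the positive class) for binary labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    true_pos = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    positives = sum(labels)
    recall = true_pos / positives if positives else 0.0
    return correct / len(labels), recall


labels = [1] * 1 + [0] * 9_999          # one threat trajectory in ten thousand
always_benign = [0] * len(labels)       # a "model" that never raises an alert
accuracy, recall = evaluate(labels, always_benign)
print(f"accuracy {accuracy:.4f}, recall {recall:.1f}")
```

The degenerate model scores 99.99% accuracy while finding none of the events of interest, which is why recall and precision on the rare class are the metrics that matter here.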
Integration with Other Intelligence Sources
The highest-value application of AI to WAMI is probably not standalone activity recognition but fusion with other intelligence streams. A WAMI track that visits a location flagged by SIGINT as a person of interest's residence, at a time consistent with HUMINT reporting of a planned meeting, triggers a much higher confidence alert than any single source alone.
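One textbook way to express why corroboration raises confidence so sharply is Bayesian updating with likelihood ratios, sketched below. This is an assumption made for illustration, not a description of how any fusion system actually scores alerts: the prior and the likelihood ratios are invented numbers, and treating the sources as independent is a strong simplification.

```python
def fuse(prior, likelihood_ratios):
    """Posterior probability after multiplying the prior odds by each source's likelihood ratio."""
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


PRIOR = 0.001                           # assumed base rate for a track being of interest
wami_only = fuse(PRIOR, [20])           # WAMI behaviour alone
all_sources = fuse(PRIOR, [20, 15, 10]) # WAMI plus SIGINT plus HUMINT (invented ratios)
print(f"WAMI only: {wami_only:.3f}, fused: {all_sources:.3f}")
```

With these invented numbers, a single source leaves the posterior at about 2%, while three corroborating sources push it to about 75%.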
Multi-source fusion is an active area of research within NATO and EU defence research programmes. The challenge is technical (integrating data streams with different formats, coordinate systems, time references, and uncertainty models) and institutional (intelligence agencies are protective of their sources and methods, and sharing raw data across organisations and nations involves significant security considerations).
Edge Processing and Onboard AI
The most ambitious vision for AI-enabled WAMI places the neural network inference directly on the aircraft, so that activity alerts are generated in real time, during the mission, rather than during post-mission analysis. This requires hardware capable of running inference on gigapixel imagery at frame rate.
Modern AI accelerators (NVIDIA Jetson AGX Orin, Intel Movidius, or custom FPGA implementations) are approaching the required performance-per-watt for airborne deployment. The Jetson AGX Orin, for example, delivers up to 275 TOPS (tera-operations per second) of INT8 inference performance while consuming only 60 watts. Multiple such devices, combined with the FPGAs already present for image stitching, could support real-time activity recognition on a WAMI platform.
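A rough budget shows what that figure means per pixel. The sketch below divides the quoted peak throughput by the pixel rate of a 1.8 gigapixel mosaic at 2 frames per second; it treats 275 TOPS as fully achievable, which real workloads will not reach, so the result is an upper bound.

```python
def ops_per_pixel(peak_tops, pixels_per_frame, fps):
    """Upper bound on INT8 operations available per pixel per frame."""
    return peak_tops * 1e12 / (pixels_per_frame * fps)


budget = ops_per_pixel(peak_tops=275, pixels_per_frame=1.8e9, fps=2)
print(f"about {budget:,.0f} operations per pixel per frame at peak")
```

Running detection only on regions where motion has already been found, instead of on the full mosaic, stretches this budget considerably.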
The operational advantage of onboard AI is that the aircraft can autonomously alert ground operators to significant activities, rather than requiring an analyst to discover them hours later during post-mission review. For time-sensitive targets, this difference can be decisive.
12. Where WAMI Goes From Here
WAMI technology continues to evolve along several axes.
Higher Resolution, Smaller Platforms
The trend in sensor technology is toward higher pixel counts in smaller, lighter packages. The smartphone camera industry has driven CMOS sensor development to a point where 200-megapixel sensors in a package weighing a few grams are commercially available. Tiling arrays of these sensors could produce ARGUS-IS-class resolution in a payload small enough for a Group 3 UAS (aircraft weighing roughly 25 to 600 kg) or a light Group 4 aircraft, rather than requiring a Reaper-class platform.
Smaller platforms enable new operational concepts. Multiple small UAS, each carrying a modest WAMI payload, could cooperate to cover a larger area or provide overlapping coverage for improved occlusion handling. The distributed approach also provides resilience: losing one aircraft reduces coverage but does not eliminate it.
Hyperspectral WAMI
Current WAMI systems capture imagery in broad spectral bands (visible RGB, or broadband thermal IR). Hyperspectral imaging, which captures hundreds of narrow spectral bands, can identify materials based on their spectral signatures. A hyperspectral WAMI system could potentially distinguish vehicle types by their paint composition, detect camouflaged objects, or identify disturbed earth (a potential indicator of a buried IED).
The data rate challenge is severe: capturing 100 spectral bands from a 1.8 gigapixel sensor multiplies the data rate roughly 100-fold relative to a single broadband channel. This is not feasible with current airborne storage and processing, but selective hyperspectral imaging (capturing full hyperspectral data only in regions flagged as interesting by the broadband imagery) is an active research area.
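The arithmetic behind this can be sketched as follows. The calculation assumes uncompressed data at one byte per sample and a 2 fps frame rate, and the 2% flagged fraction used for the selective case is a hypothetical value.

```python
def raw_rate_gb_s(pixels, fps, bands, bytes_per_sample=1):
    """Uncompressed data rate in gigabytes per second."""
    return pixels * fps * bands * bytes_per_sample / 1e9


PIXELS, FPS = 1.8e9, 2
broadband = raw_rate_gb_s(PIXELS, FPS, bands=1)
hyperspectral = raw_rate_gb_s(PIXELS, FPS, bands=100)

FLAGGED_FRACTION = 0.02   # assumed share of the scene captured in all 100 bands
selective = broadband + FLAGGED_FRACTION * hyperspectral
print(f"broadband {broadband:.1f}, full hyperspectral {hyperspectral:.0f}, "
      f"selective {selective:.1f} GB/s")
```

Under these assumptions, selective capture costs about three times the broadband rate, against a hundred times for full hyperspectral coverage.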
SAR-Based WAMI
Synthetic aperture radar can provide all-weather, day/night imaging, but traditional SAR imagery is ill-suited to WAMI because it requires long integration times and produces static images rather than video. Ground Moving Target Indication (GMTI) modes in SAR can detect moving vehicles, but the spatial resolution is coarser than optical WAMI and the update rate is typically slower.
Recent developments in multi-channel SAR (using multiple receiver antennas to form a phase-centre array) enable SAR-GMTI with improved sensitivity and the ability to image both the static scene and the movers simultaneously. This is not yet at the level of optical WAMI for tracking, but it provides a capability in weather conditions where optical systems are blind.
European Programmes
European defence organisations are investing in WAMI-related capabilities. The European Defence Fund has supported projects in persistent surveillance sensor development. Germany's Bundeswehr has evaluated WAMI systems for force protection in overseas deployments. France's DGA (Direction générale de l'armement) has funded research into wide-area surveillance sensors through ONERA, the French aerospace research laboratory.
The European Defence Agency (EDA) has identified persistent surveillance as a capability gap that multiple member states need to address. Collaborative development, sharing the cost of sensor development, processing algorithms, and ground exploitation systems across nations, is the preferred approach, though progress has been slower than in national programmes.
WAMI sits at an intersection of several rapidly advancing technologies: high-resolution imaging sensors, embedded GPU and FPGA processing, deep learning for pattern recognition, and large-scale data management. The engineering challenges are formidable but increasingly tractable. The harder questions, who should be allowed to watch an entire city, under what legal authority, with what oversight, and for how long, remain open. The technology does not answer those questions. It only makes them urgent.