
Axon API

Open Source

Python 3.10+

Python: File streaming @ 237MB/s on $8/m droplet in 507 lines (stdlib)

Performance Highlights

Metric                      T1     T2     T3     T4     T5     T6     T7     T8
File Count (response)       1      2      1      2      1      2      1      2
File Size (each)            8KB    4KB    64KB   32KB   256KB  128KB  512KB  256KB
Payload Size (total)        8KB    8KB    64KB   64KB   256KB  256KB  512KB  512KB
Burst Transfer (MB/s)       13     8      97     77     225    171    237    184
Sustained Transfer (MB/s)   13     9      97     65     219    159    229    194
Burst Requests (rps)        1653   1021   1548   1223   902    682    473    367
Sustained Requests (rps)    1705   1128   1552   1043   876    635    454    387
Burst Latency (ms)          30     54     32     41     57     79     119    142
Sustained Latency (ms)      31     46     32     53     60     88     121    131

I built an open-source WSGI core that consumes dynamic batch requests via query strings and bundles the specified files into a multipart stream. Axon is synchronous and implemented in 507 lines of Python with zero dependencies. I designed it for the rapid prototyping of experimental applications that require granular control over the request lifecycle. I needed something simple enough to retool quickly but performant enough to scale.
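
For orientation, here is a rough sketch of what a batch request looks like from the client side. It assumes a running Axon instance; the host and file names are placeholders, and the endpoint and query-string format are taken from the stress-test commands in the Process section below. A real client would split the multipart body on the boundary advertised in the Content-Type header rather than dumping it to a single file.

    # Hedged sketch: fetch two files in one batch request from a running
    # Axon instance. HOST and the file names are placeholders.
    import shutil
    import urllib.request

    HOST = "127.0.0.1"  # placeholder: your droplet IP or hostname
    URL = (
        f"http://{HOST}/api/stream-files"
        "?file1=examples/a.bin&file2=examples/b.bin"  # placeholder file names
    )

    with urllib.request.urlopen(URL) as resp:
        print(resp.status, resp.headers.get("Content-Type"))
        with open("multipart-body.out", "wb") as out:
            # Stream the multipart body to disk in 64KB chunks; a real client
            # would split parts on the boundary from the Content-Type header.
            shutil.copyfileobj(resp, out, length=64 * 1024)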

You can validate these capabilities with the included deployment tools. Get a live demo running in under 5 minutes with the Ubuntu deployment script, then start stress testing with the included client script. The tests reveal Axon's architectural characteristics: on single-core hardware, CPU saturation occurs before network limits; multicore deployments may shift bottlenecks to network throughput depending on processing power and bandwidth constraints. Since performance scales with computational resources rather than I/O, the architecture supports distributed strategies that specialize nodes by file type and size.

How it works

Multipart Streaming Axon returns multiple files over a single HTTP connection: a Python generator streams each file in chunks, framed by multipart boundaries. This keeps memory usage constant while CPU utilization scales with request volume and payload size. 64KB chunks gave the best throughput across mixed file sizes. System memory stayed under 225MB while the CPU was saturated throughout all test scenarios.
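
The sketch below shows the general pattern, assuming nothing about Axon's internals beyond the description above: a WSGI app whose generator yields each requested file in 64KB chunks framed by multipart boundaries. The boundary value, part headers, and error handling are simplified placeholders, not Axon's actual choices.

    # Minimal sketch of generator-based multipart streaming under WSGI.
    # Not Axon's code: boundary, part headers, and validation are simplified.
    import os
    import uuid
    from urllib.parse import parse_qs

    CHUNK = 64 * 1024  # 64KB chunks, matching the tested configuration

    def app(environ, start_response):
        query = parse_qs(environ.get("QUERY_STRING", ""))
        paths = [v[0] for k, v in sorted(query.items()) if k.startswith("file")]
        boundary = uuid.uuid4().hex

        def stream():
            for path in paths:
                name = os.path.basename(path)
                yield (f"--{boundary}\r\n"
                       f'Content-Disposition: attachment; filename="{name}"\r\n'
                       "Content-Type: application/octet-stream\r\n\r\n").encode()
                with open(path, "rb") as fh:
                    while chunk := fh.read(CHUNK):  # one chunk in memory at a time
                        yield chunk
                yield b"\r\n"
            yield f"--{boundary}--\r\n".encode()

        start_response("200 OK",
                       [("Content-Type", f"multipart/mixed; boundary={boundary}")])
        return stream()  # the WSGI server pulls chunks on demand

Because only one chunk per in-flight request is ever held in memory, memory stays flat while the CPU does the boundary and header work, which is consistent with the measurements below.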

Backpressure Handling The response iterator works like a revolver: each chunk is 'chambered' only when requested. The WSGI server pulls the next chunk only as the client consumes the previous one, which prevents memory buildup. This demand-driven mechanism is why 16 threads proved optimal on this hardware: a thread does work only while it is handing off data. Thread count likely scales with network latency (to cover concurrent I/O waits), while process count scales with CPU cores.
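
To picture the pull-driven flow, the following simplified loop stands in for what a WSGI server does with the iterable returned by the app; real servers (uWSGI in these tests) add buffering, timeouts, and error handling on top.

    # Simplified picture of how a WSGI server consumes the app's iterable.
    # Each chunk is produced only after the previous one has been written to
    # the socket, so a slow client naturally throttles the generator.
    def serve_response(app_iterable, sock):
        try:
            for chunk in app_iterable:   # next() is the "trigger pull"
                sock.sendall(chunk)      # blocks until the client keeps up
        finally:
            close = getattr(app_iterable, "close", None)
            if close:
                close()                  # PEP 3333: always close the iterable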

Methodology

Performance testing was conducted on identical DigitalOcean droplets ($8/month, 1GB RAM, 1 Intel CPU, 35GB NVMe SSD) deployed in the NYC3 datacenter to minimize network latency between Axon and the stress-testing client. Axon ran 2 processes with 16 threads per process, a configuration chosen after 25 preliminary optimization tests; 2 processes gave a slight advantage over 1 while improving error recovery. Chunk size was set to 64KB across the Axon, uWSGI, and nginx layers after testing chunks from 8KB to 64KB.

Each test iteration involved deploying a fresh Axon instance via script, transferring test files with SCP, and executing wrk with 50 concurrent connections for a 1-minute burst test (which also served as warmup) followed by a 10-minute sustained test. Eight test scenarios were designed as four payload-size pairs (8KB, 64KB, 256KB, 512KB), with each pair testing single-file versus multi-file handling to isolate file-processing overhead from payload-size effects. The testing droplet was destroyed after each iteration to ensure consistent baseline conditions.

Process

  1. I deployed a new droplet
  2. I ran deploy-axon.sh with:
    chmod +x deploy-axon.sh && ./deploy-axon.sh
  3. I sent the required test file to the server with:
    scp ./[FILE_NAME].[EXTENSION] root@[DROPLET_IP]:/var/www/axon-api/examples/
  4. I checked the endpoint with:
    curl http://[DROPLET_IP]/api/health
  5. I ran the 1-minute burst test with:
    wrk -t10 -c50 -d60s --latency "http://[DROPLET_IP]/api/stream-files?file1=examples/[FILE_NAME].[EXTENSION]&file2=examples/[FILE_NAME].[EXTENSION]"
  6. I ran the 10-minute sustained test with:
    wrk -t10 -c50 -d600s --latency "http://[DROPLET_IP]/api/stream-files?file1=examples/[FILE_NAME].[EXTENSION]&file2=examples/[FILE_NAME].[EXTENSION]"
  7. I recorded my results and destroyed the droplet

Performance Results

Axon fully utilizes the CPU while maintaining constant memory usage; throughput is constrained by the computational overhead of multipart boundary generation and header construction. Generator-based streaming kept memory bounded under load, and every test scenario completed with zero socket errors.

Baseline tests measured the framework's maximum throughput (on this hardware) using two scenarios: a health endpoint returning JSON and invalid file requests triggering error responses.

Baseline

Metric             Health Check   Health Check   Error Response   Error Response
                   (1-minute)     (10-minute)    (1-minute)       (10-minute)
Total Requests     165619         1390555        195562           2028471
Data Transfer      34.75MB        497.27MB       38.42MB          398.5MB
Requests/Second    2758.42        2317.27        3256.79          3380.24
Transfer/Second    592.62KB       497.84KB       655.16KB         679.99KB

50 concurrent / 1-minute burst

Metric T1 T2 T3 T4 T5 T6 T7 T8
Multipart Contents
File Count (response) 1 2 1 2 1 2 1 2
File Size (each) 8KB 4KB 64KB 32KB 256KB 128KB 512KB 256KB
Payload Size (total) 8KB 8KB 64KB 64KB 256KB 256KB 512KB 512KB
Performance Results
Total Requests 99316 61360 92958 73499 54195 40987 28460 22058
Data Transfer 792.39MB 512.01MB 5.69GB 4.53GB 13.24GB 10.04GB 13.91GB 10.8GB
RPS 1653.47 1021.44 1548.03 1223.83 902.45 682.1 473.75 367.04
Transfer (MB/s) 13.19 8.52 97.01 77.21 225.78 171.08 237.07 184.03
Socket Errors
Connect Errors 0 0 0 0 0 0 0 0
Read Errors 0 0 0 0 0 0 0 0
Write Errors 0 0 0 0 0 0 0 0
Timeout Errors 0 0 0 0 0 0 0 0
Latency Statistics (ms)
Average 30.79 54.08 32.69 41.69 57.92 79.9 119.17 142.89
Std Deviation 14.89 48.83 14.29 22.21 33.19 57.45 94.76 80.23
Maximum 169.96 1120 148.16 324.76 639.48 1010 1070 1100
Within ±1 Std Dev (%) 76.19 68.72 75.41 66.37 86.53 91.18 82.56 88.45
Latency Distribution (ms)
50th Percentile 27.08 45.29 29.1 38.62 50.4 64.79 87.05 124.41
75th Percentile 38.49 63.93 40 54.1 67.67 87.78 145.97 161.34
90th Percentile 50.85 87.46 51.86 71.28 90.95 127.61 255.88 214.07
99th Percentile 78.91 253.2 79.49 108.94 200.85 336.08 471.63 506.93
RPS Variance (per wrk thread)
RPS Average 166.01 103.12 155.36 122.85 90.5 69.3 47.57 37.64
RPS Std Dev 36.38 31.82 23.57 24.57 21.06 23.5 16.84 14.24
RPS Maximum 272 210 230 212 170 131 121 100
RPS Within ±1 Std Dev (%) 70.53 68.72 61.43 69.27 76.8 74.72 63.41 72.16

50 concurrent / 10-minute sustained

Metric T1 T2 T3 T4 T5 T6 T7 T8
Multipart Contents
File Count (response) 1 2 1 2 1 2 1 2
File Size (each) 8KB 4KB 64KB 32KB 256KB 128KB 512KB 256KB
Payload Size (total) 8KB 8KB 64KB 64KB 256KB 256KB 512KB 512KB
Performance Results
Total Requests 1023241 677349 931670 626236 526222 381283 275128 232499
Data Transfer 7.97GB 5.52GB 57.02GB 38.58GB 128.56GB 93.39GB 134.39GB 113.77GB
RPS 1705.12 1128.74 1552.6 1043.55 876.96 635.8 458.48 387.44
Transfer (MB/s) 13.69 9.42 97.2 65.82 219.39 159.36 229.33 194.14
Socket Errors
Connect Errors 0 0 0 0 0 0 0 0
Read Errors 0 0 0 0 0 0 0 0
Write Errors 0 0 0 0 0 0 0 0
Timeout Errors 0 0 0 0 0 0 0 0
Latency Statistics (ms)
Average 31.23 46.27 32.72 53.97 60.54 88.11 121.43 131.43
Std Deviation 26.33 29.05 14.94 50.08 39.15 69.9 91.8 56
Maximum 1390 775.33 290.11 1230 935.2 1300 1040 1190
Within ±1 Std Dev (%) 93.89 78.02 77.35 93.31 90.25 91.35 81.43 82.68
Latency Distribution (ms)
50th Percentile 26.06 41.47 28.92 44.25 51.49 68.62 93.47 121.47
75th Percentile 37.26 58.19 40.04 63.76 69.57 95.25 146.68 152.3
90th Percentile 50.35 78 52.46 89.96 95.29 145.86 251.94 188.73
99th Percentile 96.22 140.77 82.11 271.08 230.3 399.27 467.44 334.38
RPS Variance (per wrk thread)
RPS Average 171.35 113.34 155.84 105.31 88.31 64.83 46.15 39.07
RPS Std Dev 40.73 32.41 26.18 34.48 22.32 23.97 16.93 12.76
RPS Maximum 414 292 252 210 191 222 150 101
RPS Within ±1 Std Dev (%) 72.55 64.59 76.67 64.85 71.76 61.6 76.81 78.07