[2026-04-08 08:53:23.205426 INFO duck_llm] This is an info log [2026-04-08 08:53:23.205455 WARN duck_llm] This is a warning log [2026-04-08 08:53:23.205458 ERROR duck_llm] This is an error log [2026-04-08 08:53:23.205660 INFO utils] Selected DPDK lcores: master=0, workers=[2, 4, 6, 8], all_performance_core_representatives=[0, 2, 4, 6, 8, 10, 12, 14] EAL: Detected CPU lcores: 32 EAL: Detected NUMA nodes: 1 EAL: Detected shared linkage of DPDK EAL: Multi-process socket /var/run/dpdk/rte/mp_socket EAL: Selected IOVA mode 'VA' EAL: VFIO support initialized EAL: Using IOMMU type 1 (Type 1) ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode) ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode) ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode) ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode) [2026-04-08 08:53:25.251289 INFO dpdk_workers] DPDK initialized successfully. Found 4 ports. 
[2026-04-08 08:53:25.251305 INFO dpdk_workers] Port 0 device name: 0000:01:00.0 [2026-04-08 08:53:25.251308 INFO dpdk_workers] Port 0 IP address: 10.21.1.1 [2026-04-08 08:53:25.251310 INFO dpdk_workers] Port 0 Broadcast address: 10.21.1.255 [2026-04-08 08:53:25.251312 INFO dpdk_workers] Port 1 device name: 0000:01:00.1 [2026-04-08 08:53:25.251313 INFO dpdk_workers] Port 1 IP address: 10.21.2.1 [2026-04-08 08:53:25.251315 INFO dpdk_workers] Port 1 Broadcast address: 10.21.2.255 [2026-04-08 08:53:25.251317 INFO dpdk_workers] Port 2 device name: 0000:01:00.2 [2026-04-08 08:53:25.251318 INFO dpdk_workers] Port 2 IP address: 10.21.3.1 [2026-04-08 08:53:25.251320 INFO dpdk_workers] Port 2 Broadcast address: 10.21.3.255 [2026-04-08 08:53:25.251321 INFO dpdk_workers] Port 3 device name: 0000:01:00.3 [2026-04-08 08:53:25.251323 INFO dpdk_workers] Port 3 IP address: 10.21.4.1 [2026-04-08 08:53:25.251324 INFO dpdk_workers] Port 3 Broadcast address: 10.21.4.255 [2026-04-08 08:53:25.251326 INFO dpdk_workers] Available netifs list: [(10.21.1.255, 0, 10.21.1.1), (10.21.2.255, 1, 10.21.2.1), (10.21.3.255, 2, 10.21.3.1), (10.21.4.255, 3, 10.21.4.1)] [2026-04-08 08:53:25.251334 INFO dpdk_workers] Starting worker #0: (bcast_ip: 10.21.1.255, port_id: 0, lcore_id: 2, host_ip: 10.21.1.1) [2026-04-08 08:53:25.251374 INFO dpdk_workers] Initializing worker port 0 on lcore 2... [2026-04-08 08:53:25.252781 INFO dpdk_workers] Starting worker #1: (bcast_ip: 10.21.2.255, port_id: 1, lcore_id: 4, host_ip: 10.21.2.1) [2026-04-08 08:53:25.252808 INFO dpdk_workers] Starting worker #2: (bcast_ip: 10.21.3.255, port_id: 2, lcore_id: 6, host_ip: 10.21.3.1) [2026-04-08 08:53:25.252818 INFO dpdk_workers] Starting worker #3: (bcast_ip: 10.21.4.255, port_id: 3, lcore_id: 8, host_ip: 10.21.4.1) [2026-04-08 08:53:25.252846 INFO dpdk_workers] Initializing worker port 1 on lcore 4... [2026-04-08 08:53:25.255797 INFO dpdk_workers] Initializing worker port 3 on lcore 8... 
[2026-04-08 08:53:25.258793 INFO dpdk_workers] Initializing worker port 2 on lcore 6... ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 0). ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 1). ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 2). ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 3). [2026-04-08 08:53:28.453699 INFO dpdk_workers] Worker port 0 initialized successfully. [2026-04-08 08:53:29.267625 INFO dpdk_workers] Worker port 1 initialized successfully. [2026-04-08 08:53:29.268532 INFO dpdk_workers] Worker port 2 initialized successfully. [2026-04-08 08:53:30.138184 INFO dpdk_workers] Worker port 3 initialized successfully. [2026-04-08 08:53:30.138204 INFO dpdk_workers] Workers initialized successfully. 4 workers running. [2026-04-08 08:53:30.138461 INFO utils] Binding master thread to cores (excluding workers): [0, 1, 3, 5, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31] [2026-04-08 08:53:30.138470 INFO utils] set_thread_affinity(tid 1369552, cores [0, 1, 3, 5, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]): 0 [2026-04-08 08:53:30.139257 INFO dpdk_workers] Run command Ping all time: send 1.1 us, recv 779.0 us [2026-04-08 08:53:30.189314 INFO dpdk_workers] Run command Ping all time: send 0.3 us, recv 0.4 us [2026-04-08 08:53:30.239371 INFO dpdk_workers] Run command Ping all time: send 0.3 us, recv 0.3 us [2026-04-08 08:53:30.289427 INFO dpdk_workers] Run command Ping all time: send 0.2 us, recv 0.4 us [2026-04-08 08:53:30.339483 INFO dpdk_workers] Run command Ping all time: send 0.3 us, recv 0.5 us [2026-04-08 08:53:30.389539 INFO dpdk_workers] Run command Ping all time: send 0.3 us, recv 0.3 us [2026-04-08 08:53:30.439594 INFO dpdk_workers] Run command Ping all time: send 0.3 us, recv 0.4 us [2026-04-08 08:53:30.489659 INFO dpdk_workers] Run command Ping all time: send 1.2 us, recv 1.1 us [2026-04-08 08:53:30.539736 
INFO dpdk_workers] Run command Ping all time: send 1.1 us, recv 1.1 us [2026-04-08 08:53:30.589790 INFO dpdk_workers] Run command Ping all time: send 1.0 us, recv 1.1 us [2026-04-08 08:53:30.639873 INFO dpdk_workers] Found 32 ducks in duck-ips-multi-netifs.txt [2026-04-08 08:53:30.639877 INFO dpdk_workers] Duck #0: 10.21.1.101 (bcast_ip: 10.21.1.255) [2026-04-08 08:53:30.639879 INFO dpdk_workers] Duck #1: 10.21.1.102 (bcast_ip: 10.21.1.255) [2026-04-08 08:53:30.639881 INFO dpdk_workers] Duck #2: 10.21.1.103 (bcast_ip: 10.21.1.255) [2026-04-08 08:53:30.639883 INFO dpdk_workers] Duck #3: 10.21.1.104 (bcast_ip: 10.21.1.255) [2026-04-08 08:53:30.639885 INFO dpdk_workers] Duck #4: 10.21.1.105 (bcast_ip: 10.21.1.255) [2026-04-08 08:53:30.639887 INFO dpdk_workers] Duck #5: 10.21.1.106 (bcast_ip: 10.21.1.255) [2026-04-08 08:53:30.639889 INFO dpdk_workers] Duck #6: 10.21.1.107 (bcast_ip: 10.21.1.255) [2026-04-08 08:53:30.639891 INFO dpdk_workers] Duck #7: 10.21.1.108 (bcast_ip: 10.21.1.255) [2026-04-08 08:53:30.639893 INFO dpdk_workers] Duck #8: 10.21.2.101 (bcast_ip: 10.21.2.255) [2026-04-08 08:53:30.639894 INFO dpdk_workers] Duck #9: 10.21.2.102 (bcast_ip: 10.21.2.255) [2026-04-08 08:53:30.639896 INFO dpdk_workers] Duck #10: 10.21.2.103 (bcast_ip: 10.21.2.255) [2026-04-08 08:53:30.639898 INFO dpdk_workers] Duck #11: 10.21.2.104 (bcast_ip: 10.21.2.255) [2026-04-08 08:53:30.639900 INFO dpdk_workers] Duck #12: 10.21.2.105 (bcast_ip: 10.21.2.255) [2026-04-08 08:53:30.639902 INFO dpdk_workers] Duck #13: 10.21.2.106 (bcast_ip: 10.21.2.255) [2026-04-08 08:53:30.639904 INFO dpdk_workers] Duck #14: 10.21.2.107 (bcast_ip: 10.21.2.255) [2026-04-08 08:53:30.639905 INFO dpdk_workers] Duck #15: 10.21.2.108 (bcast_ip: 10.21.2.255) [2026-04-08 08:53:30.639907 INFO dpdk_workers] Duck #16: 10.21.3.101 (bcast_ip: 10.21.3.255) [2026-04-08 08:53:30.639909 INFO dpdk_workers] Duck #17: 10.21.3.102 (bcast_ip: 10.21.3.255) [2026-04-08 08:53:30.639911 INFO dpdk_workers] Duck #18: 10.21.3.103 
(bcast_ip: 10.21.3.255) [2026-04-08 08:53:30.639913 INFO dpdk_workers] Duck #19: 10.21.3.104 (bcast_ip: 10.21.3.255) [2026-04-08 08:53:30.639914 INFO dpdk_workers] Duck #20: 10.21.3.105 (bcast_ip: 10.21.3.255) [2026-04-08 08:53:30.639916 INFO dpdk_workers] Duck #21: 10.21.3.106 (bcast_ip: 10.21.3.255) [2026-04-08 08:53:30.639918 INFO dpdk_workers] Duck #22: 10.21.3.107 (bcast_ip: 10.21.3.255) [2026-04-08 08:53:30.639920 INFO dpdk_workers] Duck #23: 10.21.3.108 (bcast_ip: 10.21.3.255) [2026-04-08 08:53:30.639922 INFO dpdk_workers] Duck #24: 10.21.4.101 (bcast_ip: 10.21.4.255) [2026-04-08 08:53:30.639923 INFO dpdk_workers] Duck #25: 10.21.4.102 (bcast_ip: 10.21.4.255) [2026-04-08 08:53:30.639925 INFO dpdk_workers] Duck #26: 10.21.4.103 (bcast_ip: 10.21.4.255) [2026-04-08 08:53:30.639927 INFO dpdk_workers] Duck #27: 10.21.4.104 (bcast_ip: 10.21.4.255) [2026-04-08 08:53:30.639929 INFO dpdk_workers] Duck #28: 10.21.4.105 (bcast_ip: 10.21.4.255) [2026-04-08 08:53:30.639931 INFO dpdk_workers] Duck #29: 10.21.4.106 (bcast_ip: 10.21.4.255) [2026-04-08 08:53:30.639932 INFO dpdk_workers] Duck #30: 10.21.4.107 (bcast_ip: 10.21.4.255) [2026-04-08 08:53:30.639936 INFO dpdk_workers] Duck #31: 10.21.4.108 (bcast_ip: 10.21.4.255) [2026-04-08 08:53:30.641655 INFO dpdk_workers] [Worker 0]: 10.21.1.101 [2026-04-08 08:53:30.641669 INFO dpdk_workers] [Worker 0]: 10.21.1.102 [2026-04-08 08:53:30.641680 INFO dpdk_workers] [Worker 0]: 10.21.1.103 [2026-04-08 08:53:30.641690 INFO dpdk_workers] [Worker 0]: 10.21.1.104 [2026-04-08 08:53:30.641698 INFO dpdk_workers] [Worker 0]: 10.21.1.105 [2026-04-08 08:53:30.641699 INFO dpdk_workers] [Worker 0]: 10.21.1.106 [2026-04-08 08:53:30.641700 INFO dpdk_workers] [Worker 0]: 10.21.1.107 [2026-04-08 08:53:30.641702 INFO dpdk_workers] [Worker 0]: 10.21.1.108 [2026-04-08 08:53:30.641985 INFO dpdk_workers] [Worker 1]: 10.21.2.101 [2026-04-08 08:53:30.641988 INFO dpdk_workers] [Worker 1]: 10.21.2.102 [2026-04-08 08:53:30.641989 INFO dpdk_workers] [Worker 
1]: 10.21.2.103 [2026-04-08 08:53:30.641991 INFO dpdk_workers] [Worker 1]: 10.21.2.104 [2026-04-08 08:53:30.641992 INFO dpdk_workers] [Worker 1]: 10.21.2.105 [2026-04-08 08:53:30.641994 INFO dpdk_workers] [Worker 1]: 10.21.2.106 [2026-04-08 08:53:30.641995 INFO dpdk_workers] [Worker 1]: 10.21.2.107 [2026-04-08 08:53:30.641997 INFO dpdk_workers] [Worker 1]: 10.21.2.108 [2026-04-08 08:53:30.641999 INFO dpdk_workers] [Worker 2]: 10.21.3.101 [2026-04-08 08:53:30.642001 INFO dpdk_workers] [Worker 2]: 10.21.3.102 [2026-04-08 08:53:30.642003 INFO dpdk_workers] [Worker 2]: 10.21.3.103 [2026-04-08 08:53:30.642004 INFO dpdk_workers] [Worker 2]: 10.21.3.104 [2026-04-08 08:53:30.642006 INFO dpdk_workers] [Worker 2]: 10.21.3.105 [2026-04-08 08:53:30.642007 INFO dpdk_workers] [Worker 2]: 10.21.3.106 [2026-04-08 08:53:30.642009 INFO dpdk_workers] [Worker 2]: 10.21.3.107 [2026-04-08 08:53:30.642010 INFO dpdk_workers] [Worker 2]: 10.21.3.108 [2026-04-08 08:53:30.741939 INFO dpdk_workers] [Worker 3]: 10.21.4.101 [2026-04-08 08:53:30.741942 INFO dpdk_workers] [Worker 3]: 10.21.4.102 [2026-04-08 08:53:30.741943 INFO dpdk_workers] [Worker 3]: 10.21.4.103 [2026-04-08 08:53:30.741945 INFO dpdk_workers] [Worker 3]: 10.21.4.104 [2026-04-08 08:53:30.741946 INFO dpdk_workers] [Worker 3]: 10.21.4.105 [2026-04-08 08:53:30.741948 INFO dpdk_workers] [Worker 3]: 10.21.4.106 [2026-04-08 08:53:30.741949 INFO dpdk_workers] [Worker 3]: 10.21.4.107 [2026-04-08 08:53:30.741951 INFO dpdk_workers] [Worker 3]: 10.21.4.108 [2026-04-08 08:53:30.741954 INFO dpdk_workers] init_ducks done [2026-04-08 08:53:30.748663 INFO dpdk_ducks] Initialized 4 DPDK duck workers [2026-04-08 08:53:30.748678 INFO dpdk_ducks] DPDK duck worker 0: DpdkDuckWorker { worker_idx: 0, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { 
buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (0, 8) } [2026-04-08 08:53:30.748686 INFO dpdk_ducks] DPDK duck worker 1: DpdkDuckWorker { worker_idx: 1, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (8, 16) } [2026-04-08 08:53:30.748689 INFO dpdk_ducks] DPDK duck worker 2: DpdkDuckWorker { worker_idx: 2, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (16, 24) } [2026-04-08 08:53:30.748691 INFO dpdk_ducks] DPDK duck worker 3: DpdkDuckWorker { worker_idx: 3, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (24, 32) } [2026-04-08 08:53:30.748695 INFO buffer_manager] Initializing buffer manager [2026-04-08 08:53:30.748698 INFO buffer_manager] Buffer manager initialized: ELF BufferAllocator { begin: 0, end: 10485760, current: 0 }, input BufferAllocator { begin: 10485760, end: 104857600, current: 10485760 }, weights BufferAllocator { begin: 104923136, end: 32212254720, current: 104923136 } [2026-04-08 08:53:30.748701 INFO fp8_dpdk_common] 
fp9 persistent judge enabled by default; set DUCK_FP9_PERSISTENT_JUDGE=0 to disable [2026-04-08 08:53:30.749115 INFO buffer_manager] Added kernel fp9_kernels at (0, 91664) [2026-04-08 08:53:30.749150 INFO fp8_dpdk_common] fp9 persistent judge: opened 32 sessions [2026-04-08 08:53:30.749153 INFO fp8_dpdk_common] fp9 persistent judge: force-opened 32 fresh sessions for new init [2026-04-08 08:53:30.749154 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init(tp_size=32) [2026-04-08 08:53:30.749156 INFO fp8_moe_dpdk] fp8_moe_dpdk: init(tp_size=32) [2026-04-08 08:53:31.139281 INFO weight_cache] weight_cache: header hit tp_size=32 num_slots=62 finished_slots=62 [2026-04-08 08:53:31.466313 INFO buffer_manager] Allocated weights buffer at (104923136, 0) [2026-04-08 08:53:31.466332 INFO buffer_manager] Allocated weights buffer at (104923136, 4128768) [2026-04-08 08:53:31.466334 INFO buffer_manager] Allocated weights buffer at (109051904, 516096) [2026-04-08 08:53:31.466336 INFO buffer_manager] Allocated weights buffer at (109568000, 2016) [2026-04-08 08:53:31.466337 INFO buffer_manager] Allocated weights buffer at (109572096, 4128768) [2026-04-08 08:53:31.466339 INFO buffer_manager] Allocated weights buffer at (113700864, 516096) [2026-04-08 08:53:31.466340 INFO buffer_manager] Allocated weights buffer at (114216960, 2016) [2026-04-08 08:53:31.466342 INFO buffer_manager] Allocated weights buffer at (114221056, 4128768) [2026-04-08 08:53:31.466343 INFO buffer_manager] Allocated weights buffer at (118349824, 516096) [2026-04-08 08:53:31.466345 INFO buffer_manager] Allocated weights buffer at (118865920, 2016) [2026-04-08 08:53:31.466346 INFO buffer_manager] Allocated weights buffer at (118870016, 0) [2026-04-08 08:53:31.466348 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init_layer_cached(layer_idx=0, cache_slot=0) planned desc only [2026-04-08 08:53:31.559407 INFO buffer_manager] Allocated weights buffer at (118870016, 0) [2026-04-08 08:53:31.559434 INFO buffer_manager] Allocated weights buffer at 
(118870016, 4128768) [2026-04-08 08:53:31.559436 INFO buffer_manager] Allocated weights buffer at (122998784, 516096) [2026-04-08 08:53:31.559437 INFO buffer_manager] Allocated weights buffer at (123514880, 2016) [2026-04-08 08:53:31.559439 INFO buffer_manager] Allocated weights buffer at (123518976, 4128768) [2026-04-08 08:53:31.559440 INFO buffer_manager] Allocated weights buffer at (127647744, 516096) [2026-04-08 08:53:31.559442 INFO buffer_manager] Allocated weights buffer at (128163840, 2016) [2026-04-08 08:53:31.559443 INFO buffer_manager] Allocated weights buffer at (128167936, 4128768) [2026-04-08 08:53:31.559445 INFO buffer_manager] Allocated weights buffer at (132296704, 516096) [2026-04-08 08:53:31.559446 INFO buffer_manager] Allocated weights buffer at (132812800, 2016) [2026-04-08 08:53:31.559448 INFO buffer_manager] Allocated weights buffer at (132816896, 0) [2026-04-08 08:53:31.559449 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init_layer_cached(layer_idx=1, cache_slot=1) planned desc only [2026-04-08 08:53:31.645953 INFO buffer_manager] Allocated weights buffer at (132816896, 0) [2026-04-08 08:53:31.645970 INFO buffer_manager] Allocated weights buffer at (132816896, 4128768) [2026-04-08 08:53:31.645973 INFO buffer_manager] Allocated weights buffer at (136945664, 516096) [2026-04-08 08:53:31.645974 INFO buffer_manager] Allocated weights buffer at (137461760, 2016) [2026-04-08 08:53:31.645980 INFO buffer_manager] Allocated weights buffer at (137465856, 4128768) [2026-04-08 08:53:31.645982 INFO buffer_manager] Allocated weights buffer at (141594624, 516096) [2026-04-08 08:53:31.645983 INFO buffer_manager] Allocated weights buffer at (142110720, 2016) [2026-04-08 08:53:31.645985 INFO buffer_manager] Allocated weights buffer at (142114816, 4128768) [2026-04-08 08:53:31.645986 INFO buffer_manager] Allocated weights buffer at (146243584, 516096) [2026-04-08 08:53:31.645988 INFO buffer_manager] Allocated weights buffer at (146759680, 2016) [2026-04-08 08:53:31.645989 
INFO buffer_manager] Allocated weights buffer at (146763776, 0) [2026-04-08 08:53:31.645991 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init_layer_cached(layer_idx=2, cache_slot=2) planned desc only [2026-04-08 08:53:31.674521 INFO buffer_manager] Allocated weights buffer at (146763776, 0) [2026-04-08 08:53:31.674535 INFO buffer_manager] Allocated weights buffer at (146763776, 132120576) [2026-04-08 08:53:31.674537 INFO buffer_manager] Allocated weights buffer at (278884352, 57344) [2026-04-08 08:53:31.674539 INFO buffer_manager] Allocated weights buffer at (278941696, 132120576) [2026-04-08 08:53:31.674540 INFO buffer_manager] Allocated weights buffer at (411062272, 57344) [2026-04-08 08:53:31.674542 INFO buffer_manager] Allocated weights buffer at (411119616, 132120576) [2026-04-08 08:53:31.674543 INFO buffer_manager] Allocated weights buffer at (543240192, 57344) [2026-04-08 08:53:31.674545 INFO buffer_manager] Allocated weights buffer at (543297536, 0) [2026-04-08 08:53:31.674546 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=3, cache_slot=3) planned desc only [2026-04-08 08:53:31.710897 INFO buffer_manager] Allocated weights buffer at (543297536, 0) [2026-04-08 08:53:31.710910 INFO buffer_manager] Allocated weights buffer at (543297536, 132120576) [2026-04-08 08:53:31.710912 INFO buffer_manager] Allocated weights buffer at (675418112, 57344) [2026-04-08 08:53:31.710914 INFO buffer_manager] Allocated weights buffer at (675475456, 132120576) [2026-04-08 08:53:31.710915 INFO buffer_manager] Allocated weights buffer at (807596032, 57344) [2026-04-08 08:53:31.710917 INFO buffer_manager] Allocated weights buffer at (807653376, 132120576) [2026-04-08 08:53:31.710918 INFO buffer_manager] Allocated weights buffer at (939773952, 57344) [2026-04-08 08:53:31.710920 INFO buffer_manager] Allocated weights buffer at (939831296, 0) [2026-04-08 08:53:31.710921 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=4, cache_slot=4) planned desc only [2026-04-08 
08:53:31.747141 INFO buffer_manager] Allocated weights buffer at (939831296, 0) [2026-04-08 08:53:31.747155 INFO buffer_manager] Allocated weights buffer at (939831296, 132120576) [2026-04-08 08:53:31.747157 INFO buffer_manager] Allocated weights buffer at (1071951872, 57344) [2026-04-08 08:53:31.747159 INFO buffer_manager] Allocated weights buffer at (1072009216, 132120576) [2026-04-08 08:53:31.747160 INFO buffer_manager] Allocated weights buffer at (1204129792, 57344) [2026-04-08 08:53:31.747162 INFO buffer_manager] Allocated weights buffer at (1204187136, 132120576) [2026-04-08 08:53:31.747163 INFO buffer_manager] Allocated weights buffer at (1336307712, 57344) [2026-04-08 08:53:31.747165 INFO buffer_manager] Allocated weights buffer at (1336365056, 0) [2026-04-08 08:53:31.747166 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=5, cache_slot=5) planned desc only [2026-04-08 08:53:31.783400 INFO buffer_manager] Allocated weights buffer at (1336365056, 0) [2026-04-08 08:53:31.783413 INFO buffer_manager] Allocated weights buffer at (1336365056, 132120576) [2026-04-08 08:53:31.783415 INFO buffer_manager] Allocated weights buffer at (1468485632, 57344) [2026-04-08 08:53:31.783417 INFO buffer_manager] Allocated weights buffer at (1468542976, 132120576) [2026-04-08 08:53:31.783418 INFO buffer_manager] Allocated weights buffer at (1600663552, 57344) [2026-04-08 08:53:31.783420 INFO buffer_manager] Allocated weights buffer at (1600720896, 132120576) [2026-04-08 08:53:31.783425 INFO buffer_manager] Allocated weights buffer at (1732841472, 57344) [2026-04-08 08:53:31.783426 INFO buffer_manager] Allocated weights buffer at (1732898816, 0) [2026-04-08 08:53:31.783428 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=6, cache_slot=6) planned desc only [2026-04-08 08:53:31.819797 INFO buffer_manager] Allocated weights buffer at (1732898816, 0) [2026-04-08 08:53:31.819811 INFO buffer_manager] Allocated weights buffer at (1732898816, 132120576) [2026-04-08 
08:53:31.819813 INFO buffer_manager] Allocated weights buffer at (1865019392, 57344) [2026-04-08 08:53:31.819815 INFO buffer_manager] Allocated weights buffer at (1865076736, 132120576) [2026-04-08 08:53:31.819816 INFO buffer_manager] Allocated weights buffer at (1997197312, 57344) [2026-04-08 08:53:31.819818 INFO buffer_manager] Allocated weights buffer at (1997254656, 132120576) [2026-04-08 08:53:31.819819 INFO buffer_manager] Allocated weights buffer at (2129375232, 57344) [2026-04-08 08:53:31.819821 INFO buffer_manager] Allocated weights buffer at (2129432576, 0) [2026-04-08 08:53:31.819822 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=7, cache_slot=7) planned desc only [2026-04-08 08:53:31.856175 INFO buffer_manager] Allocated weights buffer at (2129432576, 0) [2026-04-08 08:53:31.856190 INFO buffer_manager] Allocated weights buffer at (2129432576, 132120576) [2026-04-08 08:53:31.856192 INFO buffer_manager] Allocated weights buffer at (2261553152, 57344) [2026-04-08 08:53:31.856194 INFO buffer_manager] Allocated weights buffer at (2261610496, 132120576) [2026-04-08 08:53:31.856195 INFO buffer_manager] Allocated weights buffer at (2393731072, 57344) [2026-04-08 08:53:31.856197 INFO buffer_manager] Allocated weights buffer at (2393788416, 132120576) [2026-04-08 08:53:31.856198 INFO buffer_manager] Allocated weights buffer at (2525908992, 57344) [2026-04-08 08:53:31.856200 INFO buffer_manager] Allocated weights buffer at (2525966336, 0) [2026-04-08 08:53:31.856201 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=8, cache_slot=8) planned desc only [2026-04-08 08:53:31.892527 INFO buffer_manager] Allocated weights buffer at (2525966336, 0) [2026-04-08 08:53:31.892541 INFO buffer_manager] Allocated weights buffer at (2525966336, 132120576) [2026-04-08 08:53:31.892543 INFO buffer_manager] Allocated weights buffer at (2658086912, 57344) [2026-04-08 08:53:31.892545 INFO buffer_manager] Allocated weights buffer at (2658144256, 132120576) 
[2026-04-08 08:53:31.892546 INFO buffer_manager] Allocated weights buffer at (2790264832, 57344) [2026-04-08 08:53:31.892548 INFO buffer_manager] Allocated weights buffer at (2790322176, 132120576) [2026-04-08 08:53:31.892549 INFO buffer_manager] Allocated weights buffer at (2922442752, 57344) [2026-04-08 08:53:31.892551 INFO buffer_manager] Allocated weights buffer at (2922500096, 0) [2026-04-08 08:53:31.892552 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=9, cache_slot=9) planned desc only [2026-04-08 08:53:31.928925 INFO buffer_manager] Allocated weights buffer at (2922500096, 0) [2026-04-08 08:53:31.928939 INFO buffer_manager] Allocated weights buffer at (2922500096, 132120576) [2026-04-08 08:53:31.928941 INFO buffer_manager] Allocated weights buffer at (3054620672, 57344) [2026-04-08 08:53:31.928943 INFO buffer_manager] Allocated weights buffer at (3054678016, 132120576) [2026-04-08 08:53:31.928944 INFO buffer_manager] Allocated weights buffer at (3186798592, 57344) [2026-04-08 08:53:31.928946 INFO buffer_manager] Allocated weights buffer at (3186855936, 132120576) [2026-04-08 08:53:31.928947 INFO buffer_manager] Allocated weights buffer at (3318976512, 57344) [2026-04-08 08:53:31.928949 INFO buffer_manager] Allocated weights buffer at (3319033856, 0) [2026-04-08 08:53:31.928951 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=10, cache_slot=10) planned desc only [2026-04-08 08:53:31.965369 INFO buffer_manager] Allocated weights buffer at (3319033856, 0) [2026-04-08 08:53:31.965385 INFO buffer_manager] Allocated weights buffer at (3319033856, 132120576) [2026-04-08 08:53:31.965390 INFO buffer_manager] Allocated weights buffer at (3451154432, 57344) [2026-04-08 08:53:31.965391 INFO buffer_manager] Allocated weights buffer at (3451211776, 132120576) [2026-04-08 08:53:31.965393 INFO buffer_manager] Allocated weights buffer at (3583332352, 57344) [2026-04-08 08:53:31.965394 INFO buffer_manager] Allocated weights buffer at (3583389696, 
132120576) [2026-04-08 08:53:31.965396 INFO buffer_manager] Allocated weights buffer at (3715510272, 57344) [2026-04-08 08:53:31.965397 INFO buffer_manager] Allocated weights buffer at (3715567616, 0) [2026-04-08 08:53:31.965399 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=11, cache_slot=11) planned desc only [2026-04-08 08:53:32.001593 INFO buffer_manager] Allocated weights buffer at (3715567616, 0) [2026-04-08 08:53:32.001608 INFO buffer_manager] Allocated weights buffer at (3715567616, 132120576) [2026-04-08 08:53:32.001610 INFO buffer_manager] Allocated weights buffer at (3847688192, 57344) [2026-04-08 08:53:32.001612 INFO buffer_manager] Allocated weights buffer at (3847745536, 132120576) [2026-04-08 08:53:32.001613 INFO buffer_manager] Allocated weights buffer at (3979866112, 57344) [2026-04-08 08:53:32.001615 INFO buffer_manager] Allocated weights buffer at (3979923456, 132120576) [2026-04-08 08:53:32.001616 INFO buffer_manager] Allocated weights buffer at (4112044032, 57344) [2026-04-08 08:53:32.001618 INFO buffer_manager] Allocated weights buffer at (4112101376, 0) [2026-04-08 08:53:32.001619 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=12, cache_slot=12) planned desc only [2026-04-08 08:53:32.037707 INFO buffer_manager] Allocated weights buffer at (4112101376, 0) [2026-04-08 08:53:32.037721 INFO buffer_manager] Allocated weights buffer at (4112101376, 132120576) [2026-04-08 08:53:32.037723 INFO buffer_manager] Allocated weights buffer at (4244221952, 57344) [2026-04-08 08:53:32.037725 INFO buffer_manager] Allocated weights buffer at (4244279296, 132120576) [2026-04-08 08:53:32.037727 INFO buffer_manager] Allocated weights buffer at (4376399872, 57344) [2026-04-08 08:53:32.037728 INFO buffer_manager] Allocated weights buffer at (4376457216, 132120576) [2026-04-08 08:53:32.037729 INFO buffer_manager] Allocated weights buffer at (4508577792, 57344) [2026-04-08 08:53:32.037731 INFO buffer_manager] Allocated weights buffer at 
(4508635136, 0) [2026-04-08 08:53:32.037732 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=13, cache_slot=13) planned desc only [2026-04-08 08:53:32.073849 INFO buffer_manager] Allocated weights buffer at (4508635136, 0) [2026-04-08 08:53:32.073862 INFO buffer_manager] Allocated weights buffer at (4508635136, 132120576) [2026-04-08 08:53:32.073865 INFO buffer_manager] Allocated weights buffer at (4640755712, 57344) [2026-04-08 08:53:32.073866 INFO buffer_manager] Allocated weights buffer at (4640813056, 132120576) [2026-04-08 08:53:32.073868 INFO buffer_manager] Allocated weights buffer at (4772933632, 57344) [2026-04-08 08:53:32.073869 INFO buffer_manager] Allocated weights buffer at (4772990976, 132120576) [2026-04-08 08:53:32.073871 INFO buffer_manager] Allocated weights buffer at (4905111552, 57344) [2026-04-08 08:53:32.073872 INFO buffer_manager] Allocated weights buffer at (4905168896, 0) [2026-04-08 08:53:32.073874 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=14, cache_slot=14) planned desc only [2026-04-08 08:53:32.109968 INFO buffer_manager] Allocated weights buffer at (4905168896, 0) [2026-04-08 08:53:32.109982 INFO buffer_manager] Allocated weights buffer at (4905168896, 132120576) [2026-04-08 08:53:32.109984 INFO buffer_manager] Allocated weights buffer at (5037289472, 57344) [2026-04-08 08:53:32.109985 INFO buffer_manager] Allocated weights buffer at (5037346816, 132120576) [2026-04-08 08:53:32.109987 INFO buffer_manager] Allocated weights buffer at (5169467392, 57344) [2026-04-08 08:53:32.109988 INFO buffer_manager] Allocated weights buffer at (5169524736, 132120576) [2026-04-08 08:53:32.109990 INFO buffer_manager] Allocated weights buffer at (5301645312, 57344) [2026-04-08 08:53:32.109996 INFO buffer_manager] Allocated weights buffer at (5301702656, 0) [2026-04-08 08:53:32.109998 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=15, cache_slot=15) planned desc only [2026-04-08 08:53:32.146229 INFO 
buffer_manager] Allocated weights buffer at (5301702656, 0) [2026-04-08 08:53:32.146246 INFO buffer_manager] Allocated weights buffer at (5301702656, 132120576) [2026-04-08 08:53:32.146248 INFO buffer_manager] Allocated weights buffer at (5433823232, 57344) [2026-04-08 08:53:32.146250 INFO buffer_manager] Allocated weights buffer at (5433880576, 132120576) [2026-04-08 08:53:32.146251 INFO buffer_manager] Allocated weights buffer at (5566001152, 57344) [2026-04-08 08:53:32.146253 INFO buffer_manager] Allocated weights buffer at (5566058496, 132120576) [2026-04-08 08:53:32.146255 INFO buffer_manager] Allocated weights buffer at (5698179072, 57344) [2026-04-08 08:53:32.146256 INFO buffer_manager] Allocated weights buffer at (5698236416, 0) [2026-04-08 08:53:32.146258 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=16, cache_slot=16) planned desc only [2026-04-08 08:53:32.182467 INFO buffer_manager] Allocated weights buffer at (5698236416, 0) [2026-04-08 08:53:32.182481 INFO buffer_manager] Allocated weights buffer at (5698236416, 132120576) [2026-04-08 08:53:32.182483 INFO buffer_manager] Allocated weights buffer at (5830356992, 57344) [2026-04-08 08:53:32.182485 INFO buffer_manager] Allocated weights buffer at (5830414336, 132120576) [2026-04-08 08:53:32.182486 INFO buffer_manager] Allocated weights buffer at (5962534912, 57344) [2026-04-08 08:53:32.182488 INFO buffer_manager] Allocated weights buffer at (5962592256, 132120576) [2026-04-08 08:53:32.182489 INFO buffer_manager] Allocated weights buffer at (6094712832, 57344) [2026-04-08 08:53:32.182491 INFO buffer_manager] Allocated weights buffer at (6094770176, 0) [2026-04-08 08:53:32.182493 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=17, cache_slot=17) planned desc only [2026-04-08 08:53:32.218637 INFO buffer_manager] Allocated weights buffer at (6094770176, 0) [2026-04-08 08:53:32.218652 INFO buffer_manager] Allocated weights buffer at (6094770176, 132120576) [2026-04-08 
08:53:32.218654 INFO buffer_manager] Allocated weights buffer at (6226890752, 57344) [2026-04-08 08:53:32.218656 INFO buffer_manager] Allocated weights buffer at (6226948096, 132120576) [2026-04-08 08:53:32.218657 INFO buffer_manager] Allocated weights buffer at (6359068672, 57344) [2026-04-08 08:53:32.218659 INFO buffer_manager] Allocated weights buffer at (6359126016, 132120576) [2026-04-08 08:53:32.218660 INFO buffer_manager] Allocated weights buffer at (6491246592, 57344) [2026-04-08 08:53:32.218661 INFO buffer_manager] Allocated weights buffer at (6491303936, 0) [2026-04-08 08:53:32.218663 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=18, cache_slot=18) planned desc only [2026-04-08 08:53:32.254847 INFO buffer_manager] Allocated weights buffer at (6491303936, 0) [2026-04-08 08:53:32.254862 INFO buffer_manager] Allocated weights buffer at (6491303936, 132120576) [2026-04-08 08:53:32.254864 INFO buffer_manager] Allocated weights buffer at (6623424512, 57344) [2026-04-08 08:53:32.254865 INFO buffer_manager] Allocated weights buffer at (6623481856, 132120576) [2026-04-08 08:53:32.254867 INFO buffer_manager] Allocated weights buffer at (6755602432, 57344) [2026-04-08 08:53:32.254868 INFO buffer_manager] Allocated weights buffer at (6755659776, 132120576) [2026-04-08 08:53:32.254870 INFO buffer_manager] Allocated weights buffer at (6887780352, 57344) [2026-04-08 08:53:32.254871 INFO buffer_manager] Allocated weights buffer at (6887837696, 0) [2026-04-08 08:53:32.254873 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=19, cache_slot=19) planned desc only [2026-04-08 08:53:32.291125 INFO buffer_manager] Allocated weights buffer at (6887837696, 0) [2026-04-08 08:53:32.291139 INFO buffer_manager] Allocated weights buffer at (6887837696, 132120576) [2026-04-08 08:53:32.291144 INFO buffer_manager] Allocated weights buffer at (7019958272, 57344) [2026-04-08 08:53:32.291146 INFO buffer_manager] Allocated weights buffer at (7020015616, 132120576) 
[2026-04-08 08:53:32.291147 INFO buffer_manager] Allocated weights buffer at (7152136192, 57344) [2026-04-08 08:53:32.291149 INFO buffer_manager] Allocated weights buffer at (7152193536, 132120576) [2026-04-08 08:53:32.291150 INFO buffer_manager] Allocated weights buffer at (7284314112, 57344) [2026-04-08 08:53:32.291151 INFO buffer_manager] Allocated weights buffer at (7284371456, 0) [2026-04-08 08:53:32.291154 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=20, cache_slot=20) planned desc only [2026-04-08 08:53:32.327270 INFO buffer_manager] Allocated weights buffer at (7284371456, 0) [2026-04-08 08:53:32.327285 INFO buffer_manager] Allocated weights buffer at (7284371456, 132120576) [2026-04-08 08:53:32.327287 INFO buffer_manager] Allocated weights buffer at (7416492032, 57344) [2026-04-08 08:53:32.327288 INFO buffer_manager] Allocated weights buffer at (7416549376, 132120576) [2026-04-08 08:53:32.327290 INFO buffer_manager] Allocated weights buffer at (7548669952, 57344) [2026-04-08 08:53:32.327291 INFO buffer_manager] Allocated weights buffer at (7548727296, 132120576) [2026-04-08 08:53:32.327293 INFO buffer_manager] Allocated weights buffer at (7680847872, 57344) [2026-04-08 08:53:32.327294 INFO buffer_manager] Allocated weights buffer at (7680905216, 0) [2026-04-08 08:53:32.327296 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=21, cache_slot=21) planned desc only [2026-04-08 08:53:32.363396 INFO buffer_manager] Allocated weights buffer at (7680905216, 0) [2026-04-08 08:53:32.363409 INFO buffer_manager] Allocated weights buffer at (7680905216, 132120576) [2026-04-08 08:53:32.363411 INFO buffer_manager] Allocated weights buffer at (7813025792, 57344) [2026-04-08 08:53:32.363413 INFO buffer_manager] Allocated weights buffer at (7813083136, 132120576) [2026-04-08 08:53:32.363414 INFO buffer_manager] Allocated weights buffer at (7945203712, 57344) [2026-04-08 08:53:32.363416 INFO buffer_manager] Allocated weights buffer at (7945261056, 
132120576) [2026-04-08 08:53:32.363417 INFO buffer_manager] Allocated weights buffer at (8077381632, 57344) [2026-04-08 08:53:32.363419 INFO buffer_manager] Allocated weights buffer at (8077438976, 0) [2026-04-08 08:53:32.363421 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=22, cache_slot=22) planned desc only [2026-04-08 08:53:32.399644 INFO buffer_manager] Allocated weights buffer at (8077438976, 0) [2026-04-08 08:53:32.399659 INFO buffer_manager] Allocated weights buffer at (8077438976, 132120576) [2026-04-08 08:53:32.399661 INFO buffer_manager] Allocated weights buffer at (8209559552, 57344) [2026-04-08 08:53:32.399663 INFO buffer_manager] Allocated weights buffer at (8209616896, 132120576) [2026-04-08 08:53:32.399664 INFO buffer_manager] Allocated weights buffer at (8341737472, 57344) [2026-04-08 08:53:32.399666 INFO buffer_manager] Allocated weights buffer at (8341794816, 132120576) [2026-04-08 08:53:32.399667 INFO buffer_manager] Allocated weights buffer at (8473915392, 57344) [2026-04-08 08:53:32.399669 INFO buffer_manager] Allocated weights buffer at (8473972736, 0) [2026-04-08 08:53:32.399670 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=23, cache_slot=23) planned desc only [2026-04-08 08:53:32.435854 INFO buffer_manager] Allocated weights buffer at (8473972736, 0) [2026-04-08 08:53:32.435868 INFO buffer_manager] Allocated weights buffer at (8473972736, 132120576) [2026-04-08 08:53:32.435870 INFO buffer_manager] Allocated weights buffer at (8606093312, 57344) [2026-04-08 08:53:32.435871 INFO buffer_manager] Allocated weights buffer at (8606150656, 132120576) [2026-04-08 08:53:32.435873 INFO buffer_manager] Allocated weights buffer at (8738271232, 57344) [2026-04-08 08:53:32.435874 INFO buffer_manager] Allocated weights buffer at (8738328576, 132120576) [2026-04-08 08:53:32.435876 INFO buffer_manager] Allocated weights buffer at (8870449152, 57344) [2026-04-08 08:53:32.435881 INFO buffer_manager] Allocated weights buffer at 
(8870506496, 0) [2026-04-08 08:53:32.435883 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=24, cache_slot=24) planned desc only [2026-04-08 08:53:32.472179 INFO buffer_manager] Allocated weights buffer at (8870506496, 0) [2026-04-08 08:53:32.472194 INFO buffer_manager] Allocated weights buffer at (8870506496, 132120576) [2026-04-08 08:53:32.472196 INFO buffer_manager] Allocated weights buffer at (9002627072, 57344) [2026-04-08 08:53:32.472198 INFO buffer_manager] Allocated weights buffer at (9002684416, 132120576) [2026-04-08 08:53:32.472199 INFO buffer_manager] Allocated weights buffer at (9134804992, 57344) [2026-04-08 08:53:32.472201 INFO buffer_manager] Allocated weights buffer at (9134862336, 132120576) [2026-04-08 08:53:32.472202 INFO buffer_manager] Allocated weights buffer at (9266982912, 57344) [2026-04-08 08:53:32.472204 INFO buffer_manager] Allocated weights buffer at (9267040256, 0) [2026-04-08 08:53:32.472205 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=25, cache_slot=25) planned desc only [2026-04-08 08:53:32.508515 INFO buffer_manager] Allocated weights buffer at (9267040256, 0) [2026-04-08 08:53:32.508529 INFO buffer_manager] Allocated weights buffer at (9267040256, 132120576) [2026-04-08 08:53:32.508531 INFO buffer_manager] Allocated weights buffer at (9399160832, 57344) [2026-04-08 08:53:32.508532 INFO buffer_manager] Allocated weights buffer at (9399218176, 132120576) [2026-04-08 08:53:32.508534 INFO buffer_manager] Allocated weights buffer at (9531338752, 57344) [2026-04-08 08:53:32.508535 INFO buffer_manager] Allocated weights buffer at (9531396096, 132120576) [2026-04-08 08:53:32.508537 INFO buffer_manager] Allocated weights buffer at (9663516672, 57344) [2026-04-08 08:53:32.508538 INFO buffer_manager] Allocated weights buffer at (9663574016, 0) [2026-04-08 08:53:32.508540 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=26, cache_slot=26) planned desc only [2026-04-08 08:53:32.544843 INFO 
buffer_manager] Allocated weights buffer at (9663574016, 0) [2026-04-08 08:53:32.544857 INFO buffer_manager] Allocated weights buffer at (9663574016, 132120576) [2026-04-08 08:53:32.544860 INFO buffer_manager] Allocated weights buffer at (9795694592, 57344) [2026-04-08 08:53:32.544861 INFO buffer_manager] Allocated weights buffer at (9795751936, 132120576) [2026-04-08 08:53:32.544863 INFO buffer_manager] Allocated weights buffer at (9927872512, 57344) [2026-04-08 08:53:32.544865 INFO buffer_manager] Allocated weights buffer at (9927929856, 132120576) [2026-04-08 08:53:32.544866 INFO buffer_manager] Allocated weights buffer at (10060050432, 57344) [2026-04-08 08:53:32.544868 INFO buffer_manager] Allocated weights buffer at (10060107776, 0) [2026-04-08 08:53:32.544870 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=27, cache_slot=27) planned desc only [2026-04-08 08:53:32.580982 INFO buffer_manager] Allocated weights buffer at (10060107776, 0) [2026-04-08 08:53:32.580996 INFO buffer_manager] Allocated weights buffer at (10060107776, 132120576) [2026-04-08 08:53:32.580998 INFO buffer_manager] Allocated weights buffer at (10192228352, 57344) [2026-04-08 08:53:32.581000 INFO buffer_manager] Allocated weights buffer at (10192285696, 132120576) [2026-04-08 08:53:32.581001 INFO buffer_manager] Allocated weights buffer at (10324406272, 57344) [2026-04-08 08:53:32.581003 INFO buffer_manager] Allocated weights buffer at (10324463616, 132120576) [2026-04-08 08:53:32.581004 INFO buffer_manager] Allocated weights buffer at (10456584192, 57344) [2026-04-08 08:53:32.581006 INFO buffer_manager] Allocated weights buffer at (10456641536, 0) [2026-04-08 08:53:32.581008 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=28, cache_slot=28) planned desc only [2026-04-08 08:53:32.617102 INFO buffer_manager] Allocated weights buffer at (10456641536, 0) [2026-04-08 08:53:32.617116 INFO buffer_manager] Allocated weights buffer at (10456641536, 132120576) [2026-04-08 
08:53:32.617122 INFO buffer_manager] Allocated weights buffer at (10588762112, 57344) [2026-04-08 08:53:32.617123 INFO buffer_manager] Allocated weights buffer at (10588819456, 132120576) [2026-04-08 08:53:32.617125 INFO buffer_manager] Allocated weights buffer at (10720940032, 57344) [2026-04-08 08:53:32.617126 INFO buffer_manager] Allocated weights buffer at (10720997376, 132120576) [2026-04-08 08:53:32.617128 INFO buffer_manager] Allocated weights buffer at (10853117952, 57344) [2026-04-08 08:53:32.617130 INFO buffer_manager] Allocated weights buffer at (10853175296, 0) [2026-04-08 08:53:32.617132 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=29, cache_slot=29) planned desc only [2026-04-08 08:53:32.653245 INFO buffer_manager] Allocated weights buffer at (10853175296, 0) [2026-04-08 08:53:32.653258 INFO buffer_manager] Allocated weights buffer at (10853175296, 132120576) [2026-04-08 08:53:32.653260 INFO buffer_manager] Allocated weights buffer at (10985295872, 57344) [2026-04-08 08:53:32.653262 INFO buffer_manager] Allocated weights buffer at (10985353216, 132120576) [2026-04-08 08:53:32.653264 INFO buffer_manager] Allocated weights buffer at (11117473792, 57344) [2026-04-08 08:53:32.653265 INFO buffer_manager] Allocated weights buffer at (11117531136, 132120576) [2026-04-08 08:53:32.653267 INFO buffer_manager] Allocated weights buffer at (11249651712, 57344) [2026-04-08 08:53:32.653268 INFO buffer_manager] Allocated weights buffer at (11249709056, 0) [2026-04-08 08:53:32.653270 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=30, cache_slot=30) planned desc only [2026-04-08 08:53:32.689518 INFO buffer_manager] Allocated weights buffer at (11249709056, 0) [2026-04-08 08:53:32.689534 INFO buffer_manager] Allocated weights buffer at (11249709056, 132120576) [2026-04-08 08:53:32.689536 INFO buffer_manager] Allocated weights buffer at (11381829632, 57344) [2026-04-08 08:53:32.689538 INFO buffer_manager] Allocated weights buffer at 
(11381886976, 132120576) [2026-04-08 08:53:32.689539 INFO buffer_manager] Allocated weights buffer at (11514007552, 57344) [2026-04-08 08:53:32.689541 INFO buffer_manager] Allocated weights buffer at (11514064896, 132120576) [2026-04-08 08:53:32.689543 INFO buffer_manager] Allocated weights buffer at (11646185472, 57344) [2026-04-08 08:53:32.689544 INFO buffer_manager] Allocated weights buffer at (11646242816, 0) [2026-04-08 08:53:32.689546 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=31, cache_slot=31) planned desc only [2026-04-08 08:53:32.725924 INFO buffer_manager] Allocated weights buffer at (11646242816, 0) [2026-04-08 08:53:32.725940 INFO buffer_manager] Allocated weights buffer at (11646242816, 132120576) [2026-04-08 08:53:32.725942 INFO buffer_manager] Allocated weights buffer at (11778363392, 57344) [2026-04-08 08:53:32.725943 INFO buffer_manager] Allocated weights buffer at (11778420736, 132120576) [2026-04-08 08:53:32.725945 INFO buffer_manager] Allocated weights buffer at (11910541312, 57344) [2026-04-08 08:53:32.725946 INFO buffer_manager] Allocated weights buffer at (11910598656, 132120576) [2026-04-08 08:53:32.725948 INFO buffer_manager] Allocated weights buffer at (12042719232, 57344) [2026-04-08 08:53:32.725949 INFO buffer_manager] Allocated weights buffer at (12042776576, 0) [2026-04-08 08:53:32.725951 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=32, cache_slot=32) planned desc only [2026-04-08 08:53:32.762092 INFO buffer_manager] Allocated weights buffer at (12042776576, 0) [2026-04-08 08:53:32.762107 INFO buffer_manager] Allocated weights buffer at (12042776576, 132120576) [2026-04-08 08:53:32.762109 INFO buffer_manager] Allocated weights buffer at (12174897152, 57344) [2026-04-08 08:53:32.762110 INFO buffer_manager] Allocated weights buffer at (12174954496, 132120576) [2026-04-08 08:53:32.762112 INFO buffer_manager] Allocated weights buffer at (12307075072, 57344) [2026-04-08 08:53:32.762113 INFO buffer_manager] 
Allocated weights buffer at (12307132416, 132120576) [2026-04-08 08:53:32.762115 INFO buffer_manager] Allocated weights buffer at (12439252992, 57344) [2026-04-08 08:53:32.762119 INFO buffer_manager] Allocated weights buffer at (12439310336, 0) [2026-04-08 08:53:32.762121 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=33, cache_slot=33) planned desc only [2026-04-08 08:53:32.798257 INFO buffer_manager] Allocated weights buffer at (12439310336, 0) [2026-04-08 08:53:32.798271 INFO buffer_manager] Allocated weights buffer at (12439310336, 132120576) [2026-04-08 08:53:32.798273 INFO buffer_manager] Allocated weights buffer at (12571430912, 57344) [2026-04-08 08:53:32.798275 INFO buffer_manager] Allocated weights buffer at (12571488256, 132120576) [2026-04-08 08:53:32.798276 INFO buffer_manager] Allocated weights buffer at (12703608832, 57344) [2026-04-08 08:53:32.798278 INFO buffer_manager] Allocated weights buffer at (12703666176, 132120576) [2026-04-08 08:53:32.798279 INFO buffer_manager] Allocated weights buffer at (12835786752, 57344) [2026-04-08 08:53:32.798281 INFO buffer_manager] Allocated weights buffer at (12835844096, 0) [2026-04-08 08:53:32.798284 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=34, cache_slot=34) planned desc only [2026-04-08 08:53:32.834574 INFO buffer_manager] Allocated weights buffer at (12835844096, 0) [2026-04-08 08:53:32.834587 INFO buffer_manager] Allocated weights buffer at (12835844096, 132120576) [2026-04-08 08:53:32.834589 INFO buffer_manager] Allocated weights buffer at (12967964672, 57344) [2026-04-08 08:53:32.834591 INFO buffer_manager] Allocated weights buffer at (12968022016, 132120576) [2026-04-08 08:53:32.834592 INFO buffer_manager] Allocated weights buffer at (13100142592, 57344) [2026-04-08 08:53:32.834594 INFO buffer_manager] Allocated weights buffer at (13100199936, 132120576) [2026-04-08 08:53:32.834596 INFO buffer_manager] Allocated weights buffer at (13232320512, 57344) [2026-04-08 
08:53:32.834597 INFO buffer_manager] Allocated weights buffer at (13232377856, 0) [2026-04-08 08:53:32.834599 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=35, cache_slot=35) planned desc only [2026-04-08 08:53:32.870955 INFO buffer_manager] Allocated weights buffer at (13232377856, 0) [2026-04-08 08:53:32.870969 INFO buffer_manager] Allocated weights buffer at (13232377856, 132120576) [2026-04-08 08:53:32.870971 INFO buffer_manager] Allocated weights buffer at (13364498432, 57344) [2026-04-08 08:53:32.870972 INFO buffer_manager] Allocated weights buffer at (13364555776, 132120576) [2026-04-08 08:53:32.870974 INFO buffer_manager] Allocated weights buffer at (13496676352, 57344) [2026-04-08 08:53:32.870975 INFO buffer_manager] Allocated weights buffer at (13496733696, 132120576) [2026-04-08 08:53:32.870977 INFO buffer_manager] Allocated weights buffer at (13628854272, 57344) [2026-04-08 08:53:32.870978 INFO buffer_manager] Allocated weights buffer at (13628911616, 0) [2026-04-08 08:53:32.870981 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=36, cache_slot=36) planned desc only [2026-04-08 08:53:32.907144 INFO buffer_manager] Allocated weights buffer at (13628911616, 0) [2026-04-08 08:53:32.907157 INFO buffer_manager] Allocated weights buffer at (13628911616, 132120576) [2026-04-08 08:53:32.907159 INFO buffer_manager] Allocated weights buffer at (13761032192, 57344) [2026-04-08 08:53:32.907161 INFO buffer_manager] Allocated weights buffer at (13761089536, 132120576) [2026-04-08 08:53:32.907162 INFO buffer_manager] Allocated weights buffer at (13893210112, 57344) [2026-04-08 08:53:32.907164 INFO buffer_manager] Allocated weights buffer at (13893267456, 132120576) [2026-04-08 08:53:32.907165 INFO buffer_manager] Allocated weights buffer at (14025388032, 57344) [2026-04-08 08:53:32.907166 INFO buffer_manager] Allocated weights buffer at (14025445376, 0) [2026-04-08 08:53:32.907168 INFO fp8_moe_dpdk] fp8_moe_dpdk: 
init_layer_cached(layer_idx=37, cache_slot=37) planned desc only [2026-04-08 08:53:32.943273 INFO buffer_manager] Allocated weights buffer at (14025445376, 0) [2026-04-08 08:53:32.943287 INFO buffer_manager] Allocated weights buffer at (14025445376, 132120576) [2026-04-08 08:53:32.943292 INFO buffer_manager] Allocated weights buffer at (14157565952, 57344) [2026-04-08 08:53:32.943294 INFO buffer_manager] Allocated weights buffer at (14157623296, 132120576) [2026-04-08 08:53:32.943295 INFO buffer_manager] Allocated weights buffer at (14289743872, 57344) [2026-04-08 08:53:32.943297 INFO buffer_manager] Allocated weights buffer at (14289801216, 132120576) [2026-04-08 08:53:32.943298 INFO buffer_manager] Allocated weights buffer at (14421921792, 57344) [2026-04-08 08:53:32.943300 INFO buffer_manager] Allocated weights buffer at (14421979136, 0) [2026-04-08 08:53:32.943301 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=38, cache_slot=38) planned desc only [2026-04-08 08:53:32.979722 INFO buffer_manager] Allocated weights buffer at (14421979136, 0) [2026-04-08 08:53:32.979736 INFO buffer_manager] Allocated weights buffer at (14421979136, 132120576) [2026-04-08 08:53:32.979738 INFO buffer_manager] Allocated weights buffer at (14554099712, 57344) [2026-04-08 08:53:32.979740 INFO buffer_manager] Allocated weights buffer at (14554157056, 132120576) [2026-04-08 08:53:32.979741 INFO buffer_manager] Allocated weights buffer at (14686277632, 57344) [2026-04-08 08:53:32.979743 INFO buffer_manager] Allocated weights buffer at (14686334976, 132120576) [2026-04-08 08:53:32.979744 INFO buffer_manager] Allocated weights buffer at (14818455552, 57344) [2026-04-08 08:53:32.979746 INFO buffer_manager] Allocated weights buffer at (14818512896, 0) [2026-04-08 08:53:32.979748 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=39, cache_slot=39) planned desc only [2026-04-08 08:53:33.015940 INFO buffer_manager] Allocated weights buffer at (14818512896, 0) [2026-04-08 
08:53:33.015954 INFO buffer_manager] Allocated weights buffer at (14818512896, 132120576) [2026-04-08 08:53:33.015956 INFO buffer_manager] Allocated weights buffer at (14950633472, 57344) [2026-04-08 08:53:33.015958 INFO buffer_manager] Allocated weights buffer at (14950690816, 132120576) [2026-04-08 08:53:33.015959 INFO buffer_manager] Allocated weights buffer at (15082811392, 57344) [2026-04-08 08:53:33.015961 INFO buffer_manager] Allocated weights buffer at (15082868736, 132120576) [2026-04-08 08:53:33.015962 INFO buffer_manager] Allocated weights buffer at (15214989312, 57344) [2026-04-08 08:53:33.015964 INFO buffer_manager] Allocated weights buffer at (15215046656, 0) [2026-04-08 08:53:33.015966 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=40, cache_slot=40) planned desc only [2026-04-08 08:53:33.052287 INFO buffer_manager] Allocated weights buffer at (15215046656, 0) [2026-04-08 08:53:33.052301 INFO buffer_manager] Allocated weights buffer at (15215046656, 132120576) [2026-04-08 08:53:33.052303 INFO buffer_manager] Allocated weights buffer at (15347167232, 57344) [2026-04-08 08:53:33.052305 INFO buffer_manager] Allocated weights buffer at (15347224576, 132120576) [2026-04-08 08:53:33.052307 INFO buffer_manager] Allocated weights buffer at (15479345152, 57344) [2026-04-08 08:53:33.052308 INFO buffer_manager] Allocated weights buffer at (15479402496, 132120576) [2026-04-08 08:53:33.052310 INFO buffer_manager] Allocated weights buffer at (15611523072, 57344) [2026-04-08 08:53:33.052312 INFO buffer_manager] Allocated weights buffer at (15611580416, 0) [2026-04-08 08:53:33.052314 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=41, cache_slot=41) planned desc only [2026-04-08 08:53:33.088536 INFO buffer_manager] Allocated weights buffer at (15611580416, 0) [2026-04-08 08:53:33.088553 INFO buffer_manager] Allocated weights buffer at (15611580416, 132120576) [2026-04-08 08:53:33.088555 INFO buffer_manager] Allocated weights buffer at 
(15743700992, 57344) [2026-04-08 08:53:33.088557 INFO buffer_manager] Allocated weights buffer at (15743758336, 132120576) [2026-04-08 08:53:33.088558 INFO buffer_manager] Allocated weights buffer at (15875878912, 57344) [2026-04-08 08:53:33.088560 INFO buffer_manager] Allocated weights buffer at (15875936256, 132120576) [2026-04-08 08:53:33.088562 INFO buffer_manager] Allocated weights buffer at (16008056832, 57344) [2026-04-08 08:53:33.088567 INFO buffer_manager] Allocated weights buffer at (16008114176, 0) [2026-04-08 08:53:33.088568 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=42, cache_slot=42) planned desc only [2026-04-08 08:53:33.124744 INFO buffer_manager] Allocated weights buffer at (16008114176, 0) [2026-04-08 08:53:33.124757 INFO buffer_manager] Allocated weights buffer at (16008114176, 132120576) [2026-04-08 08:53:33.124759 INFO buffer_manager] Allocated weights buffer at (16140234752, 57344) [2026-04-08 08:53:33.124761 INFO buffer_manager] Allocated weights buffer at (16140292096, 132120576) [2026-04-08 08:53:33.124762 INFO buffer_manager] Allocated weights buffer at (16272412672, 57344) [2026-04-08 08:53:33.124764 INFO buffer_manager] Allocated weights buffer at (16272470016, 132120576) [2026-04-08 08:53:33.124769 INFO buffer_manager] Allocated weights buffer at (16404590592, 57344) [2026-04-08 08:53:33.124772 INFO buffer_manager] Allocated weights buffer at (16404647936, 0) [2026-04-08 08:53:33.124773 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=43, cache_slot=43) planned desc only [2026-04-08 08:53:33.160951 INFO buffer_manager] Allocated weights buffer at (16404647936, 0) [2026-04-08 08:53:33.160969 INFO buffer_manager] Allocated weights buffer at (16404647936, 132120576) [2026-04-08 08:53:33.160972 INFO buffer_manager] Allocated weights buffer at (16536768512, 57344) [2026-04-08 08:53:33.160973 INFO buffer_manager] Allocated weights buffer at (16536825856, 132120576) [2026-04-08 08:53:33.160974 INFO buffer_manager] 
Allocated weights buffer at (16668946432, 57344) [2026-04-08 08:53:33.160976 INFO buffer_manager] Allocated weights buffer at (16669003776, 132120576) [2026-04-08 08:53:33.160978 INFO buffer_manager] Allocated weights buffer at (16801124352, 57344) [2026-04-08 08:53:33.160979 INFO buffer_manager] Allocated weights buffer at (16801181696, 0) [2026-04-08 08:53:33.160981 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=44, cache_slot=44) planned desc only [2026-04-08 08:53:33.197038 INFO buffer_manager] Allocated weights buffer at (16801181696, 0) [2026-04-08 08:53:33.197052 INFO buffer_manager] Allocated weights buffer at (16801181696, 132120576) [2026-04-08 08:53:33.197054 INFO buffer_manager] Allocated weights buffer at (16933302272, 57344) [2026-04-08 08:53:33.197056 INFO buffer_manager] Allocated weights buffer at (16933359616, 132120576) [2026-04-08 08:53:33.197057 INFO buffer_manager] Allocated weights buffer at (17065480192, 57344) [2026-04-08 08:53:33.197059 INFO buffer_manager] Allocated weights buffer at (17065537536, 132120576) [2026-04-08 08:53:33.197060 INFO buffer_manager] Allocated weights buffer at (17197658112, 57344) [2026-04-08 08:53:33.197062 INFO buffer_manager] Allocated weights buffer at (17197715456, 0) [2026-04-08 08:53:33.197063 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=45, cache_slot=45) planned desc only [2026-04-08 08:53:33.233179 INFO buffer_manager] Allocated weights buffer at (17197715456, 0) [2026-04-08 08:53:33.233194 INFO buffer_manager] Allocated weights buffer at (17197715456, 132120576) [2026-04-08 08:53:33.233196 INFO buffer_manager] Allocated weights buffer at (17329836032, 57344) [2026-04-08 08:53:33.233198 INFO buffer_manager] Allocated weights buffer at (17329893376, 132120576) [2026-04-08 08:53:33.233199 INFO buffer_manager] Allocated weights buffer at (17462013952, 57344) [2026-04-08 08:53:33.233201 INFO buffer_manager] Allocated weights buffer at (17462071296, 132120576) [2026-04-08 
08:53:33.233202 INFO buffer_manager] Allocated weights buffer at (17594191872, 57344) [2026-04-08 08:53:33.233203 INFO buffer_manager] Allocated weights buffer at (17594249216, 0) [2026-04-08 08:53:33.233205 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=46, cache_slot=46) planned desc only [2026-04-08 08:53:33.269323 INFO buffer_manager] Allocated weights buffer at (17594249216, 0) [2026-04-08 08:53:33.269337 INFO buffer_manager] Allocated weights buffer at (17594249216, 132120576) [2026-04-08 08:53:33.269342 INFO buffer_manager] Allocated weights buffer at (17726369792, 57344) [2026-04-08 08:53:33.269343 INFO buffer_manager] Allocated weights buffer at (17726427136, 132120576) [2026-04-08 08:53:33.269345 INFO buffer_manager] Allocated weights buffer at (17858547712, 57344) [2026-04-08 08:53:33.269346 INFO buffer_manager] Allocated weights buffer at (17858605056, 132120576) [2026-04-08 08:53:33.269348 INFO buffer_manager] Allocated weights buffer at (17990725632, 57344) [2026-04-08 08:53:33.269349 INFO buffer_manager] Allocated weights buffer at (17990782976, 0) [2026-04-08 08:53:33.269351 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=47, cache_slot=47) planned desc only [2026-04-08 08:53:33.305446 INFO buffer_manager] Allocated weights buffer at (17990782976, 0) [2026-04-08 08:53:33.305460 INFO buffer_manager] Allocated weights buffer at (17990782976, 132120576) [2026-04-08 08:53:33.305462 INFO buffer_manager] Allocated weights buffer at (18122903552, 57344) [2026-04-08 08:53:33.305464 INFO buffer_manager] Allocated weights buffer at (18122960896, 132120576) [2026-04-08 08:53:33.305465 INFO buffer_manager] Allocated weights buffer at (18255081472, 57344) [2026-04-08 08:53:33.305467 INFO buffer_manager] Allocated weights buffer at (18255138816, 132120576) [2026-04-08 08:53:33.305469 INFO buffer_manager] Allocated weights buffer at (18387259392, 57344) [2026-04-08 08:53:33.305470 INFO buffer_manager] Allocated weights buffer at 
(18387316736, 0) [2026-04-08 08:53:33.305472 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=48, cache_slot=48) planned desc only [2026-04-08 08:53:33.341535 INFO buffer_manager] Allocated weights buffer at (18387316736, 0) [2026-04-08 08:53:33.341548 INFO buffer_manager] Allocated weights buffer at (18387316736, 132120576) [2026-04-08 08:53:33.341550 INFO buffer_manager] Allocated weights buffer at (18519437312, 57344) [2026-04-08 08:53:33.341552 INFO buffer_manager] Allocated weights buffer at (18519494656, 132120576) [2026-04-08 08:53:33.341553 INFO buffer_manager] Allocated weights buffer at (18651615232, 57344) [2026-04-08 08:53:33.341555 INFO buffer_manager] Allocated weights buffer at (18651672576, 132120576) [2026-04-08 08:53:33.341557 INFO buffer_manager] Allocated weights buffer at (18783793152, 57344) [2026-04-08 08:53:33.341558 INFO buffer_manager] Allocated weights buffer at (18783850496, 0) [2026-04-08 08:53:33.341560 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=49, cache_slot=49) planned desc only [2026-04-08 08:53:33.377644 INFO buffer_manager] Allocated weights buffer at (18783850496, 0) [2026-04-08 08:53:33.377657 INFO buffer_manager] Allocated weights buffer at (18783850496, 132120576) [2026-04-08 08:53:33.377659 INFO buffer_manager] Allocated weights buffer at (18915971072, 57344) [2026-04-08 08:53:33.377660 INFO buffer_manager] Allocated weights buffer at (18916028416, 132120576) [2026-04-08 08:53:33.377662 INFO buffer_manager] Allocated weights buffer at (19048148992, 57344) [2026-04-08 08:53:33.377663 INFO buffer_manager] Allocated weights buffer at (19048206336, 132120576) [2026-04-08 08:53:33.377665 INFO buffer_manager] Allocated weights buffer at (19180326912, 57344) [2026-04-08 08:53:33.377666 INFO buffer_manager] Allocated weights buffer at (19180384256, 0) [2026-04-08 08:53:33.377668 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=50, cache_slot=50) planned desc only [2026-04-08 08:53:33.413789 
INFO buffer_manager] Allocated weights buffer at (19180384256, 0) [2026-04-08 08:53:33.413802 INFO buffer_manager] Allocated weights buffer at (19180384256, 132120576) [2026-04-08 08:53:33.413804 INFO buffer_manager] Allocated weights buffer at (19312504832, 57344) [2026-04-08 08:53:33.413806 INFO buffer_manager] Allocated weights buffer at (19312562176, 132120576) [2026-04-08 08:53:33.413808 INFO buffer_manager] Allocated weights buffer at (19444682752, 57344) [2026-04-08 08:53:33.413809 INFO buffer_manager] Allocated weights buffer at (19444740096, 132120576) [2026-04-08 08:53:33.413813 INFO buffer_manager] Allocated weights buffer at (19576860672, 57344) [2026-04-08 08:53:33.413815 INFO buffer_manager] Allocated weights buffer at (19576918016, 0) [2026-04-08 08:53:33.413817 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=51, cache_slot=51) planned desc only [2026-04-08 08:53:33.449970 INFO buffer_manager] Allocated weights buffer at (19576918016, 0) [2026-04-08 08:53:33.449984 INFO buffer_manager] Allocated weights buffer at (19576918016, 132120576) [2026-04-08 08:53:33.449986 INFO buffer_manager] Allocated weights buffer at (19709038592, 57344) [2026-04-08 08:53:33.449987 INFO buffer_manager] Allocated weights buffer at (19709095936, 132120576) [2026-04-08 08:53:33.449989 INFO buffer_manager] Allocated weights buffer at (19841216512, 57344) [2026-04-08 08:53:33.449990 INFO buffer_manager] Allocated weights buffer at (19841273856, 132120576) [2026-04-08 08:53:33.449992 INFO buffer_manager] Allocated weights buffer at (19973394432, 57344) [2026-04-08 08:53:33.449993 INFO buffer_manager] Allocated weights buffer at (19973451776, 0) [2026-04-08 08:53:33.449995 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=52, cache_slot=52) planned desc only [2026-04-08 08:53:33.486097 INFO buffer_manager] Allocated weights buffer at (19973451776, 0) [2026-04-08 08:53:33.486112 INFO buffer_manager] Allocated weights buffer at (19973451776, 132120576) 
[2026-04-08 08:53:33.486114 INFO buffer_manager] Allocated weights buffer at (20105572352, 57344) [2026-04-08 08:53:33.486116 INFO buffer_manager] Allocated weights buffer at (20105629696, 132120576) [2026-04-08 08:53:33.486117 INFO buffer_manager] Allocated weights buffer at (20237750272, 57344) [2026-04-08 08:53:33.486119 INFO buffer_manager] Allocated weights buffer at (20237807616, 132120576) [2026-04-08 08:53:33.486120 INFO buffer_manager] Allocated weights buffer at (20369928192, 57344) [2026-04-08 08:53:33.486122 INFO buffer_manager] Allocated weights buffer at (20369985536, 0) [2026-04-08 08:53:33.486123 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=53, cache_slot=53) planned desc only [2026-04-08 08:53:33.522345 INFO buffer_manager] Allocated weights buffer at (20369985536, 0) [2026-04-08 08:53:33.522359 INFO buffer_manager] Allocated weights buffer at (20369985536, 132120576) [2026-04-08 08:53:33.522361 INFO buffer_manager] Allocated weights buffer at (20502106112, 57344) [2026-04-08 08:53:33.522362 INFO buffer_manager] Allocated weights buffer at (20502163456, 132120576) [2026-04-08 08:53:33.522364 INFO buffer_manager] Allocated weights buffer at (20634284032, 57344) [2026-04-08 08:53:33.522365 INFO buffer_manager] Allocated weights buffer at (20634341376, 132120576) [2026-04-08 08:53:33.522367 INFO buffer_manager] Allocated weights buffer at (20766461952, 57344) [2026-04-08 08:53:33.522368 INFO buffer_manager] Allocated weights buffer at (20766519296, 0) [2026-04-08 08:53:33.522370 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=54, cache_slot=54) planned desc only [2026-04-08 08:53:33.558879 INFO buffer_manager] Allocated weights buffer at (20766519296, 0) [2026-04-08 08:53:33.558892 INFO buffer_manager] Allocated weights buffer at (20766519296, 132120576) [2026-04-08 08:53:33.558894 INFO buffer_manager] Allocated weights buffer at (20898639872, 57344) [2026-04-08 08:53:33.558896 INFO buffer_manager] Allocated weights buffer 
at (20898697216, 132120576) [2026-04-08 08:53:33.558897 INFO buffer_manager] Allocated weights buffer at (21030817792, 57344) [2026-04-08 08:53:33.558899 INFO buffer_manager] Allocated weights buffer at (21030875136, 132120576) [2026-04-08 08:53:33.558901 INFO buffer_manager] Allocated weights buffer at (21162995712, 57344) [2026-04-08 08:53:33.558903 INFO buffer_manager] Allocated weights buffer at (21163053056, 0) [2026-04-08 08:53:33.558905 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=55, cache_slot=55) planned desc only [2026-04-08 08:53:33.595254 INFO buffer_manager] Allocated weights buffer at (21163053056, 0) [2026-04-08 08:53:33.595271 INFO buffer_manager] Allocated weights buffer at (21163053056, 132120576) [2026-04-08 08:53:33.595273 INFO buffer_manager] Allocated weights buffer at (21295173632, 57344) [2026-04-08 08:53:33.595274 INFO buffer_manager] Allocated weights buffer at (21295230976, 132120576) [2026-04-08 08:53:33.595276 INFO buffer_manager] Allocated weights buffer at (21427351552, 57344) [2026-04-08 08:53:33.595278 INFO buffer_manager] Allocated weights buffer at (21427408896, 132120576) [2026-04-08 08:53:33.595280 INFO buffer_manager] Allocated weights buffer at (21559529472, 57344) [2026-04-08 08:53:33.595281 INFO buffer_manager] Allocated weights buffer at (21559586816, 0) [2026-04-08 08:53:33.595283 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=56, cache_slot=56) planned desc only [2026-04-08 08:53:33.631557 INFO buffer_manager] Allocated weights buffer at (21559586816, 0) [2026-04-08 08:53:33.631571 INFO buffer_manager] Allocated weights buffer at (21559586816, 132120576) [2026-04-08 08:53:33.631573 INFO buffer_manager] Allocated weights buffer at (21691707392, 57344) [2026-04-08 08:53:33.631575 INFO buffer_manager] Allocated weights buffer at (21691764736, 132120576) [2026-04-08 08:53:33.631576 INFO buffer_manager] Allocated weights buffer at (21823885312, 57344) [2026-04-08 08:53:33.631578 INFO 
buffer_manager] Allocated weights buffer at (21823942656, 132120576) [2026-04-08 08:53:33.631579 INFO buffer_manager] Allocated weights buffer at (21956063232, 57344) [2026-04-08 08:53:33.631581 INFO buffer_manager] Allocated weights buffer at (21956120576, 0) [2026-04-08 08:53:33.631583 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=57, cache_slot=57) planned desc only [2026-04-08 08:53:33.667734 INFO buffer_manager] Allocated weights buffer at (21956120576, 0) [2026-04-08 08:53:33.667746 INFO buffer_manager] Allocated weights buffer at (21956120576, 132120576) [2026-04-08 08:53:33.667748 INFO buffer_manager] Allocated weights buffer at (22088241152, 57344) [2026-04-08 08:53:33.667750 INFO buffer_manager] Allocated weights buffer at (22088298496, 132120576) [2026-04-08 08:53:33.667751 INFO buffer_manager] Allocated weights buffer at (22220419072, 57344) [2026-04-08 08:53:33.667753 INFO buffer_manager] Allocated weights buffer at (22220476416, 132120576) [2026-04-08 08:53:33.667754 INFO buffer_manager] Allocated weights buffer at (22352596992, 57344) [2026-04-08 08:53:33.667756 INFO buffer_manager] Allocated weights buffer at (22352654336, 0) [2026-04-08 08:53:33.667757 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=58, cache_slot=58) planned desc only [2026-04-08 08:53:33.703951 INFO buffer_manager] Allocated weights buffer at (22352654336, 0) [2026-04-08 08:53:33.703966 INFO buffer_manager] Allocated weights buffer at (22352654336, 132120576) [2026-04-08 08:53:33.703968 INFO buffer_manager] Allocated weights buffer at (22484774912, 57344) [2026-04-08 08:53:33.703969 INFO buffer_manager] Allocated weights buffer at (22484832256, 132120576) [2026-04-08 08:53:33.703971 INFO buffer_manager] Allocated weights buffer at (22616952832, 57344) [2026-04-08 08:53:33.703972 INFO buffer_manager] Allocated weights buffer at (22617010176, 132120576) [2026-04-08 08:53:33.703974 INFO buffer_manager] Allocated weights buffer at (22749130752, 57344) 
[2026-04-08 08:53:33.703976 INFO buffer_manager] Allocated weights buffer at (22749188096, 0) [2026-04-08 08:53:33.703978 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=59, cache_slot=59) planned desc only [2026-04-08 08:53:33.740286 INFO buffer_manager] Allocated weights buffer at (22749188096, 0) [2026-04-08 08:53:33.740300 INFO buffer_manager] Allocated weights buffer at (22749188096, 132120576) [2026-04-08 08:53:33.740302 INFO buffer_manager] Allocated weights buffer at (22881308672, 57344) [2026-04-08 08:53:33.740303 INFO buffer_manager] Allocated weights buffer at (22881366016, 132120576) [2026-04-08 08:53:33.740305 INFO buffer_manager] Allocated weights buffer at (23013486592, 57344) [2026-04-08 08:53:33.740306 INFO buffer_manager] Allocated weights buffer at (23013543936, 132120576) [2026-04-08 08:53:33.740311 INFO buffer_manager] Allocated weights buffer at (23145664512, 57344) [2026-04-08 08:53:33.740313 INFO buffer_manager] Allocated weights buffer at (23145721856, 0) [2026-04-08 08:53:33.740315 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=60, cache_slot=60) planned desc only [2026-04-08 08:53:34.103343 INFO buffer_manager] Allocated weights buffer at (23145721856, 0) [2026-04-08 08:53:34.103362 INFO buffer_manager] Allocated weights buffer at (23145721856, 132120576) [2026-04-08 08:53:34.103364 INFO buffer_manager] Allocated weights buffer at (23277842432, 57344) [2026-04-08 08:53:34.103366 INFO buffer_manager] Allocated weights buffer at (23277899776, 132120576) [2026-04-08 08:53:34.103367 INFO buffer_manager] Allocated weights buffer at (23410020352, 57344) [2026-04-08 08:53:34.103369 INFO buffer_manager] Allocated weights buffer at (23410077696, 132120576) [2026-04-08 08:53:34.103371 INFO buffer_manager] Allocated weights buffer at (23542198272, 57344) [2026-04-08 08:53:34.103372 INFO buffer_manager] Allocated weights buffer at (23542255616, 0) [2026-04-08 08:53:34.103374 INFO fp8_moe_dpdk] fp8_moe_dpdk: 
init_layer_cached(layer_idx=61, cache_slot=61) planned desc only [2026-04-08 08:53:47.300331 INFO fp8_dpdk_common] fp8 fast path forced on by default in the current kernel build [2026-04-08 08:53:47.572130 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=256, expert_tiles=752, avg_tile_batch=3.49, prepare=3.464589ms, send=20.298373ms, judge_wait=202.525582ms, fetch=26.623797ms, reduce=20ns; duck time-ns stats: p50=192.158836ms, p90=192.470081ms, max=192.649338ms; kernel_model: matmul=7.222591 GFLOP (37.491 GFLOP/s @ duck_max), param_stream=1.034945G (5.372 Gparam/s @ duck_max), weight_stream=1110.857 MiB (6.046 GB/s @ duck_max) [2026-04-08 08:53:47.845331 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=256, expert_tiles=754, avg_tile_batch=3.48, prepare=2.902251ms, send=18.317069ms, judge_wait=205.483487ms, fetch=26.923738ms, reduce=139ns; duck time-ns stats: p50=190.080223ms, p90=190.645266ms, max=190.845181ms; kernel_model: matmul=7.222591 GFLOP (37.845 GFLOP/s @ duck_max), param_stream=1.037697G (5.437 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.120 GB/s @ duck_max) [2026-04-08 08:53:48.123270 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=256, expert_tiles=752, avg_tile_batch=3.49, prepare=2.906599ms, send=17.213595ms, judge_wait=211.244051ms, fetch=26.922638ms, reduce=138ns; duck time-ns stats: p50=193.940847ms, p90=194.175491ms, max=194.426736ms; kernel_model: matmul=7.222591 GFLOP (37.148 GFLOP/s @ duck_max), param_stream=1.034945G (5.323 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.991 GB/s @ duck_max) [2026-04-08 08:53:48.398138 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=256, expert_tiles=751, avg_tile_batch=3.49, prepare=2.838046ms, send=17.227599ms, judge_wait=209.268427ms, fetch=25.995351ms, reduce=21ns; duck time-ns stats: 
p50=190.737818ms, p90=190.911189ms, max=191.074094ms; kernel_model: matmul=7.222591 GFLOP (37.800 GFLOP/s @ duck_max), param_stream=1.033568G (5.409 Gparam/s @ duck_max), weight_stream=1109.380 MiB (6.088 GB/s @ duck_max) [2026-04-08 08:53:48.424099 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=133, expert_tiles=134, avg_tile_batch=1.43, prepare=187.534µs, send=1.290196ms, judge_wait=21.525964ms, fetch=1.49007ms, reduce=135ns; duck time-ns stats: p50=21.331335ms, p90=21.358968ms, max=21.381413ms; kernel_model: matmul=0.528482 GFLOP (24.717 GFLOP/s @ duck_max), param_stream=0.184418G (8.625 Gparam/s @ duck_max), weight_stream=197.945 MiB (9.708 GB/s @ duck_max) [2026-04-08 08:53:48.719526 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=256, expert_tiles=750, avg_tile_batch=3.50, prepare=3.501102ms, send=17.211381ms, judge_wait=212.932811ms, fetch=26.91296ms, reduce=20ns; duck time-ns stats: p50=192.545238ms, p90=192.842769ms, max=193.509128ms; kernel_model: matmul=7.222591 GFLOP (37.324 GFLOP/s @ duck_max), param_stream=1.032192G (5.334 Gparam/s @ duck_max), weight_stream=1107.903 MiB (6.003 GB/s @ duck_max) [2026-04-08 08:53:48.987103 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=256, expert_tiles=755, avg_tile_batch=3.48, prepare=662.803µs, send=18.478855ms, judge_wait=213.152804ms, fetch=20.655223ms, reduce=135ns; duck time-ns stats: p50=192.587353ms, p90=192.727703ms, max=192.848071ms; kernel_model: matmul=7.222591 GFLOP (37.452 GFLOP/s @ duck_max), param_stream=1.039073G (5.388 Gparam/s @ duck_max), weight_stream=1115.289 MiB (6.064 GB/s @ duck_max) [2026-04-08 08:53:49.256948 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=254, expert_tiles=759, avg_tile_batch=3.46, prepare=664.759µs, send=17.2219ms, judge_wait=215.455132ms, fetch=21.688549ms, reduce=104ns; 
duck time-ns stats: p50=193.685435ms, p90=194.17891ms, max=194.551027ms; kernel_model: matmul=7.222591 GFLOP (37.124 GFLOP/s @ duck_max), param_stream=1.044578G (5.369 Gparam/s @ duck_max), weight_stream=1121.197 MiB (6.043 GB/s @ duck_max) [2026-04-08 08:53:49.527152 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=254, expert_tiles=756, avg_tile_batch=3.47, prepare=675.8µs, send=17.225662ms, judge_wait=217.001603ms, fetch=20.627087ms, reduce=107ns; duck time-ns stats: p50=194.124234ms, p90=194.385323ms, max=194.720821ms; kernel_model: matmul=7.222591 GFLOP (37.092 GFLOP/s @ duck_max), param_stream=1.040450G (5.343 Gparam/s @ duck_max), weight_stream=1116.766 MiB (6.014 GB/s @ duck_max) [2026-04-08 08:53:49.553054 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=134, expert_tiles=134, avg_tile_batch=1.43, prepare=57.919µs, send=1.304212ms, judge_wait=22.088055ms, fetch=1.48558ms, reduce=106ns; duck time-ns stats: p50=21.825303ms, p90=21.868464ms, max=21.914069ms; kernel_model: matmul=0.528482 GFLOP (24.116 GFLOP/s @ duck_max), param_stream=0.184418G (8.416 Gparam/s @ duck_max), weight_stream=197.945 MiB (9.472 GB/s @ duck_max) [2026-04-08 08:53:49.847092 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=256, expert_tiles=760, avg_tile_batch=3.45, prepare=806.043µs, send=18.601048ms, judge_wait=220.039694ms, fetch=20.6537ms, reduce=146ns; duck time-ns stats: p50=196.959844ms, p90=197.152037ms, max=197.190272ms; kernel_model: matmul=7.222591 GFLOP (36.628 GFLOP/s @ duck_max), param_stream=1.045955G (5.304 Gparam/s @ duck_max), weight_stream=1122.675 MiB (5.970 GB/s @ duck_max) [2026-04-08 08:53:50.118543 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=255, expert_tiles=754, avg_tile_batch=3.48, prepare=663.547µs, send=17.227148ms, judge_wait=217.313485ms, 
fetch=21.683154ms, reduce=138ns; duck time-ns stats: p50=195.025692ms, p90=195.224192ms, max=195.483402ms; kernel_model: matmul=7.222591 GFLOP (36.947 GFLOP/s @ duck_max), param_stream=1.037697G (5.308 Gparam/s @ duck_max), weight_stream=1113.811 MiB (5.975 GB/s @ duck_max) [2026-04-08 08:53:50.390478 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=255, expert_tiles=755, avg_tile_batch=3.48, prepare=663.724µs, send=18.348726ms, judge_wait=216.739535ms, fetch=21.663755ms, reduce=105ns; duck time-ns stats: p50=193.414207ms, p90=193.799277ms, max=193.993727ms; kernel_model: matmul=7.222591 GFLOP (37.231 GFLOP/s @ duck_max), param_stream=1.039073G (5.356 Gparam/s @ duck_max), weight_stream=1115.289 MiB (6.028 GB/s @ duck_max) [2026-04-08 08:53:50.662924 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=255, expert_tiles=744, avg_tile_batch=3.53, prepare=664.184µs, send=18.40489ms, judge_wait=217.192817ms, fetch=21.64664ms, reduce=112ns; duck time-ns stats: p50=193.494889ms, p90=193.861811ms, max=194.44361ms; kernel_model: matmul=7.222591 GFLOP (37.145 GFLOP/s @ duck_max), param_stream=1.023934G (5.266 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.927 GB/s @ duck_max) [2026-04-08 08:53:50.688910 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=132, expert_tiles=132, avg_tile_batch=1.45, prepare=56.162µs, send=1.301197ms, judge_wait=22.161815ms, fetch=1.48207ms, reduce=113ns; duck time-ns stats: p50=21.898176ms, p90=21.940365ms, max=21.975289ms; kernel_model: matmul=0.528482 GFLOP (24.049 GFLOP/s @ duck_max), param_stream=0.181666G (8.267 Gparam/s @ duck_max), weight_stream=194.991 MiB (9.304 GB/s @ duck_max) [2026-04-08 08:53:50.982741 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=253, expert_tiles=756, avg_tile_batch=3.47, prepare=747.763µs, send=17.239076ms, 
judge_wait=220.427319ms, fetch=21.596191ms, reduce=136ns; duck time-ns stats: p50=195.790508ms, p90=196.302754ms, max=196.90321ms; kernel_model: matmul=7.222591 GFLOP (36.681 GFLOP/s @ duck_max), param_stream=1.040450G (5.284 Gparam/s @ duck_max), weight_stream=1116.766 MiB (5.947 GB/s @ duck_max) [2026-04-08 08:53:51.255740 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=251, expert_tiles=751, avg_tile_batch=3.49, prepare=669.438µs, send=18.359821ms, judge_wait=218.806135ms, fetch=20.637245ms, reduce=21ns; duck time-ns stats: p50=194.825683ms, p90=195.00741ms, max=195.346082ms; kernel_model: matmul=7.222591 GFLOP (36.973 GFLOP/s @ duck_max), param_stream=1.033568G (5.291 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.955 GB/s @ duck_max) [2026-04-08 08:53:51.528367 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=253, expert_tiles=764, avg_tile_batch=3.43, prepare=655.276µs, send=18.41348ms, judge_wait=218.320393ms, fetch=20.650898ms, reduce=20ns; duck time-ns stats: p50=194.735384ms, p90=195.033156ms, max=195.236144ms; kernel_model: matmul=7.222591 GFLOP (36.994 GFLOP/s @ duck_max), param_stream=1.051460G (5.386 Gparam/s @ duck_max), weight_stream=1128.583 MiB (6.061 GB/s @ duck_max) [2026-04-08 08:53:51.800759 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=249, expert_tiles=752, avg_tile_batch=3.49, prepare=660.563µs, send=18.358007ms, judge_wait=218.176976ms, fetch=20.635913ms, reduce=133ns; duck time-ns stats: p50=194.51408ms, p90=194.748278ms, max=194.911753ms; kernel_model: matmul=7.222591 GFLOP (37.056 GFLOP/s @ duck_max), param_stream=1.034945G (5.310 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.976 GB/s @ duck_max) [2026-04-08 08:53:51.827242 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=132, expert_tiles=133, avg_tile_batch=1.44, 
prepare=57.508µs, send=1.306653ms, judge_wait=22.610213ms, fetch=1.4837ms, reduce=139ns; duck time-ns stats: p50=22.374328ms, p90=22.409324ms, max=22.42449ms; kernel_model: matmul=0.528482 GFLOP (23.567 GFLOP/s @ duck_max), param_stream=0.183042G (8.163 Gparam/s @ duck_max), weight_stream=196.468 MiB (9.187 GB/s @ duck_max) [2026-04-08 08:53:52.111576 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=253, expert_tiles=753, avg_tile_batch=3.48, prepare=797.906µs, send=17.237767ms, judge_wait=214.871936ms, fetch=21.660313ms, reduce=134ns; duck time-ns stats: p50=190.663421ms, p90=191.049531ms, max=191.287491ms; kernel_model: matmul=7.222591 GFLOP (37.758 GFLOP/s @ duck_max), param_stream=1.036321G (5.418 Gparam/s @ duck_max), weight_stream=1112.334 MiB (6.097 GB/s @ duck_max) [2026-04-08 08:53:52.384184 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=255, expert_tiles=756, avg_tile_batch=3.47, prepare=658.935µs, send=17.22685ms, judge_wait=219.392936ms, fetch=21.638411ms, reduce=20ns; duck time-ns stats: p50=194.151025ms, p90=194.364738ms, max=194.515498ms; kernel_model: matmul=7.222591 GFLOP (37.131 GFLOP/s @ duck_max), param_stream=1.040450G (5.349 Gparam/s @ duck_max), weight_stream=1116.766 MiB (6.020 GB/s @ duck_max) [2026-04-08 08:53:52.657193 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=251, expert_tiles=748, avg_tile_batch=3.51, prepare=658.273µs, send=17.221337ms, judge_wait=219.763941ms, fetch=21.647027ms, reduce=15ns; duck time-ns stats: p50=194.333812ms, p90=194.606177ms, max=194.7313ms; kernel_model: matmul=7.222591 GFLOP (37.090 GFLOP/s @ duck_max), param_stream=1.029439G (5.286 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.950 GB/s @ duck_max) [2026-04-08 08:53:52.932253 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=254, expert_tiles=754, 
avg_tile_batch=3.48, prepare=659.354µs, send=17.222942ms, judge_wait=222.829195ms, fetch=20.657098ms, reduce=15ns; duck time-ns stats: p50=197.108454ms, p90=197.28836ms, max=197.570044ms; kernel_model: matmul=7.222591 GFLOP (36.557 GFLOP/s @ duck_max), param_stream=1.037697G (5.252 Gparam/s @ duck_max), weight_stream=1113.811 MiB (5.911 GB/s @ duck_max) [2026-04-08 08:53:52.957412 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=129, expert_tiles=129, avg_tile_batch=1.49, prepare=56.982µs, send=1.303869ms, judge_wait=21.393767ms, fetch=1.478166ms, reduce=107ns; duck time-ns stats: p50=21.177486ms, p90=21.202097ms, max=21.208777ms; kernel_model: matmul=0.528482 GFLOP (24.918 GFLOP/s @ duck_max), param_stream=0.177537G (8.371 Gparam/s @ duck_max), weight_stream=190.559 MiB (9.421 GB/s @ duck_max) [2026-04-08 08:53:53.250178 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=252, expert_tiles=748, avg_tile_batch=3.51, prepare=743.026µs, send=18.121696ms, judge_wait=216.718067ms, fetch=27.405183ms, reduce=136ns; duck time-ns stats: p50=190.713957ms, p90=191.168952ms, max=191.575967ms; kernel_model: matmul=7.222591 GFLOP (37.701 GFLOP/s @ duck_max), param_stream=1.029439G (5.374 Gparam/s @ duck_max), weight_stream=1104.948 MiB (6.048 GB/s @ duck_max) [2026-04-08 08:53:53.521548 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=250, expert_tiles=749, avg_tile_batch=3.50, prepare=663.651µs, send=17.224299ms, judge_wait=219.054532ms, fetch=20.671653ms, reduce=21ns; duck time-ns stats: p50=195.097529ms, p90=195.338556ms, max=195.406054ms; kernel_model: matmul=7.222591 GFLOP (36.962 GFLOP/s @ duck_max), param_stream=1.030816G (5.275 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.937 GB/s @ duck_max) [2026-04-08 08:53:53.792553 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, 
unique_experts=249, expert_tiles=750, avg_tile_batch=3.50, prepare=665.6µs, send=17.226937ms, judge_wait=217.670113ms, fetch=21.638144ms, reduce=138ns; duck time-ns stats: p50=191.677948ms, p90=192.085915ms, max=192.109197ms; kernel_model: matmul=7.222591 GFLOP (37.596 GFLOP/s @ duck_max), param_stream=1.032192G (5.373 Gparam/s @ duck_max), weight_stream=1107.903 MiB (6.047 GB/s @ duck_max) [2026-04-08 08:53:54.064607 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=249, expert_tiles=749, avg_tile_batch=3.50, prepare=662.639µs, send=17.212957ms, judge_wait=219.752027ms, fetch=20.649715ms, reduce=15ns; duck time-ns stats: p50=193.832654ms, p90=194.075141ms, max=194.197146ms; kernel_model: matmul=7.222591 GFLOP (37.192 GFLOP/s @ duck_max), param_stream=1.030816G (5.308 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.974 GB/s @ duck_max) [2026-04-08 08:53:54.090889 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=117, expert_tiles=118, avg_tile_batch=1.63, prepare=56.517µs, send=1.304769ms, judge_wait=22.519969ms, fetch=1.47292ms, reduce=107ns; duck time-ns stats: p50=22.186091ms, p90=22.28062ms, max=22.336409ms; kernel_model: matmul=0.528482 GFLOP (23.660 GFLOP/s @ duck_max), param_stream=0.162398G (7.271 Gparam/s @ duck_max), weight_stream=174.310 MiB (8.183 GB/s @ duck_max) [2026-04-08 08:53:54.379616 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=254, expert_tiles=754, avg_tile_batch=3.48, prepare=734.231µs, send=17.232996ms, judge_wait=219.296969ms, fetch=21.613612ms, reduce=139ns; duck time-ns stats: p50=192.755104ms, p90=193.125ms, max=193.301563ms; kernel_model: matmul=7.222591 GFLOP (37.364 GFLOP/s @ duck_max), param_stream=1.037697G (5.368 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.042 GB/s @ duck_max) [2026-04-08 08:53:54.653229 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, 
top_k=8, tasks=2624, unique_experts=250, expert_tiles=753, avg_tile_batch=3.48, prepare=657.464µs, send=17.213236ms, judge_wait=220.494025ms, fetch=21.560394ms, reduce=104ns; duck time-ns stats: p50=194.345441ms, p90=194.572414ms, max=194.64216ms; kernel_model: matmul=7.222591 GFLOP (37.107 GFLOP/s @ duck_max), param_stream=1.036321G (5.324 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.992 GB/s @ duck_max) [2026-04-08 08:53:54.926245 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=254, expert_tiles=742, avg_tile_batch=3.54, prepare=675.602µs, send=18.320397ms, judge_wait=218.590834ms, fetch=21.61614ms, reduce=108ns; duck time-ns stats: p50=192.406081ms, p90=192.693315ms, max=192.893028ms; kernel_model: matmul=7.222591 GFLOP (37.444 GFLOP/s @ duck_max), param_stream=1.021182G (5.294 Gparam/s @ duck_max), weight_stream=1096.085 MiB (5.958 GB/s @ duck_max) [2026-04-08 08:53:55.200644 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=253, expert_tiles=743, avg_tile_batch=3.53, prepare=652.66µs, send=17.222412ms, judge_wait=218.01099ms, fetch=24.833351ms, reduce=23ns; duck time-ns stats: p50=190.424808ms, p90=190.707297ms, max=190.742095ms; kernel_model: matmul=7.222591 GFLOP (37.866 GFLOP/s @ duck_max), param_stream=1.022558G (5.361 Gparam/s @ duck_max), weight_stream=1097.562 MiB (6.034 GB/s @ duck_max) [2026-04-08 08:53:55.226324 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=119, expert_tiles=123, avg_tile_batch=1.56, prepare=57.856µs, send=2.549859ms, judge_wait=20.699995ms, fetch=1.472921ms, reduce=20ns; duck time-ns stats: p50=20.492215ms, p90=20.51315ms, max=20.55792ms; kernel_model: matmul=0.528482 GFLOP (25.707 GFLOP/s @ duck_max), param_stream=0.169279G (8.234 Gparam/s @ duck_max), weight_stream=181.696 MiB (9.268 GB/s @ duck_max) [2026-04-08 08:53:55.521380 INFO fp8_moe_dpdk] MoE prefill forward (Rust): 
batch_size=328, top_k=8, tasks=2624, unique_experts=251, expert_tiles=756, avg_tile_batch=3.47, prepare=765.197µs, send=17.235956ms, judge_wait=225.515118ms, fetch=21.650462ms, reduce=147ns; duck time-ns stats: p50=201.228533ms, p90=201.568848ms, max=201.783164ms; kernel_model: matmul=7.222591 GFLOP (35.794 GFLOP/s @ duck_max), param_stream=1.040450G (5.156 Gparam/s @ duck_max), weight_stream=1116.766 MiB (5.803 GB/s @ duck_max) [2026-04-08 08:53:55.802326 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=252, expert_tiles=756, avg_tile_batch=3.47, prepare=660.393µs, send=17.227478ms, judge_wait=227.164986ms, fetch=22.178544ms, reduce=110ns; duck time-ns stats: p50=200.066176ms, p90=200.327494ms, max=200.754103ms; kernel_model: matmul=7.222591 GFLOP (35.977 GFLOP/s @ duck_max), param_stream=1.040450G (5.183 Gparam/s @ duck_max), weight_stream=1116.766 MiB (5.833 GB/s @ duck_max) [2026-04-08 08:53:56.078223 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=252, expert_tiles=751, avg_tile_batch=3.49, prepare=655.114µs, send=17.222873ms, judge_wait=222.573779ms, fetch=21.629695ms, reduce=135ns; duck time-ns stats: p50=198.266343ms, p90=198.528967ms, max=198.683138ms; kernel_model: matmul=7.222591 GFLOP (36.352 GFLOP/s @ duck_max), param_stream=1.033568G (5.202 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.855 GB/s @ duck_max) [2026-04-08 08:53:56.358421 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=253, expert_tiles=755, avg_tile_batch=3.48, prepare=658.449µs, send=17.226116ms, judge_wait=226.961564ms, fetch=21.644225ms, reduce=141ns; duck time-ns stats: p50=200.661531ms, p90=200.838857ms, max=200.876088ms; kernel_model: matmul=7.222591 GFLOP (35.955 GFLOP/s @ duck_max), param_stream=1.039073G (5.173 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.822 GB/s @ duck_max) [2026-04-08 08:53:56.384940 INFO 
fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=107, expert_tiles=115, avg_tile_batch=1.67, prepare=55.917µs, send=1.301947ms, judge_wait=22.740764ms, fetch=1.481192ms, reduce=136ns; duck time-ns stats: p50=22.478672ms, p90=22.525645ms, max=22.549694ms; kernel_model: matmul=0.528482 GFLOP (23.436 GFLOP/s @ duck_max), param_stream=0.158269G (7.019 Gparam/s @ duck_max), weight_stream=169.878 MiB (7.899 GB/s @ duck_max) [2026-04-08 08:53:56.671823 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=252, expert_tiles=744, avg_tile_batch=3.53, prepare=815.26µs, send=17.241603ms, judge_wait=217.489554ms, fetch=20.625044ms, reduce=134ns; duck time-ns stats: p50=191.321776ms, p90=191.530689ms, max=191.616038ms; kernel_model: matmul=7.222591 GFLOP (37.693 GFLOP/s @ duck_max), param_stream=1.023934G (5.344 Gparam/s @ duck_max), weight_stream=1099.039 MiB (6.014 GB/s @ duck_max) [2026-04-08 08:53:56.949140 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=252, expert_tiles=748, avg_tile_batch=3.51, prepare=654.855µs, send=17.233882ms, judge_wait=220.781179ms, fetch=24.000049ms, reduce=24ns; duck time-ns stats: p50=193.406496ms, p90=193.838099ms, max=194.006513ms; kernel_model: matmul=7.222591 GFLOP (37.229 GFLOP/s @ duck_max), param_stream=1.029439G (5.306 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.972 GB/s @ duck_max) [2026-04-08 08:53:57.223194 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=252, expert_tiles=752, avg_tile_batch=3.49, prepare=659.11µs, send=17.229054ms, judge_wait=219.214834ms, fetch=22.343237ms, reduce=20ns; duck time-ns stats: p50=192.397088ms, p90=192.666401ms, max=192.760822ms; kernel_model: matmul=7.222591 GFLOP (37.469 GFLOP/s @ duck_max), param_stream=1.034945G (5.369 Gparam/s @ duck_max), weight_stream=1110.857 MiB (6.043 GB/s @ duck_max) [2026-04-08 
08:53:57.502002 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=250, expert_tiles=752, avg_tile_batch=3.49, prepare=668.687µs, send=18.567326ms, judge_wait=221.481973ms, fetch=23.497222ms, reduce=20ns; duck time-ns stats: p50=194.212476ms, p90=194.444333ms, max=194.676078ms; kernel_model: matmul=7.222591 GFLOP (37.101 GFLOP/s @ duck_max), param_stream=1.034945G (5.316 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.983 GB/s @ duck_max) [2026-04-08 08:53:57.527098 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=122, expert_tiles=124, avg_tile_batch=1.55, prepare=56.135µs, send=1.52663ms, judge_wait=21.019308ms, fetch=1.487979ms, reduce=138ns; duck time-ns stats: p50=20.819878ms, p90=20.840647ms, max=20.877312ms; kernel_model: matmul=0.528482 GFLOP (25.314 GFLOP/s @ duck_max), param_stream=0.170656G (8.174 Gparam/s @ duck_max), weight_stream=183.173 MiB (9.200 GB/s @ duck_max) [2026-04-08 08:53:57.818034 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=251, expert_tiles=749, avg_tile_batch=3.50, prepare=820.513µs, send=17.216203ms, judge_wait=220.523009ms, fetch=21.649264ms, reduce=133ns; duck time-ns stats: p50=194.051114ms, p90=194.425247ms, max=194.663187ms; kernel_model: matmul=7.222591 GFLOP (37.103 GFLOP/s @ duck_max), param_stream=1.030816G (5.295 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.960 GB/s @ duck_max) [2026-04-08 08:53:58.089862 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=250, expert_tiles=746, avg_tile_batch=3.52, prepare=828.526µs, send=17.228302ms, judge_wait=218.510229ms, fetch=20.626112ms, reduce=20ns; duck time-ns stats: p50=192.794928ms, p90=193.080518ms, max=193.672071ms; kernel_model: matmul=7.222591 GFLOP (37.293 GFLOP/s @ duck_max), param_stream=1.026687G (5.301 Gparam/s @ duck_max), weight_stream=1101.994 MiB (5.966 GB/s @ 
duck_max) [2026-04-08 08:53:58.366738 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=254, expert_tiles=751, avg_tile_batch=3.49, prepare=659.503µs, send=17.224774ms, judge_wait=223.774804ms, fetch=20.617391ms, reduce=21ns; duck time-ns stats: p50=197.060185ms, p90=197.359366ms, max=197.787883ms; kernel_model: matmul=7.222591 GFLOP (36.517 GFLOP/s @ duck_max), param_stream=1.033568G (5.226 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.881 GB/s @ duck_max) [2026-04-08 08:53:58.640418 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=251, expert_tiles=754, avg_tile_batch=3.48, prepare=658.306µs, send=18.320903ms, judge_wait=219.386769ms, fetch=20.657869ms, reduce=132ns; duck time-ns stats: p50=192.710423ms, p90=193.066686ms, max=193.347499ms; kernel_model: matmul=7.222591 GFLOP (37.355 GFLOP/s @ duck_max), param_stream=1.037697G (5.367 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.041 GB/s @ duck_max) [2026-04-08 08:53:58.666791 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=105, expert_tiles=114, avg_tile_batch=1.68, prepare=58.378µs, send=1.302415ms, judge_wait=22.530631ms, fetch=1.480575ms, reduce=138ns; duck time-ns stats: p50=22.305612ms, p90=22.325953ms, max=22.351453ms; kernel_model: matmul=0.528482 GFLOP (23.644 GFLOP/s @ duck_max), param_stream=0.156893G (7.019 Gparam/s @ duck_max), weight_stream=168.401 MiB (7.900 GB/s @ duck_max) [2026-04-08 08:53:58.956606 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=250, expert_tiles=754, avg_tile_batch=3.48, prepare=819.216µs, send=17.214024ms, judge_wait=219.029015ms, fetch=21.653321ms, reduce=169ns; duck time-ns stats: p50=192.489846ms, p90=192.910836ms, max=193.166331ms; kernel_model: matmul=7.222591 GFLOP (37.391 GFLOP/s @ duck_max), param_stream=1.037697G (5.372 Gparam/s @ duck_max), 
weight_stream=1113.811 MiB (6.046 GB/s @ duck_max) [2026-04-08 08:53:59.233334 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=251, expert_tiles=755, avg_tile_batch=3.48, prepare=667.009µs, send=18.324463ms, judge_wait=221.508567ms, fetch=21.687813ms, reduce=20ns; duck time-ns stats: p50=195.109654ms, p90=195.370186ms, max=195.564667ms; kernel_model: matmul=7.222591 GFLOP (36.932 GFLOP/s @ duck_max), param_stream=1.039073G (5.313 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.980 GB/s @ duck_max) [2026-04-08 08:53:59.512184 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=252, expert_tiles=752, avg_tile_batch=3.49, prepare=666.688µs, send=18.397473ms, judge_wait=221.125473ms, fetch=24.125305ms, reduce=135ns; duck time-ns stats: p50=193.285566ms, p90=193.474369ms, max=193.772163ms; kernel_model: matmul=7.222591 GFLOP (37.274 GFLOP/s @ duck_max), param_stream=1.034945G (5.341 Gparam/s @ duck_max), weight_stream=1110.857 MiB (6.011 GB/s @ duck_max) [2026-04-08 08:53:59.789596 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=248, expert_tiles=751, avg_tile_batch=3.49, prepare=665.496µs, send=18.36008ms, judge_wait=220.94188ms, fetch=22.903863ms, reduce=20ns; duck time-ns stats: p50=192.693619ms, p90=192.971942ms, max=193.133852ms; kernel_model: matmul=7.222591 GFLOP (37.397 GFLOP/s @ duck_max), param_stream=1.033568G (5.352 Gparam/s @ duck_max), weight_stream=1109.380 MiB (6.023 GB/s @ duck_max) [2026-04-08 08:53:59.815645 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=113, expert_tiles=115, avg_tile_batch=1.67, prepare=56.485µs, send=2.477659ms, judge_wait=21.003986ms, fetch=1.479518ms, reduce=135ns; duck time-ns stats: p50=20.773247ms, p90=20.805364ms, max=20.829007ms; kernel_model: matmul=0.528482 GFLOP (25.372 GFLOP/s @ duck_max), param_stream=0.158269G (7.599 
Gparam/s @ duck_max), weight_stream=169.878 MiB (8.552 GB/s @ duck_max) [2026-04-08 08:54:00.104042 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=244, expert_tiles=749, avg_tile_batch=3.50, prepare=748.261µs, send=18.371563ms, judge_wait=217.372102ms, fetch=21.638417ms, reduce=138ns; duck time-ns stats: p50=190.975124ms, p90=191.502615ms, max=191.789589ms; kernel_model: matmul=7.222591 GFLOP (37.659 GFLOP/s @ duck_max), param_stream=1.030816G (5.375 Gparam/s @ duck_max), weight_stream=1106.425 MiB (6.049 GB/s @ duck_max) [2026-04-08 08:54:00.382209 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=245, expert_tiles=749, avg_tile_batch=3.50, prepare=665.107µs, send=17.212284ms, judge_wait=222.774838ms, fetch=23.647561ms, reduce=20ns; duck time-ns stats: p50=195.416209ms, p90=196.28158ms, max=196.457202ms; kernel_model: matmul=7.222591 GFLOP (36.764 GFLOP/s @ duck_max), param_stream=1.030816G (5.247 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.905 GB/s @ duck_max) [2026-04-08 08:54:00.659830 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=246, expert_tiles=758, avg_tile_batch=3.46, prepare=597.116µs, send=18.576061ms, judge_wait=220.266277ms, fetch=24.439665ms, reduce=14ns; duck time-ns stats: p50=192.659219ms, p90=192.979096ms, max=193.35927ms; kernel_model: matmul=7.222591 GFLOP (37.353 GFLOP/s @ duck_max), param_stream=1.043202G (5.395 Gparam/s @ duck_max), weight_stream=1119.720 MiB (6.072 GB/s @ duck_max) [2026-04-08 08:54:00.934686 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=247, expert_tiles=759, avg_tile_batch=3.46, prepare=655.396µs, send=17.227625ms, judge_wait=220.810489ms, fetch=22.353981ms, reduce=23ns; duck time-ns stats: p50=193.881789ms, p90=194.212243ms, max=194.34072ms; kernel_model: matmul=7.222591 GFLOP (37.165 GFLOP/s @ duck_max), 
param_stream=1.044578G (5.375 Gparam/s @ duck_max), weight_stream=1121.197 MiB (6.049 GB/s @ duck_max) [2026-04-08 08:54:00.959850 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=102, expert_tiles=108, avg_tile_batch=1.78, prepare=58.435µs, send=2.601493ms, judge_wait=20.071398ms, fetch=1.475461ms, reduce=64ns; duck time-ns stats: p50=19.739557ms, p90=19.804517ms, max=19.93457ms; kernel_model: matmul=0.528482 GFLOP (26.511 GFLOP/s @ duck_max), param_stream=0.148636G (7.456 Gparam/s @ duck_max), weight_stream=159.538 MiB (8.392 GB/s @ duck_max) [2026-04-08 08:54:01.251546 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=241, expert_tiles=751, avg_tile_batch=3.49, prepare=747.112µs, send=17.231361ms, judge_wait=219.781875ms, fetch=24.095735ms, reduce=147ns; duck time-ns stats: p50=191.992292ms, p90=192.180506ms, max=192.396772ms; kernel_model: matmul=7.222591 GFLOP (37.540 GFLOP/s @ duck_max), param_stream=1.033568G (5.372 Gparam/s @ duck_max), weight_stream=1109.380 MiB (6.046 GB/s @ duck_max) [2026-04-08 08:54:01.525649 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=244, expert_tiles=753, avg_tile_batch=3.48, prepare=668.973µs, send=17.227991ms, judge_wait=220.607727ms, fetch=21.824832ms, reduce=20ns; duck time-ns stats: p50=192.017624ms, p90=192.215401ms, max=192.355088ms; kernel_model: matmul=7.222591 GFLOP (37.548 GFLOP/s @ duck_max), param_stream=1.036321G (5.388 Gparam/s @ duck_max), weight_stream=1112.334 MiB (6.064 GB/s @ duck_max) [2026-04-08 08:54:01.799494 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=244, expert_tiles=754, avg_tile_batch=3.48, prepare=666.298µs, send=17.221685ms, judge_wait=219.792628ms, fetch=22.375606ms, reduce=20ns; duck time-ns stats: p50=190.638635ms, p90=190.888828ms, max=191.432694ms; kernel_model: matmul=7.222591 GFLOP (37.729 
GFLOP/s @ duck_max), param_stream=1.037697G (5.421 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.101 GB/s @ duck_max) [2026-04-08 08:54:02.078079 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=238, expert_tiles=752, avg_tile_batch=3.49, prepare=659.794µs, send=18.393685ms, judge_wait=220.986611ms, fetch=24.784625ms, reduce=20ns; duck time-ns stats: p50=193.405296ms, p90=193.91957ms, max=194.322091ms; kernel_model: matmul=7.222591 GFLOP (37.168 GFLOP/s @ duck_max), param_stream=1.034945G (5.326 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.994 GB/s @ duck_max) [2026-04-08 08:54:02.103762 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=93, expert_tiles=104, avg_tile_batch=1.85, prepare=57.318µs, send=1.388538ms, judge_wait=21.840677ms, fetch=1.4749ms, reduce=109ns; duck time-ns stats: p50=21.615382ms, p90=21.645866ms, max=21.656038ms; kernel_model: matmul=0.528482 GFLOP (24.403 GFLOP/s @ duck_max), param_stream=0.143131G (6.609 Gparam/s @ duck_max), weight_stream=153.629 MiB (7.439 GB/s @ duck_max) [2026-04-08 08:54:02.402755 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=237, expert_tiles=751, avg_tile_batch=3.49, prepare=804.249µs, send=17.227593ms, judge_wait=228.079447ms, fetch=23.076805ms, reduce=20ns; duck time-ns stats: p50=200.785687ms, p90=200.933476ms, max=201.08496ms; kernel_model: matmul=7.222591 GFLOP (35.918 GFLOP/s @ duck_max), param_stream=1.033568G (5.140 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.785 GB/s @ duck_max) [2026-04-08 08:54:02.679680 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=239, expert_tiles=745, avg_tile_batch=3.52, prepare=658.948µs, send=18.37333ms, judge_wait=219.975024ms, fetch=24.113832ms, reduce=137ns; duck time-ns stats: p50=192.626552ms, p90=192.883761ms, max=193.008744ms; kernel_model: matmul=7.222591 
GFLOP (37.421 GFLOP/s @ duck_max), param_stream=1.025311G (5.312 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.979 GB/s @ duck_max) [2026-04-08 08:54:02.953850 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=247, expert_tiles=754, avg_tile_batch=3.48, prepare=659.991µs, send=17.226854ms, judge_wait=221.858183ms, fetch=20.65925ms, reduce=19ns; duck time-ns stats: p50=195.313893ms, p90=195.876755ms, max=196.146498ms; kernel_model: matmul=7.222591 GFLOP (36.822 GFLOP/s @ duck_max), param_stream=1.037697G (5.290 Gparam/s @ duck_max), weight_stream=1113.811 MiB (5.954 GB/s @ duck_max) [2026-04-08 08:54:03.232480 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=747, avg_tile_batch=3.51, prepare=665.73µs, send=17.221793ms, judge_wait=222.790694ms, fetch=24.175551ms, reduce=20ns; duck time-ns stats: p50=195.123034ms, p90=195.452052ms, max=195.823317ms; kernel_model: matmul=7.222591 GFLOP (36.883 GFLOP/s @ duck_max), param_stream=1.028063G (5.250 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.909 GB/s @ duck_max) [2026-04-08 08:54:03.257039 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=101, expert_tiles=108, avg_tile_batch=1.78, prepare=56.631µs, send=2.605182ms, judge_wait=19.460737ms, fetch=1.479482ms, reduce=138ns; duck time-ns stats: p50=19.23478ms, p90=19.260396ms, max=19.301395ms; kernel_model: matmul=0.528482 GFLOP (27.381 GFLOP/s @ duck_max), param_stream=0.148636G (7.701 Gparam/s @ duck_max), weight_stream=159.538 MiB (8.667 GB/s @ duck_max) [2026-04-08 08:54:03.548528 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=249, expert_tiles=762, avg_tile_batch=3.44, prepare=753.577µs, send=17.240163ms, judge_wait=219.998797ms, fetch=23.691443ms, reduce=135ns; duck time-ns stats: p50=192.713074ms, p90=192.848782ms, max=193.031973ms; 
kernel_model: matmul=7.222591 GFLOP (37.417 GFLOP/s @ duck_max), param_stream=1.048707G (5.433 Gparam/s @ duck_max), weight_stream=1125.629 MiB (6.115 GB/s @ duck_max) [2026-04-08 08:54:03.828829 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=247, expert_tiles=760, avg_tile_batch=3.45, prepare=661.695µs, send=17.228777ms, judge_wait=224.025298ms, fetch=24.542789ms, reduce=146ns; duck time-ns stats: p50=196.303862ms, p90=196.620761ms, max=196.94377ms; kernel_model: matmul=7.222591 GFLOP (36.673 GFLOP/s @ duck_max), param_stream=1.045955G (5.311 Gparam/s @ duck_max), weight_stream=1122.675 MiB (5.977 GB/s @ duck_max) [2026-04-08 08:54:04.109154 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=240, expert_tiles=751, avg_tile_batch=3.49, prepare=662.609µs, send=18.367553ms, judge_wait=225.372709ms, fetch=22.041319ms, reduce=137ns; duck time-ns stats: p50=196.618936ms, p90=196.944724ms, max=197.311964ms; kernel_model: matmul=7.222591 GFLOP (36.605 GFLOP/s @ duck_max), param_stream=1.033568G (5.238 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.896 GB/s @ duck_max) [2026-04-08 08:54:04.386170 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=243, expert_tiles=745, avg_tile_batch=3.52, prepare=657.148µs, send=17.211361ms, judge_wait=221.494113ms, fetch=23.891834ms, reduce=20ns; duck time-ns stats: p50=194.079042ms, p90=194.407495ms, max=194.534862ms; kernel_model: matmul=7.222591 GFLOP (37.127 GFLOP/s @ duck_max), param_stream=1.025311G (5.271 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.932 GB/s @ duck_max) [2026-04-08 08:54:04.411367 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=109, expert_tiles=114, avg_tile_batch=1.68, prepare=56.023µs, send=2.641474ms, judge_wait=20.081828ms, fetch=1.484371ms, reduce=135ns; duck time-ns stats: p50=19.83533ms, 
p90=19.898573ms, max=19.911356ms; kernel_model: matmul=0.528482 GFLOP (26.542 GFLOP/s @ duck_max), param_stream=0.156893G (7.880 Gparam/s @ duck_max), weight_stream=168.401 MiB (8.868 GB/s @ duck_max) [2026-04-08 08:54:04.711428 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=241, expert_tiles=748, avg_tile_batch=3.51, prepare=766.218µs, send=17.237848ms, judge_wait=222.596353ms, fetch=24.785468ms, reduce=20ns; duck time-ns stats: p50=193.039068ms, p90=193.230374ms, max=193.462135ms; kernel_model: matmul=7.222591 GFLOP (37.333 GFLOP/s @ duck_max), param_stream=1.029439G (5.321 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.989 GB/s @ duck_max) [2026-04-08 08:54:04.992161 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=247, expert_tiles=753, avg_tile_batch=3.48, prepare=659.45µs, send=18.346843ms, judge_wait=225.199935ms, fetch=21.759876ms, reduce=138ns; duck time-ns stats: p50=196.163001ms, p90=196.392724ms, max=196.476266ms; kernel_model: matmul=7.222591 GFLOP (36.761 GFLOP/s @ duck_max), param_stream=1.036321G (5.275 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.936 GB/s @ duck_max) [2026-04-08 08:54:05.268649 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=243, expert_tiles=756, avg_tile_batch=3.47, prepare=661.594µs, send=17.226513ms, judge_wait=221.623376ms, fetch=22.332249ms, reduce=158ns; duck time-ns stats: p50=193.027131ms, p90=193.294158ms, max=193.649065ms; kernel_model: matmul=7.222591 GFLOP (37.297 GFLOP/s @ duck_max), param_stream=1.040450G (5.373 Gparam/s @ duck_max), weight_stream=1116.766 MiB (6.047 GB/s @ duck_max) [2026-04-08 08:54:05.548973 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=240, expert_tiles=751, avg_tile_batch=3.49, prepare=656.858µs, send=17.224676ms, judge_wait=223.068346ms, fetch=24.726385ms, reduce=20ns; duck 
time-ns stats: p50=193.458053ms, p90=193.744459ms, max=193.890506ms; kernel_model: matmul=7.222591 GFLOP (37.251 GFLOP/s @ duck_max), param_stream=1.033568G (5.331 Gparam/s @ duck_max), weight_stream=1109.380 MiB (6.000 GB/s @ duck_max) [2026-04-08 08:54:05.573999 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=99, expert_tiles=107, avg_tile_batch=1.79, prepare=57.79µs, send=2.461349ms, judge_wait=20.054587ms, fetch=1.475554ms, reduce=20ns; duck time-ns stats: p50=19.821258ms, p90=19.863683ms, max=19.874552ms; kernel_model: matmul=0.528482 GFLOP (26.591 GFLOP/s @ duck_max), param_stream=0.147259G (7.409 Gparam/s @ duck_max), weight_stream=158.061 MiB (8.339 GB/s @ duck_max) [2026-04-08 08:54:05.870365 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=754, avg_tile_batch=3.48, prepare=3.493011ms, send=17.237745ms, judge_wait=220.165864ms, fetch=23.084783ms, reduce=21ns; duck time-ns stats: p50=191.850201ms, p90=192.294241ms, max=192.633155ms; kernel_model: matmul=7.222591 GFLOP (37.494 GFLOP/s @ duck_max), param_stream=1.037697G (5.387 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.063 GB/s @ duck_max) [2026-04-08 08:54:06.147043 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=242, expert_tiles=754, avg_tile_batch=3.48, prepare=665.46µs, send=18.529794ms, judge_wait=220.602336ms, fetch=22.26935ms, reduce=135ns; duck time-ns stats: p50=193.565778ms, p90=193.770661ms, max=193.983969ms; kernel_model: matmul=7.222591 GFLOP (37.233 GFLOP/s @ duck_max), param_stream=1.037697G (5.349 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.021 GB/s @ duck_max) [2026-04-08 08:54:06.428375 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=240, expert_tiles=762, avg_tile_batch=3.44, prepare=658.692µs, send=17.225136ms, judge_wait=224.781318ms, fetch=24.112656ms, 
reduce=20ns; duck time-ns stats: p50=197.320351ms, p90=197.828759ms, max=198.1826ms; kernel_model: matmul=7.222591 GFLOP (36.444 GFLOP/s @ duck_max), param_stream=1.048707G (5.292 Gparam/s @ duck_max), weight_stream=1125.629 MiB (5.956 GB/s @ duck_max) [2026-04-08 08:54:06.705691 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=239, expert_tiles=743, avg_tile_batch=3.53, prepare=656.33µs, send=18.380787ms, judge_wait=221.382313ms, fetch=22.329265ms, reduce=135ns; duck time-ns stats: p50=194.487241ms, p90=194.859772ms, max=194.962763ms; kernel_model: matmul=7.222591 GFLOP (37.046 GFLOP/s @ duck_max), param_stream=1.022558G (5.245 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.903 GB/s @ duck_max) [2026-04-08 08:54:06.730725 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=108, expert_tiles=114, avg_tile_batch=1.68, prepare=56.479µs, send=1.499014ms, judge_wait=20.982581ms, fetch=1.482193ms, reduce=136ns; duck time-ns stats: p50=20.785962ms, p90=20.803333ms, max=20.838181ms; kernel_model: matmul=0.528482 GFLOP (25.361 GFLOP/s @ duck_max), param_stream=0.156893G (7.529 Gparam/s @ duck_max), weight_stream=168.401 MiB (8.474 GB/s @ duck_max) [2026-04-08 08:54:07.027525 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=239, expert_tiles=745, avg_tile_batch=3.52, prepare=3.406405ms, send=17.217226ms, judge_wait=219.965751ms, fetch=23.948586ms, reduce=135ns; duck time-ns stats: p50=191.51533ms, p90=191.673394ms, max=191.875574ms; kernel_model: matmul=7.222591 GFLOP (37.642 GFLOP/s @ duck_max), param_stream=1.025311G (5.344 Gparam/s @ duck_max), weight_stream=1100.517 MiB (6.014 GB/s @ duck_max) [2026-04-08 08:54:07.301365 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=247, expert_tiles=750, avg_tile_batch=3.50, prepare=659.597µs, send=17.226774ms, judge_wait=220.751217ms, 
fetch=20.629431ms, reduce=21ns; duck time-ns stats: p50=193.929184ms, p90=194.094578ms, max=194.173515ms; kernel_model: matmul=7.222591 GFLOP (37.197 GFLOP/s @ duck_max), param_stream=1.032192G (5.316 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.983 GB/s @ duck_max) [2026-04-08 08:54:07.586723 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=755, avg_tile_batch=3.48, prepare=663.4µs, send=19.330473ms, judge_wait=222.150984ms, fetch=28.660739ms, reduce=20ns; duck time-ns stats: p50=193.921774ms, p90=194.551448ms, max=194.732269ms; kernel_model: matmul=7.222591 GFLOP (37.090 GFLOP/s @ duck_max), param_stream=1.039073G (5.336 Gparam/s @ duck_max), weight_stream=1115.289 MiB (6.006 GB/s @ duck_max) [2026-04-08 08:54:07.866394 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=239, expert_tiles=754, avg_tile_batch=3.48, prepare=659.36µs, send=18.546822ms, judge_wait=225.283213ms, fetch=20.643774ms, reduce=141ns; duck time-ns stats: p50=198.622164ms, p90=198.777791ms, max=199.059206ms; kernel_model: matmul=7.222591 GFLOP (36.284 GFLOP/s @ duck_max), param_stream=1.037697G (5.213 Gparam/s @ duck_max), weight_stream=1113.811 MiB (5.867 GB/s @ duck_max) [2026-04-08 08:54:07.890987 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=96, expert_tiles=105, avg_tile_batch=1.83, prepare=55.959µs, send=1.303984ms, judge_wait=20.718007ms, fetch=1.475735ms, reduce=137ns; duck time-ns stats: p50=20.46353ms, p90=20.51147ms, max=20.548024ms; kernel_model: matmul=0.528482 GFLOP (25.719 GFLOP/s @ duck_max), param_stream=0.144507G (7.033 Gparam/s @ duck_max), weight_stream=155.106 MiB (7.915 GB/s @ duck_max) [2026-04-08 08:54:08.190715 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=244, expert_tiles=748, avg_tile_batch=3.51, prepare=800.356µs, send=17.240048ms, 
judge_wait=223.19365ms, fetch=24.304024ms, reduce=19ns; duck time-ns stats: p50=195.630387ms, p90=195.872379ms, max=196.061408ms; kernel_model: matmul=7.222591 GFLOP (36.838 GFLOP/s @ duck_max), param_stream=1.029439G (5.251 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.909 GB/s @ duck_max) [2026-04-08 08:54:08.473366 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=239, expert_tiles=744, avg_tile_batch=3.53, prepare=667.289µs, send=17.223603ms, judge_wait=226.31689ms, fetch=23.851824ms, reduce=136ns; duck time-ns stats: p50=196.571988ms, p90=196.742622ms, max=196.960581ms; kernel_model: matmul=7.222591 GFLOP (36.670 GFLOP/s @ duck_max), param_stream=1.023934G (5.199 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.851 GB/s @ duck_max) [2026-04-08 08:54:08.758595 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=235, expert_tiles=750, avg_tile_batch=3.50, prepare=669.973µs, send=18.362425ms, judge_wait=229.940421ms, fetch=21.648046ms, reduce=100ns; duck time-ns stats: p50=204.216748ms, p90=204.579974ms, max=204.725714ms; kernel_model: matmul=7.222591 GFLOP (35.279 GFLOP/s @ duck_max), param_stream=1.032192G (5.042 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.675 GB/s @ duck_max) [2026-04-08 08:54:09.040591 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=751, avg_tile_batch=3.49, prepare=657.398µs, send=18.324831ms, judge_wait=224.62727ms, fetch=23.751951ms, reduce=139ns; duck time-ns stats: p50=195.375185ms, p90=195.789736ms, max=196.13254ms; kernel_model: matmul=7.222591 GFLOP (36.825 GFLOP/s @ duck_max), param_stream=1.033568G (5.270 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.931 GB/s @ duck_max) [2026-04-08 08:54:09.065076 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=81, expert_tiles=94, avg_tile_batch=2.04, 
prepare=56.695µs, send=2.521948ms, judge_wait=19.453339ms, fetch=1.477873ms, reduce=20ns; duck time-ns stats: p50=19.258218ms, p90=19.312087ms, max=19.316795ms; kernel_model: matmul=0.528482 GFLOP (27.359 GFLOP/s @ duck_max), param_stream=0.129368G (6.697 Gparam/s @ duck_max), weight_stream=138.857 MiB (7.538 GB/s @ duck_max) [2026-04-08 08:54:09.360441 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=749, avg_tile_batch=3.50, prepare=761.791µs, send=18.587933ms, judge_wait=219.696188ms, fetch=22.348478ms, reduce=138ns; duck time-ns stats: p50=192.562355ms, p90=192.969276ms, max=193.073123ms; kernel_model: matmul=7.222591 GFLOP (37.409 GFLOP/s @ duck_max), param_stream=1.030816G (5.339 Gparam/s @ duck_max), weight_stream=1106.425 MiB (6.009 GB/s @ duck_max) [2026-04-08 08:54:09.663107 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=756, avg_tile_batch=3.47, prepare=657.126µs, send=18.577064ms, judge_wait=246.457539ms, fetch=22.343218ms, reduce=19ns; duck time-ns stats: p50=219.214948ms, p90=219.490808ms, max=220.159926ms; kernel_model: matmul=7.222591 GFLOP (32.806 GFLOP/s @ duck_max), param_stream=1.040450G (4.726 Gparam/s @ duck_max), weight_stream=1116.766 MiB (5.319 GB/s @ duck_max) [2026-04-08 08:54:09.942418 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=750, avg_tile_batch=3.50, prepare=662.53µs, send=17.225505ms, judge_wait=225.142269ms, fetch=21.631669ms, reduce=163ns; duck time-ns stats: p50=198.532892ms, p90=198.727578ms, max=198.874714ms; kernel_model: matmul=7.222591 GFLOP (36.317 GFLOP/s @ duck_max), param_stream=1.032192G (5.190 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.841 GB/s @ duck_max) [2026-04-08 08:54:10.218243 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=748, 
avg_tile_batch=3.51, prepare=657.755µs, send=18.317854ms, judge_wait=219.473801ms, fetch=22.705655ms, reduce=155ns; duck time-ns stats: p50=192.297337ms, p90=193.081664ms, max=193.278723ms; kernel_model: matmul=7.222591 GFLOP (37.369 GFLOP/s @ duck_max), param_stream=1.029439G (5.326 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.995 GB/s @ duck_max) [2026-04-08 08:54:10.241436 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=87, expert_tiles=99, avg_tile_batch=1.94, prepare=54.915µs, send=1.303256ms, judge_wait=19.345435ms, fetch=1.477166ms, reduce=102ns; duck time-ns stats: p50=19.126989ms, p90=19.154514ms, max=19.172947ms; kernel_model: matmul=0.528482 GFLOP (27.564 GFLOP/s @ duck_max), param_stream=0.136249G (7.106 Gparam/s @ duck_max), weight_stream=146.243 MiB (7.998 GB/s @ duck_max) [2026-04-08 08:54:10.569008 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=750, avg_tile_batch=3.50, prepare=815.717µs, send=17.23691ms, judge_wait=257.276662ms, fetch=21.619192ms, reduce=132ns; duck time-ns stats: p50=231.705346ms, p90=231.972173ms, max=232.356573ms; kernel_model: matmul=7.222591 GFLOP (31.084 GFLOP/s @ duck_max), param_stream=1.032192G (4.442 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.000 GB/s @ duck_max) [2026-04-08 08:54:10.849922 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=751, avg_tile_batch=3.49, prepare=665.68µs, send=18.365986ms, judge_wait=225.047978ms, fetch=22.293323ms, reduce=21ns; duck time-ns stats: p50=197.973174ms, p90=198.134821ms, max=198.302219ms; kernel_model: matmul=7.222591 GFLOP (36.422 GFLOP/s @ duck_max), param_stream=1.033568G (5.212 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.866 GB/s @ duck_max) [2026-04-08 08:54:11.139182 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=221, 
expert_tiles=750, avg_tile_batch=3.50, prepare=663.115µs, send=18.425374ms, judge_wait=234.779397ms, fetch=20.641432ms, reduce=143ns; duck time-ns stats: p50=207.787979ms, p90=208.001938ms, max=208.128862ms; kernel_model: matmul=7.222591 GFLOP (34.702 GFLOP/s @ duck_max), param_stream=1.032192G (4.959 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.582 GB/s @ duck_max) [2026-04-08 08:54:11.422034 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=748, avg_tile_batch=3.51, prepare=664.47µs, send=17.225878ms, judge_wait=228.738154ms, fetch=21.657535ms, reduce=20ns; duck time-ns stats: p50=202.194474ms, p90=202.501394ms, max=202.844127ms; kernel_model: matmul=7.222591 GFLOP (35.607 GFLOP/s @ duck_max), param_stream=1.029439G (5.075 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.712 GB/s @ duck_max) [2026-04-08 08:54:11.445319 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=89, expert_tiles=101, avg_tile_batch=1.90, prepare=56.841µs, send=1.304727ms, judge_wait=19.427167ms, fetch=1.482195ms, reduce=135ns; duck time-ns stats: p50=19.173528ms, p90=19.206113ms, max=19.245826ms; kernel_model: matmul=0.528482 GFLOP (27.460 GFLOP/s @ duck_max), param_stream=0.139002G (7.222 Gparam/s @ duck_max), weight_stream=149.198 MiB (8.129 GB/s @ duck_max) [2026-04-08 08:54:11.734514 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=224, expert_tiles=742, avg_tile_batch=3.54, prepare=752.05µs, send=17.22833ms, judge_wait=218.605698ms, fetch=22.779675ms, reduce=20ns; duck time-ns stats: p50=191.563917ms, p90=192.034859ms, max=192.167453ms; kernel_model: matmul=7.222591 GFLOP (37.585 GFLOP/s @ duck_max), param_stream=1.021182G (5.314 Gparam/s @ duck_max), weight_stream=1096.085 MiB (5.981 GB/s @ duck_max) [2026-04-08 08:54:12.013996 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, 
unique_experts=221, expert_tiles=753, avg_tile_batch=3.48, prepare=666.51µs, send=18.560445ms, judge_wait=223.693482ms, fetch=22.791263ms, reduce=21ns; duck time-ns stats: p50=194.622655ms, p90=194.797958ms, max=195.020792ms; kernel_model: matmul=7.222591 GFLOP (37.035 GFLOP/s @ duck_max), param_stream=1.036321G (5.314 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.981 GB/s @ duck_max) [2026-04-08 08:54:12.295328 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=220, expert_tiles=748, avg_tile_batch=3.51, prepare=661.422µs, send=18.480495ms, judge_wait=224.699262ms, fetch=23.722901ms, reduce=16ns; duck time-ns stats: p50=196.728202ms, p90=196.979744ms, max=197.259618ms; kernel_model: matmul=7.222591 GFLOP (36.615 GFLOP/s @ duck_max), param_stream=1.029439G (5.219 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.874 GB/s @ duck_max) [2026-04-08 08:54:12.581793 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=218, expert_tiles=744, avg_tile_batch=3.53, prepare=665.612µs, send=17.226051ms, judge_wait=233.13467ms, fetch=21.639447ms, reduce=104ns; duck time-ns stats: p50=207.338652ms, p90=207.656373ms, max=207.98898ms; kernel_model: matmul=7.222591 GFLOP (34.726 GFLOP/s @ duck_max), param_stream=1.023934G (4.923 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.541 GB/s @ duck_max) [2026-04-08 08:54:12.604903 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=81, expert_tiles=97, avg_tile_batch=1.98, prepare=55.889µs, send=1.304913ms, judge_wait=19.348037ms, fetch=1.476569ms, reduce=104ns; duck time-ns stats: p50=19.119992ms, p90=19.158158ms, max=19.177015ms; kernel_model: matmul=0.528482 GFLOP (27.558 GFLOP/s @ duck_max), param_stream=0.133497G (6.961 Gparam/s @ duck_max), weight_stream=143.289 MiB (7.835 GB/s @ duck_max) [2026-04-08 08:54:12.900635 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, 
top_k=8, tasks=2624, unique_experts=216, expert_tiles=750, avg_tile_batch=3.50, prepare=822.278µs, send=17.235487ms, judge_wait=222.424735ms, fetch=24.533167ms, reduce=20ns; duck time-ns stats: p50=194.685379ms, p90=194.972885ms, max=195.092442ms; kernel_model: matmul=7.222591 GFLOP (37.021 GFLOP/s @ duck_max), param_stream=1.032192G (5.291 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.955 GB/s @ duck_max) [2026-04-08 08:54:13.175406 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=215, expert_tiles=745, avg_tile_batch=3.52, prepare=663.599µs, send=18.384594ms, judge_wait=218.844526ms, fetch=22.336502ms, reduce=135ns; duck time-ns stats: p50=191.859957ms, p90=192.183412ms, max=192.314014ms; kernel_model: matmul=7.222591 GFLOP (37.556 GFLOP/s @ duck_max), param_stream=1.025311G (5.331 Gparam/s @ duck_max), weight_stream=1100.517 MiB (6.000 GB/s @ duck_max) [2026-04-08 08:54:13.451647 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=212, expert_tiles=740, avg_tile_batch=3.55, prepare=667.342µs, send=18.401732ms, judge_wait=218.462222ms, fetch=24.20768ms, reduce=20ns; duck time-ns stats: p50=190.988028ms, p90=191.255244ms, max=191.306308ms; kernel_model: matmul=7.222591 GFLOP (37.754 GFLOP/s @ duck_max), param_stream=1.018429G (5.324 Gparam/s @ duck_max), weight_stream=1093.130 MiB (5.992 GB/s @ duck_max) [2026-04-08 08:54:13.772687 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=218, expert_tiles=750, avg_tile_batch=3.50, prepare=668.441µs, send=18.431238ms, judge_wait=265.796114ms, fetch=21.630801ms, reduce=137ns; duck time-ns stats: p50=242.327609ms, p90=242.729407ms, max=242.855435ms; kernel_model: matmul=7.222591 GFLOP (29.740 GFLOP/s @ duck_max), param_stream=1.032192G (4.250 Gparam/s @ duck_max), weight_stream=1107.903 MiB (4.784 GB/s @ duck_max) [2026-04-08 08:54:13.795962 INFO fp8_moe_dpdk] MoE prefill 
forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=76, expert_tiles=92, avg_tile_batch=2.09, prepare=59.66µs, send=1.301201ms, judge_wait=19.448009ms, fetch=1.479187ms, reduce=104ns; duck time-ns stats: p50=19.186254ms, p90=19.224351ms, max=19.268606ms; kernel_model: matmul=0.528482 GFLOP (27.427 GFLOP/s @ duck_max), param_stream=0.126616G (6.571 Gparam/s @ duck_max), weight_stream=135.903 MiB (7.396 GB/s @ duck_max) [2026-04-08 08:54:14.095563 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=750, avg_tile_batch=3.50, prepare=775.183µs, send=18.417222ms, judge_wait=228.677161ms, fetch=21.592779ms, reduce=20ns; duck time-ns stats: p50=201.928326ms, p90=202.240185ms, max=202.35599ms; kernel_model: matmul=7.222591 GFLOP (35.693 GFLOP/s @ duck_max), param_stream=1.032192G (5.101 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.741 GB/s @ duck_max) [2026-04-08 08:54:14.368458 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=224, expert_tiles=746, avg_tile_batch=3.52, prepare=658.285µs, send=17.224119ms, judge_wait=218.795392ms, fetch=22.522755ms, reduce=19ns; duck time-ns stats: p50=190.349908ms, p90=190.607921ms, max=190.832796ms; kernel_model: matmul=7.222591 GFLOP (37.848 GFLOP/s @ duck_max), param_stream=1.026687G (5.380 Gparam/s @ duck_max), weight_stream=1101.994 MiB (6.055 GB/s @ duck_max) [2026-04-08 08:54:14.651272 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=751, avg_tile_batch=3.49, prepare=659.201µs, send=18.493587ms, judge_wait=227.388773ms, fetch=22.502203ms, reduce=139ns; duck time-ns stats: p50=198.990369ms, p90=199.359223ms, max=199.602123ms; kernel_model: matmul=7.222591 GFLOP (36.185 GFLOP/s @ duck_max), param_stream=1.033568G (5.178 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.828 GB/s @ duck_max) [2026-04-08 08:54:14.939895 INFO 
fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=220, expert_tiles=744, avg_tile_batch=3.53, prepare=665.197µs, send=17.225144ms, judge_wait=234.114356ms, fetch=22.837864ms, reduce=135ns; duck time-ns stats: p50=207.004009ms, p90=207.310427ms, max=207.581325ms; kernel_model: matmul=7.222591 GFLOP (34.794 GFLOP/s @ duck_max), param_stream=1.023934G (4.933 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.552 GB/s @ duck_max) [2026-04-08 08:54:14.965515 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=93, expert_tiles=105, avg_tile_batch=1.83, prepare=56.261µs, send=2.615141ms, judge_wait=20.536042ms, fetch=1.481695ms, reduce=105ns; duck time-ns stats: p50=20.314087ms, p90=20.371807ms, max=20.398211ms; kernel_model: matmul=0.528482 GFLOP (25.908 GFLOP/s @ duck_max), param_stream=0.144507G (7.084 Gparam/s @ duck_max), weight_stream=155.106 MiB (7.973 GB/s @ duck_max) [2026-04-08 08:54:15.256841 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=214, expert_tiles=732, avg_tile_batch=3.58, prepare=817.762µs, send=17.236739ms, judge_wait=220.432163ms, fetch=22.195021ms, reduce=137ns; duck time-ns stats: p50=191.791001ms, p90=192.129686ms, max=192.407057ms; kernel_model: matmul=7.222591 GFLOP (37.538 GFLOP/s @ duck_max), param_stream=1.007419G (5.236 Gparam/s @ duck_max), weight_stream=1081.313 MiB (5.893 GB/s @ duck_max) [2026-04-08 08:54:15.532032 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=224, expert_tiles=753, avg_tile_batch=3.48, prepare=658.077µs, send=18.353936ms, judge_wait=219.170295ms, fetch=22.281556ms, reduce=134ns; duck time-ns stats: p50=190.180671ms, p90=190.495875ms, max=190.798357ms; kernel_model: matmul=7.222591 GFLOP (37.855 GFLOP/s @ duck_max), param_stream=1.036321G (5.431 Gparam/s @ duck_max), weight_stream=1112.334 MiB (6.113 GB/s @ duck_max) [2026-04-08 
08:54:15.806705 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=748, avg_tile_batch=3.51, prepare=660.412µs, send=18.545577ms, judge_wait=219.167036ms, fetch=21.651793ms, reduce=19ns; duck time-ns stats: p50=192.848916ms, p90=193.086317ms, max=193.249234ms; kernel_model: matmul=7.222591 GFLOP (37.374 GFLOP/s @ duck_max), param_stream=1.029439G (5.327 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.995 GB/s @ duck_max) [2026-04-08 08:54:16.084568 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=751, avg_tile_batch=3.49, prepare=660.969µs, send=17.220959ms, judge_wait=223.10202ms, fetch=22.210226ms, reduce=101ns; duck time-ns stats: p50=194.644055ms, p90=195.015472ms, max=195.848527ms; kernel_model: matmul=7.222591 GFLOP (36.878 GFLOP/s @ duck_max), param_stream=1.033568G (5.277 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.940 GB/s @ duck_max) [2026-04-08 08:54:16.109627 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=93, expert_tiles=104, avg_tile_batch=1.85, prepare=57.927µs, send=2.504586ms, judge_wait=20.022438ms, fetch=1.474934ms, reduce=106ns; duck time-ns stats: p50=19.800771ms, p90=19.82963ms, max=19.843861ms; kernel_model: matmul=0.528482 GFLOP (26.632 GFLOP/s @ duck_max), param_stream=0.143131G (7.213 Gparam/s @ duck_max), weight_stream=153.629 MiB (8.118 GB/s @ duck_max) [2026-04-08 08:54:16.406559 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=746, avg_tile_batch=3.52, prepare=819.103µs, send=17.213854ms, judge_wait=224.519395ms, fetch=23.195806ms, reduce=21ns; duck time-ns stats: p50=196.587404ms, p90=197.451063ms, max=197.654537ms; kernel_model: matmul=7.222591 GFLOP (36.541 GFLOP/s @ duck_max), param_stream=1.026687G (5.194 Gparam/s @ duck_max), weight_stream=1101.994 MiB (5.846 GB/s @ 
duck_max) [2026-04-08 08:54:16.683213 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=221, expert_tiles=743, avg_tile_batch=3.53, prepare=663.429µs, send=17.212339ms, judge_wait=221.393023ms, fetch=22.75915ms, reduce=16ns; duck time-ns stats: p50=192.909781ms, p90=193.313933ms, max=193.715403ms; kernel_model: matmul=7.222591 GFLOP (37.285 GFLOP/s @ duck_max), param_stream=1.022558G (5.279 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.941 GB/s @ duck_max) [2026-04-08 08:54:16.959122 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=746, avg_tile_batch=3.52, prepare=665.552µs, send=18.376998ms, judge_wait=218.837385ms, fetch=23.324726ms, reduce=111ns; duck time-ns stats: p50=189.584635ms, p90=189.793928ms, max=189.889265ms; kernel_model: matmul=7.222591 GFLOP (38.036 GFLOP/s @ duck_max), param_stream=1.026687G (5.407 Gparam/s @ duck_max), weight_stream=1101.994 MiB (6.085 GB/s @ duck_max) [2026-04-08 08:54:17.235046 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=222, expert_tiles=744, avg_tile_batch=3.53, prepare=667.711µs, send=17.220773ms, judge_wait=220.0142ms, fetch=23.328489ms, reduce=105ns; duck time-ns stats: p50=192.656908ms, p90=192.92126ms, max=193.085093ms; kernel_model: matmul=7.222591 GFLOP (37.406 GFLOP/s @ duck_max), param_stream=1.023934G (5.303 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.968 GB/s @ duck_max) [2026-04-08 08:54:17.257664 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=85, expert_tiles=96, avg_tile_batch=2.00, prepare=55.849µs, send=1.292632ms, judge_wait=18.782097ms, fetch=1.492507ms, reduce=135ns; duck time-ns stats: p50=18.588763ms, p90=18.61336ms, max=18.631114ms; kernel_model: matmul=0.528482 GFLOP (28.366 GFLOP/s @ duck_max), param_stream=0.132121G (7.091 Gparam/s @ duck_max), weight_stream=141.812 MiB 
(7.981 GB/s @ duck_max) [2026-04-08 08:54:17.555043 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=222, expert_tiles=751, avg_tile_batch=3.49, prepare=816.059µs, send=17.214271ms, judge_wait=224.92778ms, fetch=23.315682ms, reduce=134ns; duck time-ns stats: p50=196.870768ms, p90=197.104543ms, max=197.177063ms; kernel_model: matmul=7.222591 GFLOP (36.630 GFLOP/s @ duck_max), param_stream=1.033568G (5.242 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.900 GB/s @ duck_max) [2026-04-08 08:54:17.838332 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=751, avg_tile_batch=3.49, prepare=656.512µs, send=17.227464ms, judge_wait=227.104612ms, fetch=23.766403ms, reduce=20ns; duck time-ns stats: p50=199.742423ms, p90=199.929523ms, max=200.085818ms; kernel_model: matmul=7.222591 GFLOP (36.097 GFLOP/s @ duck_max), param_stream=1.033568G (5.166 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.814 GB/s @ duck_max) [2026-04-08 08:54:18.117679 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=753, avg_tile_batch=3.48, prepare=655.784µs, send=18.412581ms, judge_wait=221.397483ms, fetch=24.346971ms, reduce=20ns; duck time-ns stats: p50=193.897157ms, p90=194.123908ms, max=194.416953ms; kernel_model: matmul=7.222591 GFLOP (37.150 GFLOP/s @ duck_max), param_stream=1.036321G (5.330 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.999 GB/s @ duck_max) [2026-04-08 08:54:18.399329 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=747, avg_tile_batch=3.51, prepare=657.76µs, send=18.377948ms, judge_wait=225.501328ms, fetch=22.503804ms, reduce=139ns; duck time-ns stats: p50=197.251187ms, p90=197.473313ms, max=197.631052ms; kernel_model: matmul=7.222591 GFLOP (36.546 GFLOP/s @ duck_max), param_stream=1.028063G (5.202 Gparam/s @ 
duck_max), weight_stream=1103.471 MiB (5.855 GB/s @ duck_max) [2026-04-08 08:54:18.423689 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=84, expert_tiles=98, avg_tile_batch=1.96, prepare=56.601µs, send=2.648968ms, judge_wait=19.13003ms, fetch=1.476101ms, reduce=29ns; duck time-ns stats: p50=18.900885ms, p90=18.933515ms, max=18.948466ms; kernel_model: matmul=0.528482 GFLOP (27.891 GFLOP/s @ duck_max), param_stream=0.134873G (7.118 Gparam/s @ duck_max), weight_stream=144.766 MiB (8.011 GB/s @ duck_max) [2026-04-08 08:54:18.723754 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=748, avg_tile_batch=3.51, prepare=818.726µs, send=17.234973ms, judge_wait=224.217653ms, fetch=24.328867ms, reduce=23ns; duck time-ns stats: p50=196.428706ms, p90=196.66344ms, max=196.881406ms; kernel_model: matmul=7.222591 GFLOP (36.685 GFLOP/s @ duck_max), param_stream=1.029439G (5.229 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.885 GB/s @ duck_max) [2026-04-08 08:54:19.006662 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=753, avg_tile_batch=3.48, prepare=660.205µs, send=18.362756ms, judge_wait=227.135631ms, fetch=22.111462ms, reduce=20ns; duck time-ns stats: p50=198.481875ms, p90=198.843591ms, max=199.216535ms; kernel_model: matmul=7.222591 GFLOP (36.255 GFLOP/s @ duck_max), param_stream=1.036321G (5.202 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.855 GB/s @ duck_max) [2026-04-08 08:54:19.290010 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=745, avg_tile_batch=3.52, prepare=660.271µs, send=17.225743ms, judge_wait=226.130175ms, fetch=24.753838ms, reduce=21ns; duck time-ns stats: p50=196.555003ms, p90=196.753489ms, max=196.82026ms; kernel_model: matmul=7.222591 GFLOP (36.696 GFLOP/s @ duck_max), param_stream=1.025311G 
(5.209 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.863 GB/s @ duck_max) [2026-04-08 08:54:19.568313 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=751, avg_tile_batch=3.49, prepare=658.308µs, send=18.337948ms, judge_wait=220.606901ms, fetch=24.084985ms, reduce=138ns; duck time-ns stats: p50=192.808786ms, p90=193.057915ms, max=193.227382ms; kernel_model: matmul=7.222591 GFLOP (37.379 GFLOP/s @ duck_max), param_stream=1.033568G (5.349 Gparam/s @ duck_max), weight_stream=1109.380 MiB (6.020 GB/s @ duck_max) [2026-04-08 08:54:19.593626 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=88, expert_tiles=103, avg_tile_batch=1.86, prepare=55.506µs, send=2.540735ms, judge_wait=20.256477ms, fetch=1.481735ms, reduce=20ns; duck time-ns stats: p50=20.063791ms, p90=20.09069ms, max=20.113584ms; kernel_model: matmul=0.528482 GFLOP (26.275 GFLOP/s @ duck_max), param_stream=0.141754G (7.048 Gparam/s @ duck_max), weight_stream=152.152 MiB (7.932 GB/s @ duck_max) [2026-04-08 08:54:19.887896 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=743, avg_tile_batch=3.53, prepare=774.355µs, send=17.213546ms, judge_wait=221.833564ms, fetch=23.751293ms, reduce=21ns; duck time-ns stats: p50=193.900968ms, p90=194.182004ms, max=194.440176ms; kernel_model: matmul=7.222591 GFLOP (37.146 GFLOP/s @ duck_max), param_stream=1.022558G (5.259 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.919 GB/s @ duck_max) [2026-04-08 08:54:20.170730 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=222, expert_tiles=753, avg_tile_batch=3.48, prepare=662.826µs, send=17.225258ms, judge_wait=229.508497ms, fetch=21.646772ms, reduce=19ns; duck time-ns stats: p50=204.612113ms, p90=204.943003ms, max=205.112678ms; kernel_model: matmul=7.222591 GFLOP (35.213 GFLOP/s @ duck_max), 
param_stream=1.036321G (5.052 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.686 GB/s @ duck_max) [2026-04-08 08:54:20.446749 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=221, expert_tiles=743, avg_tile_batch=3.53, prepare=659.536µs, send=17.225542ms, judge_wait=220.330401ms, fetch=24.069498ms, reduce=21ns; duck time-ns stats: p50=192.319846ms, p90=192.752103ms, max=193.172688ms; kernel_model: matmul=7.222591 GFLOP (37.389 GFLOP/s @ duck_max), param_stream=1.022558G (5.293 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.958 GB/s @ duck_max) [2026-04-08 08:54:20.729862 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=216, expert_tiles=742, avg_tile_batch=3.54, prepare=662.036µs, send=18.581972ms, judge_wait=225.542832ms, fetch=24.520917ms, reduce=21ns; duck time-ns stats: p50=195.774131ms, p90=196.006768ms, max=196.35434ms; kernel_model: matmul=7.222591 GFLOP (36.783 GFLOP/s @ duck_max), param_stream=1.021182G (5.201 Gparam/s @ duck_max), weight_stream=1096.085 MiB (5.853 GB/s @ duck_max) [2026-04-08 08:54:20.754067 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=85, expert_tiles=96, avg_tile_batch=2.00, prepare=56.753µs, send=1.378606ms, judge_wait=20.358306ms, fetch=1.479858ms, reduce=136ns; duck time-ns stats: p50=20.143412ms, p90=20.199076ms, max=20.224288ms; kernel_model: matmul=0.528482 GFLOP (26.131 GFLOP/s @ duck_max), param_stream=0.132121G (6.533 Gparam/s @ duck_max), weight_stream=141.812 MiB (7.353 GB/s @ duck_max) [2026-04-08 08:54:21.051126 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=760, avg_tile_batch=3.45, prepare=833.593µs, send=17.216169ms, judge_wait=224.676955ms, fetch=23.717153ms, reduce=20ns; duck time-ns stats: p50=194.721929ms, p90=194.950535ms, max=195.079737ms; kernel_model: matmul=7.222591 GFLOP (37.024 
GFLOP/s @ duck_max), param_stream=1.045955G (5.362 Gparam/s @ duck_max), weight_stream=1122.675 MiB (6.035 GB/s @ duck_max) [2026-04-08 08:54:21.327570 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=743, avg_tile_batch=3.53, prepare=678.585µs, send=18.375019ms, judge_wait=220.437598ms, fetch=22.42586ms, reduce=14ns; duck time-ns stats: p50=191.967143ms, p90=192.346608ms, max=192.570761ms; kernel_model: matmul=7.222591 GFLOP (37.506 GFLOP/s @ duck_max), param_stream=1.022558G (5.310 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.976 GB/s @ duck_max) [2026-04-08 08:54:21.609692 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=760, avg_tile_batch=3.45, prepare=668.369µs, send=18.4033ms, judge_wait=224.136027ms, fetch=24.424169ms, reduce=16ns; duck time-ns stats: p50=194.62225ms, p90=194.945093ms, max=195.025508ms; kernel_model: matmul=7.222591 GFLOP (37.034 GFLOP/s @ duck_max), param_stream=1.045955G (5.363 Gparam/s @ duck_max), weight_stream=1122.675 MiB (6.036 GB/s @ duck_max) [2026-04-08 08:54:21.891890 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=228, expert_tiles=760, avg_tile_batch=3.45, prepare=666.061µs, send=18.421153ms, judge_wait=225.706335ms, fetch=22.877496ms, reduce=21ns; duck time-ns stats: p50=196.45423ms, p90=196.6834ms, max=196.739714ms; kernel_model: matmul=7.222591 GFLOP (36.711 GFLOP/s @ duck_max), param_stream=1.045955G (5.316 Gparam/s @ duck_max), weight_stream=1122.675 MiB (5.984 GB/s @ duck_max) [2026-04-08 08:54:21.915591 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=100, expert_tiles=107, avg_tile_batch=1.79, prepare=56.393µs, send=1.357749ms, judge_wait=19.821349ms, fetch=1.46872ms, reduce=20ns; duck time-ns stats: p50=19.642187ms, p90=19.666506ms, max=19.681864ms; kernel_model: matmul=0.528482 
GFLOP (26.851 GFLOP/s @ duck_max), param_stream=0.147259G (7.482 Gparam/s @ duck_max), weight_stream=158.061 MiB (8.421 GB/s @ duck_max) [2026-04-08 08:54:22.211009 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=751, avg_tile_batch=3.49, prepare=817.643µs, send=17.211221ms, judge_wait=221.976154ms, fetch=24.305206ms, reduce=19ns; duck time-ns stats: p50=192.273691ms, p90=192.68319ms, max=192.804446ms; kernel_model: matmul=7.222591 GFLOP (37.461 GFLOP/s @ duck_max), param_stream=1.033568G (5.361 Gparam/s @ duck_max), weight_stream=1109.380 MiB (6.033 GB/s @ duck_max) [2026-04-08 08:54:22.491362 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=748, avg_tile_batch=3.51, prepare=665.102µs, send=18.387355ms, judge_wait=223.853516ms, fetch=22.887417ms, reduce=21ns; duck time-ns stats: p50=194.654727ms, p90=195.005ms, max=195.384473ms; kernel_model: matmul=7.222591 GFLOP (36.966 GFLOP/s @ duck_max), param_stream=1.029439G (5.269 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.930 GB/s @ duck_max) [2026-04-08 08:54:22.768567 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=756, avg_tile_batch=3.47, prepare=665.142µs, send=18.353806ms, judge_wait=220.608616ms, fetch=22.999203ms, reduce=23ns; duck time-ns stats: p50=191.370933ms, p90=191.656716ms, max=191.832423ms; kernel_model: matmul=7.222591 GFLOP (37.651 GFLOP/s @ duck_max), param_stream=1.040450G (5.424 Gparam/s @ duck_max), weight_stream=1116.766 MiB (6.104 GB/s @ duck_max) [2026-04-08 08:54:23.047591 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=228, expert_tiles=747, avg_tile_batch=3.51, prepare=659.72µs, send=18.340368ms, judge_wait=223.147735ms, fetch=22.317616ms, reduce=20ns; duck time-ns stats: p50=194.137404ms, p90=194.615309ms, max=194.889147ms; 
kernel_model: matmul=7.222591 GFLOP (37.060 GFLOP/s @ duck_max), param_stream=1.028063G (5.275 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.937 GB/s @ duck_max) [2026-04-08 08:54:23.071378 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=102, expert_tiles=111, avg_tile_batch=1.73, prepare=60.069µs, send=1.501433ms, judge_wait=19.736875ms, fetch=1.476778ms, reduce=137ns; duck time-ns stats: p50=19.511069ms, p90=19.537254ms, max=19.559661ms; kernel_model: matmul=0.528482 GFLOP (27.019 GFLOP/s @ duck_max), param_stream=0.152764G (7.810 Gparam/s @ duck_max), weight_stream=163.970 MiB (8.790 GB/s @ duck_max) [2026-04-08 08:54:23.368729 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=747, avg_tile_batch=3.51, prepare=811.225µs, send=17.23748ms, judge_wait=222.239313ms, fetch=23.516138ms, reduce=20ns; duck time-ns stats: p50=192.839882ms, p90=193.328391ms, max=193.413442ms; kernel_model: matmul=7.222591 GFLOP (37.343 GFLOP/s @ duck_max), param_stream=1.028063G (5.315 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.982 GB/s @ duck_max) [2026-04-08 08:54:23.647391 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=754, avg_tile_batch=3.48, prepare=657.293µs, send=18.421366ms, judge_wait=221.71048ms, fetch=23.178839ms, reduce=20ns; duck time-ns stats: p50=192.486898ms, p90=192.829854ms, max=193.058454ms; kernel_model: matmul=7.222591 GFLOP (37.411 GFLOP/s @ duck_max), param_stream=1.037697G (5.375 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.050 GB/s @ duck_max) [2026-04-08 08:54:23.925221 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=748, avg_tile_batch=3.51, prepare=662.106µs, send=18.390435ms, judge_wait=221.145315ms, fetch=22.996173ms, reduce=20ns; duck time-ns stats: p50=192.359302ms, p90=192.590006ms, 
max=192.626516ms; kernel_model: matmul=7.222591 GFLOP (37.495 GFLOP/s @ duck_max), param_stream=1.029439G (5.344 Gparam/s @ duck_max), weight_stream=1104.948 MiB (6.015 GB/s @ duck_max) [2026-04-08 08:54:24.200600 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=214, expert_tiles=740, avg_tile_batch=3.55, prepare=663.684µs, send=18.509044ms, judge_wait=219.094873ms, fetch=22.546958ms, reduce=18ns; duck time-ns stats: p50=190.464013ms, p90=190.724145ms, max=190.938105ms; kernel_model: matmul=7.222591 GFLOP (37.827 GFLOP/s @ duck_max), param_stream=1.018429G (5.334 Gparam/s @ duck_max), weight_stream=1093.130 MiB (6.003 GB/s @ duck_max) [2026-04-08 08:54:24.223493 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=87, expert_tiles=98, avg_tile_batch=1.96, prepare=60.275µs, send=1.362753ms, judge_wait=18.945215ms, fetch=1.473206ms, reduce=102ns; duck time-ns stats: p50=18.75079ms, p90=18.785706ms, max=18.813714ms; kernel_model: matmul=0.528482 GFLOP (28.090 GFLOP/s @ duck_max), param_stream=0.134873G (7.169 Gparam/s @ duck_max), weight_stream=144.766 MiB (8.068 GB/s @ duck_max) [2026-04-08 08:54:24.520491 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=752, avg_tile_batch=3.49, prepare=763.448µs, send=17.235163ms, judge_wait=223.797144ms, fetch=23.638051ms, reduce=21ns; duck time-ns stats: p50=193.726317ms, p90=194.086951ms, max=194.43097ms; kernel_model: matmul=7.222591 GFLOP (37.147 GFLOP/s @ duck_max), param_stream=1.034945G (5.323 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.991 GB/s @ duck_max) [2026-04-08 08:54:24.807276 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=235, expert_tiles=760, avg_tile_batch=3.45, prepare=663.765µs, send=18.561511ms, judge_wait=228.7453ms, fetch=24.227575ms, reduce=21ns; duck time-ns stats: p50=201.370159ms, 
p90=201.6531ms, max=201.891076ms; kernel_model: matmul=7.222591 GFLOP (35.775 GFLOP/s @ duck_max), param_stream=1.045955G (5.181 Gparam/s @ duck_max), weight_stream=1122.675 MiB (5.831 GB/s @ duck_max) [2026-04-08 08:54:25.084552 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=749, avg_tile_batch=3.50, prepare=665.629µs, send=18.490185ms, judge_wait=220.854288ms, fetch=22.622768ms, reduce=132ns; duck time-ns stats: p50=191.630408ms, p90=191.904591ms, max=192.163627ms; kernel_model: matmul=7.222591 GFLOP (37.586 GFLOP/s @ duck_max), param_stream=1.030816G (5.364 Gparam/s @ duck_max), weight_stream=1106.425 MiB (6.037 GB/s @ duck_max) [2026-04-08 08:54:25.362309 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=217, expert_tiles=740, avg_tile_batch=3.55, prepare=665.656µs, send=18.316884ms, judge_wait=219.888532ms, fetch=24.232844ms, reduce=19ns; duck time-ns stats: p50=192.344051ms, p90=192.623011ms, max=193.057992ms; kernel_model: matmul=7.222591 GFLOP (37.412 GFLOP/s @ duck_max), param_stream=1.018429G (5.275 Gparam/s @ duck_max), weight_stream=1093.130 MiB (5.937 GB/s @ duck_max) [2026-04-08 08:54:25.392588 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=85, expert_tiles=99, avg_tile_batch=1.94, prepare=61.953µs, send=2.619755ms, judge_wait=25.084527ms, fetch=1.48748ms, reduce=133ns; duck time-ns stats: p50=24.835287ms, p90=24.87074ms, max=24.89293ms; kernel_model: matmul=0.528482 GFLOP (21.230 GFLOP/s @ duck_max), param_stream=0.136249G (5.473 Gparam/s @ duck_max), weight_stream=146.243 MiB (6.160 GB/s @ duck_max) [2026-04-08 08:54:25.688543 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=750, avg_tile_batch=3.50, prepare=753.343µs, send=18.325647ms, judge_wait=223.749956ms, fetch=22.455632ms, reduce=145ns; duck time-ns stats: 
p50=194.576245ms, p90=195.011068ms, max=195.496904ms; kernel_model: matmul=7.222591 GFLOP (36.945 GFLOP/s @ duck_max), param_stream=1.032192G (5.280 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.942 GB/s @ duck_max) [2026-04-08 08:54:25.964612 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=239, expert_tiles=754, avg_tile_batch=3.48, prepare=667.091µs, send=18.557834ms, judge_wait=221.078103ms, fetch=21.908983ms, reduce=19ns; duck time-ns stats: p50=192.457072ms, p90=192.806002ms, max=192.898487ms; kernel_model: matmul=7.222591 GFLOP (37.442 GFLOP/s @ duck_max), param_stream=1.037697G (5.379 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.055 GB/s @ duck_max) [2026-04-08 08:54:26.242495 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=748, avg_tile_batch=3.51, prepare=666.01µs, send=17.224609ms, judge_wait=219.530269ms, fetch=26.559855ms, reduce=135ns; duck time-ns stats: p50=190.122175ms, p90=190.405311ms, max=190.892853ms; kernel_model: matmul=7.222591 GFLOP (37.836 GFLOP/s @ duck_max), param_stream=1.029439G (5.393 Gparam/s @ duck_max), weight_stream=1104.948 MiB (6.069 GB/s @ duck_max) [2026-04-08 08:54:26.523613 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=749, avg_tile_batch=3.50, prepare=704.709µs, send=18.421192ms, judge_wait=225.188302ms, fetch=22.923358ms, reduce=139ns; duck time-ns stats: p50=195.916145ms, p90=196.299225ms, max=196.488508ms; kernel_model: matmul=7.222591 GFLOP (36.758 GFLOP/s @ duck_max), param_stream=1.030816G (5.246 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.905 GB/s @ duck_max) [2026-04-08 08:54:26.549774 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=94, expert_tiles=104, avg_tile_batch=1.85, prepare=58.567µs, send=2.628576ms, judge_wait=21.061151ms, fetch=1.481588ms, 
reduce=138ns; duck time-ns stats: p50=20.813528ms, p90=20.865291ms, max=20.883233ms; kernel_model: matmul=0.528482 GFLOP (25.307 GFLOP/s @ duck_max), param_stream=0.143131G (6.854 Gparam/s @ duck_max), weight_stream=153.629 MiB (7.714 GB/s @ duck_max) [2026-04-08 08:54:26.842335 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=745, avg_tile_batch=3.52, prepare=748.999µs, send=17.239875ms, judge_wait=221.036849ms, fetch=23.694249ms, reduce=20ns; duck time-ns stats: p50=191.544626ms, p90=191.902473ms, max=191.96844ms; kernel_model: matmul=7.222591 GFLOP (37.624 GFLOP/s @ duck_max), param_stream=1.025311G (5.341 Gparam/s @ duck_max), weight_stream=1100.517 MiB (6.011 GB/s @ duck_max) [2026-04-08 08:54:27.116979 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=755, avg_tile_batch=3.48, prepare=662.735µs, send=17.223782ms, judge_wait=220.856369ms, fetch=22.081255ms, reduce=20ns; duck time-ns stats: p50=192.363536ms, p90=192.594794ms, max=192.747091ms; kernel_model: matmul=7.222591 GFLOP (37.472 GFLOP/s @ duck_max), param_stream=1.039073G (5.391 Gparam/s @ duck_max), weight_stream=1115.289 MiB (6.067 GB/s @ duck_max) [2026-04-08 08:54:27.392620 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=753, avg_tile_batch=3.48, prepare=667.99µs, send=17.224251ms, judge_wait=222.146601ms, fetch=21.751822ms, reduce=19ns; duck time-ns stats: p50=193.218201ms, p90=193.624077ms, max=193.881365ms; kernel_model: matmul=7.222591 GFLOP (37.253 GFLOP/s @ duck_max), param_stream=1.036321G (5.345 Gparam/s @ duck_max), weight_stream=1112.334 MiB (6.016 GB/s @ duck_max) [2026-04-08 08:54:27.676557 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=228, expert_tiles=747, avg_tile_batch=3.51, prepare=666.85µs, send=17.224961ms, 
judge_wait=229.307932ms, fetch=22.97159ms, reduce=21ns; duck time-ns stats: p50=201.071528ms, p90=201.323757ms, max=201.404212ms; kernel_model: matmul=7.222591 GFLOP (35.861 GFLOP/s @ duck_max), param_stream=1.028063G (5.104 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.745 GB/s @ duck_max) [2026-04-08 08:54:27.700564 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=100, expert_tiles=108, avg_tile_batch=1.78, prepare=58.656µs, send=2.425762ms, judge_wait=19.116456ms, fetch=1.476093ms, reduce=145ns; duck time-ns stats: p50=18.909096ms, p90=18.933442ms, max=18.945186ms; kernel_model: matmul=0.528482 GFLOP (27.895 GFLOP/s @ duck_max), param_stream=0.148636G (7.846 Gparam/s @ duck_max), weight_stream=159.538 MiB (8.830 GB/s @ duck_max) [2026-04-08 08:54:28.007343 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=243, expert_tiles=755, avg_tile_batch=3.48, prepare=799.737µs, send=17.231955ms, judge_wait=235.389686ms, fetch=23.488997ms, reduce=133ns; duck time-ns stats: p50=207.278776ms, p90=207.553677ms, max=208.004452ms; kernel_model: matmul=7.222591 GFLOP (34.723 GFLOP/s @ duck_max), param_stream=1.039073G (4.995 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.622 GB/s @ duck_max) [2026-04-08 08:54:28.282272 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=749, avg_tile_batch=3.50, prepare=660.148µs, send=18.570198ms, judge_wait=218.602047ms, fetch=23.387267ms, reduce=20ns; duck time-ns stats: p50=191.316615ms, p90=191.701878ms, max=191.785647ms; kernel_model: matmul=7.222591 GFLOP (37.660 GFLOP/s @ duck_max), param_stream=1.030816G (5.375 Gparam/s @ duck_max), weight_stream=1106.425 MiB (6.049 GB/s @ duck_max) [2026-04-08 08:54:28.560037 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=748, avg_tile_batch=3.51, 
prepare=658.879µs, send=17.221967ms, judge_wait=223.794215ms, fetch=22.380391ms, reduce=22ns; duck time-ns stats: p50=195.204313ms, p90=195.582759ms, max=196.329027ms; kernel_model: matmul=7.222591 GFLOP (36.788 GFLOP/s @ duck_max), param_stream=1.029439G (5.243 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.901 GB/s @ duck_max) [2026-04-08 08:54:28.835378 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=742, avg_tile_batch=3.54, prepare=663.144µs, send=17.226527ms, judge_wait=222.042348ms, fetch=21.649036ms, reduce=23ns; duck time-ns stats: p50=195.735423ms, p90=196.061767ms, max=196.221666ms; kernel_model: matmul=7.222591 GFLOP (36.808 GFLOP/s @ duck_max), param_stream=1.021182G (5.204 Gparam/s @ duck_max), weight_stream=1096.085 MiB (5.857 GB/s @ duck_max) [2026-04-08 08:54:28.858951 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=100, expert_tiles=106, avg_tile_batch=1.81, prepare=57.246µs, send=1.30153ms, judge_wait=19.835619ms, fetch=1.475241ms, reduce=19ns; duck time-ns stats: p50=19.583561ms, p90=19.611165ms, max=19.660428ms; kernel_model: matmul=0.528482 GFLOP (26.881 GFLOP/s @ duck_max), param_stream=0.145883G (7.420 Gparam/s @ duck_max), weight_stream=156.584 MiB (8.351 GB/s @ duck_max) [2026-04-08 08:54:29.162902 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=240, expert_tiles=746, avg_tile_batch=3.52, prepare=774.189µs, send=19.393841ms, judge_wait=223.807015ms, fetch=30.105423ms, reduce=140ns; duck time-ns stats: p50=196.306407ms, p90=196.564993ms, max=196.657969ms; kernel_model: matmul=7.222591 GFLOP (36.727 GFLOP/s @ duck_max), param_stream=1.026687G (5.221 Gparam/s @ duck_max), weight_stream=1101.994 MiB (5.876 GB/s @ duck_max) [2026-04-08 08:54:29.451109 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=755, 
avg_tile_batch=3.48, prepare=656.357µs, send=18.392578ms, judge_wait=231.374713ms, fetch=24.023781ms, reduce=137ns; duck time-ns stats: p50=203.960225ms, p90=204.169114ms, max=204.286641ms; kernel_model: matmul=7.222591 GFLOP (35.355 GFLOP/s @ duck_max), param_stream=1.039073G (5.086 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.725 GB/s @ duck_max) [2026-04-08 08:54:29.727931 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=754, avg_tile_batch=3.48, prepare=657.884µs, send=18.452641ms, judge_wait=221.688396ms, fetch=22.253243ms, reduce=136ns; duck time-ns stats: p50=193.231378ms, p90=193.555045ms, max=193.855266ms; kernel_model: matmul=7.222591 GFLOP (37.258 GFLOP/s @ duck_max), param_stream=1.037697G (5.353 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.025 GB/s @ duck_max) [2026-04-08 08:54:30.004693 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=756, avg_tile_batch=3.47, prepare=660.327µs, send=17.222968ms, judge_wait=221.68952ms, fetch=23.408593ms, reduce=20ns; duck time-ns stats: p50=193.529613ms, p90=193.809699ms, max=194.052568ms; kernel_model: matmul=7.222591 GFLOP (37.220 GFLOP/s @ duck_max), param_stream=1.040450G (5.362 Gparam/s @ duck_max), weight_stream=1116.766 MiB (6.035 GB/s @ duck_max) [2026-04-08 08:54:30.027224 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=86, expert_tiles=98, avg_tile_batch=1.96, prepare=56.697µs, send=1.425318ms, judge_wait=18.604411ms, fetch=1.489911ms, reduce=135ns; duck time-ns stats: p50=18.374039ms, p90=18.424823ms, max=18.463783ms; kernel_model: matmul=0.528482 GFLOP (28.623 GFLOP/s @ duck_max), param_stream=0.134873G (7.305 Gparam/s @ duck_max), weight_stream=144.766 MiB (8.221 GB/s @ duck_max) [2026-04-08 08:54:30.323235 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, 
expert_tiles=752, avg_tile_batch=3.49, prepare=813.844µs, send=17.235812ms, judge_wait=224.392569ms, fetch=23.878546ms, reduce=19ns; duck time-ns stats: p50=196.409997ms, p90=196.98996ms, max=197.142037ms; kernel_model: matmul=7.222591 GFLOP (36.636 GFLOP/s @ duck_max), param_stream=1.034945G (5.250 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.909 GB/s @ duck_max) [2026-04-08 08:54:30.599130 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=747, avg_tile_batch=3.51, prepare=659.718µs, send=18.460274ms, judge_wait=219.301756ms, fetch=23.759364ms, reduce=133ns; duck time-ns stats: p50=191.449956ms, p90=191.811073ms, max=191.989704ms; kernel_model: matmul=7.222591 GFLOP (37.620 GFLOP/s @ duck_max), param_stream=1.028063G (5.355 Gparam/s @ duck_max), weight_stream=1103.471 MiB (6.027 GB/s @ duck_max) [2026-04-08 08:54:30.877017 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=237, expert_tiles=750, avg_tile_batch=3.50, prepare=660.643µs, send=17.222398ms, judge_wait=224.600322ms, fetch=21.631288ms, reduce=20ns; duck time-ns stats: p50=197.754492ms, p90=198.176631ms, max=198.523514ms; kernel_model: matmul=7.222591 GFLOP (36.382 GFLOP/s @ duck_max), param_stream=1.032192G (5.199 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.852 GB/s @ duck_max) [2026-04-08 08:54:31.155051 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=750, avg_tile_batch=3.50, prepare=656.533µs, send=17.209155ms, judge_wait=222.465043ms, fetch=23.976431ms, reduce=19ns; duck time-ns stats: p50=195.073178ms, p90=195.471151ms, max=195.83809ms; kernel_model: matmul=7.222591 GFLOP (36.880 GFLOP/s @ duck_max), param_stream=1.032192G (5.271 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.932 GB/s @ duck_max) [2026-04-08 08:54:31.180146 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, 
tasks=192, unique_experts=90, expert_tiles=99, avg_tile_batch=1.94, prepare=57.095µs, send=2.600168ms, judge_wait=20.035268ms, fetch=1.487308ms, reduce=145ns; duck time-ns stats: p50=19.78551ms, p90=19.874715ms, max=19.899196ms; kernel_model: matmul=0.528482 GFLOP (26.558 GFLOP/s @ duck_max), param_stream=0.136249G (6.847 Gparam/s @ duck_max), weight_stream=146.243 MiB (7.706 GB/s @ duck_max) [2026-04-08 08:54:31.473842 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=235, expert_tiles=759, avg_tile_batch=3.46, prepare=815.333µs, send=17.243058ms, judge_wait=223.314132ms, fetch=22.507348ms, reduce=20ns; duck time-ns stats: p50=194.965079ms, p90=195.325207ms, max=195.516468ms; kernel_model: matmul=7.222591 GFLOP (36.941 GFLOP/s @ duck_max), param_stream=1.044578G (5.343 Gparam/s @ duck_max), weight_stream=1121.197 MiB (6.013 GB/s @ duck_max) [2026-04-08 08:54:31.755944 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=751, avg_tile_batch=3.49, prepare=690.379µs, send=17.225238ms, judge_wait=227.992584ms, fetch=22.429209ms, reduce=21ns; duck time-ns stats: p50=199.585553ms, p90=199.84757ms, max=200.065147ms; kernel_model: matmul=7.222591 GFLOP (36.101 GFLOP/s @ duck_max), param_stream=1.033568G (5.166 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.814 GB/s @ duck_max) [2026-04-08 08:54:32.029210 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=238, expert_tiles=760, avg_tile_batch=3.45, prepare=665.246µs, send=17.225863ms, judge_wait=220.907422ms, fetch=20.628653ms, reduce=104ns; duck time-ns stats: p50=193.80424ms, p90=194.035477ms, max=194.191589ms; kernel_model: matmul=7.222591 GFLOP (37.193 GFLOP/s @ duck_max), param_stream=1.045955G (5.386 Gparam/s @ duck_max), weight_stream=1122.675 MiB (6.062 GB/s @ duck_max) [2026-04-08 08:54:32.307981 INFO fp8_moe_dpdk] MoE prefill forward (Rust): 
batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=742, avg_tile_batch=3.54, prepare=667.271µs, send=17.226024ms, judge_wait=223.214516ms, fetch=23.879977ms, reduce=16ns; duck time-ns stats: p50=193.222527ms, p90=193.595871ms, max=194.484284ms; kernel_model: matmul=7.222591 GFLOP (37.137 GFLOP/s @ duck_max), param_stream=1.021182G (5.251 Gparam/s @ duck_max), weight_stream=1096.085 MiB (5.910 GB/s @ duck_max) [2026-04-08 08:54:32.331346 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=93, expert_tiles=104, avg_tile_batch=1.85, prepare=57.755µs, send=1.534909ms, judge_wait=19.365835ms, fetch=1.483924ms, reduce=115ns; duck time-ns stats: p50=19.191309ms, p90=19.213673ms, max=19.221939ms; kernel_model: matmul=0.528482 GFLOP (27.494 GFLOP/s @ duck_max), param_stream=0.143131G (7.446 Gparam/s @ duck_max), weight_stream=153.629 MiB (8.381 GB/s @ duck_max) [2026-04-08 08:54:32.623286 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=240, expert_tiles=757, avg_tile_batch=3.47, prepare=826.919µs, send=17.218636ms, judge_wait=221.523858ms, fetch=21.639424ms, reduce=132ns; duck time-ns stats: p50=192.787782ms, p90=193.075392ms, max=193.153463ms; kernel_model: matmul=7.222591 GFLOP (37.393 GFLOP/s @ duck_max), param_stream=1.041826G (5.394 Gparam/s @ duck_max), weight_stream=1118.243 MiB (6.071 GB/s @ duck_max) [2026-04-08 08:54:32.904053 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=753, avg_tile_batch=3.48, prepare=670.651µs, send=17.22278ms, judge_wait=223.85028ms, fetch=24.443756ms, reduce=21ns; duck time-ns stats: p50=196.287201ms, p90=196.675215ms, max=197.022422ms; kernel_model: matmul=7.222591 GFLOP (36.659 GFLOP/s @ duck_max), param_stream=1.036321G (5.260 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.920 GB/s @ duck_max) [2026-04-08 08:54:33.184229 INFO fp8_moe_dpdk] MoE 
prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=745, avg_tile_batch=3.52, prepare=669.429µs, send=18.520168ms, judge_wait=223.907941ms, fetch=22.551501ms, reduce=19ns; duck time-ns stats: p50=195.223282ms, p90=195.481876ms, max=195.597773ms; kernel_model: matmul=7.222591 GFLOP (36.926 GFLOP/s @ duck_max), param_stream=1.025311G (5.242 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.900 GB/s @ duck_max) [2026-04-08 08:54:33.461355 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=228, expert_tiles=753, avg_tile_batch=3.48, prepare=694.229µs, send=18.433429ms, judge_wait=220.137146ms, fetch=23.221742ms, reduce=20ns; duck time-ns stats: p50=191.755998ms, p90=192.190537ms, max=192.443893ms; kernel_model: matmul=7.222591 GFLOP (37.531 GFLOP/s @ duck_max), param_stream=1.036321G (5.385 Gparam/s @ duck_max), weight_stream=1112.334 MiB (6.061 GB/s @ duck_max) [2026-04-08 08:54:33.483702 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=91, expert_tiles=97, avg_tile_batch=1.98, prepare=59.925µs, send=1.30394ms, judge_wait=18.491877ms, fetch=1.478491ms, reduce=135ns; duck time-ns stats: p50=18.232984ms, p90=18.26344ms, max=18.332801ms; kernel_model: matmul=0.528482 GFLOP (28.827 GFLOP/s @ duck_max), param_stream=0.133497G (7.282 Gparam/s @ duck_max), weight_stream=143.289 MiB (8.196 GB/s @ duck_max) [2026-04-08 08:54:33.779115 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=746, avg_tile_batch=3.52, prepare=804.41µs, send=18.339512ms, judge_wait=221.836501ms, fetch=23.250438ms, reduce=19ns; duck time-ns stats: p50=194.586071ms, p90=194.889841ms, max=195.160616ms; kernel_model: matmul=7.222591 GFLOP (37.008 GFLOP/s @ duck_max), param_stream=1.026687G (5.261 Gparam/s @ duck_max), weight_stream=1101.994 MiB (5.921 GB/s @ duck_max) [2026-04-08 08:54:34.056858 INFO 
fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=235, expert_tiles=758, avg_tile_batch=3.46, prepare=663.054µs, send=17.231949ms, judge_wait=221.469781ms, fetch=23.720854ms, reduce=21ns; duck time-ns stats: p50=193.510252ms, p90=193.841991ms, max=194.053871ms; kernel_model: matmul=7.222591 GFLOP (37.220 GFLOP/s @ duck_max), param_stream=1.043202G (5.376 Gparam/s @ duck_max), weight_stream=1119.720 MiB (6.050 GB/s @ duck_max) [2026-04-08 08:54:34.336435 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=240, expert_tiles=750, avg_tile_batch=3.50, prepare=665.657µs, send=17.224134ms, judge_wait=222.927252ms, fetch=24.020066ms, reduce=102ns; duck time-ns stats: p50=194.891487ms, p90=195.219987ms, max=195.357577ms; kernel_model: matmul=7.222591 GFLOP (36.971 GFLOP/s @ duck_max), param_stream=1.032192G (5.284 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.947 GB/s @ duck_max) [2026-04-08 08:54:34.617049 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=751, avg_tile_batch=3.49, prepare=662.641µs, send=18.415281ms, judge_wait=222.543564ms, fetch=24.387934ms, reduce=101ns; duck time-ns stats: p50=194.880415ms, p90=195.281299ms, max=195.392773ms; kernel_model: matmul=7.222591 GFLOP (36.964 GFLOP/s @ duck_max), param_stream=1.033568G (5.290 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.953 GB/s @ duck_max) [2026-04-08 08:54:34.640949 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=91, expert_tiles=105, avg_tile_batch=1.83, prepare=57.735µs, send=2.554281ms, judge_wait=18.871134ms, fetch=1.47277ms, reduce=15ns; duck time-ns stats: p50=18.681654ms, p90=18.710351ms, max=18.736281ms; kernel_model: matmul=0.528482 GFLOP (28.206 GFLOP/s @ duck_max), param_stream=0.144507G (7.713 Gparam/s @ duck_max), weight_stream=155.106 MiB (8.681 GB/s @ duck_max) [2026-04-08 
08:54:34.948240 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=745, avg_tile_batch=3.52, prepare=801.66µs, send=18.359952ms, judge_wait=233.584416ms, fetch=23.469042ms, reduce=19ns; duck time-ns stats: p50=205.431551ms, p90=205.751165ms, max=206.026427ms; kernel_model: matmul=7.222591 GFLOP (35.057 GFLOP/s @ duck_max), param_stream=1.025311G (4.977 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.601 GB/s @ duck_max) [2026-04-08 08:54:35.246388 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=757, avg_tile_batch=3.47, prepare=659.368µs, send=18.347855ms, judge_wait=242.144457ms, fetch=22.334105ms, reduce=133ns; duck time-ns stats: p50=214.940237ms, p90=215.126539ms, max=215.437274ms; kernel_model: matmul=7.222591 GFLOP (33.525 GFLOP/s @ duck_max), param_stream=1.041826G (4.836 Gparam/s @ duck_max), weight_stream=1118.243 MiB (5.443 GB/s @ duck_max) [2026-04-08 08:54:35.545504 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=757, avg_tile_batch=3.47, prepare=660.719µs, send=18.566885ms, judge_wait=243.651616ms, fetch=21.667441ms, reduce=102ns; duck time-ns stats: p50=217.390374ms, p90=218.099062ms, max=218.752644ms; kernel_model: matmul=7.222591 GFLOP (33.017 GFLOP/s @ duck_max), param_stream=1.041826G (4.763 Gparam/s @ duck_max), weight_stream=1118.243 MiB (5.360 GB/s @ duck_max) [2026-04-08 08:54:35.841805 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=749, avg_tile_batch=3.50, prepare=668.544µs, send=18.321458ms, judge_wait=241.103349ms, fetch=21.628195ms, reduce=142ns; duck time-ns stats: p50=215.605564ms, p90=215.918555ms, max=216.099185ms; kernel_model: matmul=7.222591 GFLOP (33.423 GFLOP/s @ duck_max), param_stream=1.030816G (4.770 Gparam/s @ duck_max), weight_stream=1106.425 MiB 
(5.369 GB/s @ duck_max) [2026-04-08 08:54:35.865417 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=86, expert_tiles=99, avg_tile_batch=1.94, prepare=57.282µs, send=1.306406ms, judge_wait=19.798584ms, fetch=1.467659ms, reduce=14ns; duck time-ns stats: p50=19.55822ms, p90=19.60873ms, max=19.616796ms; kernel_model: matmul=0.528482 GFLOP (26.940 GFLOP/s @ duck_max), param_stream=0.136249G (6.946 Gparam/s @ duck_max), weight_stream=146.243 MiB (7.817 GB/s @ duck_max) [2026-04-08 08:54:36.164392 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=755, avg_tile_batch=3.48, prepare=820.543µs, send=17.23889ms, judge_wait=226.77585ms, fetch=20.64ms, reduce=20ns; duck time-ns stats: p50=199.824077ms, p90=200.006018ms, max=200.239111ms; kernel_model: matmul=7.222591 GFLOP (36.070 GFLOP/s @ duck_max), param_stream=1.039073G (5.189 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.840 GB/s @ duck_max) [2026-04-08 08:54:36.438415 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=755, avg_tile_batch=3.48, prepare=672.516µs, send=17.230704ms, judge_wait=220.858469ms, fetch=20.650334ms, reduce=135ns; duck time-ns stats: p50=194.189475ms, p90=194.588422ms, max=194.910265ms; kernel_model: matmul=7.222591 GFLOP (37.056 GFLOP/s @ duck_max), param_stream=1.039073G (5.331 Gparam/s @ duck_max), weight_stream=1115.289 MiB (6.000 GB/s @ duck_max) [2026-04-08 08:54:36.716267 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=240, expert_tiles=761, avg_tile_batch=3.45, prepare=665.896µs, send=18.335096ms, judge_wait=222.554493ms, fetch=21.689013ms, reduce=104ns; duck time-ns stats: p50=196.210933ms, p90=196.412537ms, max=196.492955ms; kernel_model: matmul=7.222591 GFLOP (36.758 GFLOP/s @ duck_max), param_stream=1.047331G (5.330 Gparam/s @ duck_max), 
weight_stream=1124.152 MiB (5.999 GB/s @ duck_max) [2026-04-08 08:54:36.992400 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=749, avg_tile_batch=3.50, prepare=660.164µs, send=17.223806ms, judge_wait=220.197565ms, fetch=23.387599ms, reduce=15ns; duck time-ns stats: p50=192.913733ms, p90=193.207081ms, max=193.647079ms; kernel_model: matmul=7.222591 GFLOP (37.298 GFLOP/s @ duck_max), param_stream=1.030816G (5.323 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.991 GB/s @ duck_max) [2026-04-08 08:54:37.015071 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=88, expert_tiles=101, avg_tile_batch=1.90, prepare=59.047µs, send=1.455232ms, judge_wait=18.666333ms, fetch=1.476732ms, reduce=139ns; duck time-ns stats: p50=18.463839ms, p90=18.480804ms, max=18.51977ms; kernel_model: matmul=0.528482 GFLOP (28.536 GFLOP/s @ duck_max), param_stream=0.139002G (7.506 Gparam/s @ duck_max), weight_stream=149.198 MiB (8.447 GB/s @ duck_max) [2026-04-08 08:54:37.308335 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=239, expert_tiles=753, avg_tile_batch=3.48, prepare=806.569µs, send=17.219882ms, judge_wait=221.857713ms, fetch=21.649378ms, reduce=19ns; duck time-ns stats: p50=195.61733ms, p90=195.822871ms, max=195.943215ms; kernel_model: matmul=7.222591 GFLOP (36.861 GFLOP/s @ duck_max), param_stream=1.036321G (5.289 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.953 GB/s @ duck_max) [2026-04-08 08:54:37.584715 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=228, expert_tiles=744, avg_tile_batch=3.53, prepare=669.838µs, send=18.343263ms, judge_wait=220.324189ms, fetch=22.348116ms, reduce=133ns; duck time-ns stats: p50=193.259527ms, p90=193.586625ms, max=193.761801ms; kernel_model: matmul=7.222591 GFLOP (37.276 GFLOP/s @ duck_max), param_stream=1.023934G (5.285 
Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.948 GB/s @ duck_max) [2026-04-08 08:54:37.862696 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=757, avg_tile_batch=3.47, prepare=662.773µs, send=18.554412ms, judge_wait=223.497129ms, fetch=20.664008ms, reduce=136ns; duck time-ns stats: p50=197.524329ms, p90=197.699967ms, max=197.714683ms; kernel_model: matmul=7.222591 GFLOP (36.530 GFLOP/s @ duck_max), param_stream=1.041826G (5.269 Gparam/s @ duck_max), weight_stream=1118.243 MiB (5.931 GB/s @ duck_max) [2026-04-08 08:54:38.135726 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=745, avg_tile_batch=3.52, prepare=672.925µs, send=17.226202ms, judge_wait=219.883295ms, fetch=20.647311ms, reduce=21ns; duck time-ns stats: p50=194.29368ms, p90=194.561666ms, max=194.701666ms; kernel_model: matmul=7.222591 GFLOP (37.096 GFLOP/s @ duck_max), param_stream=1.025311G (5.266 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.927 GB/s @ duck_max) [2026-04-08 08:54:38.158856 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=91, expert_tiles=101, avg_tile_batch=1.90, prepare=57.327µs, send=1.304157ms, judge_wait=19.277194ms, fetch=1.483056ms, reduce=135ns; duck time-ns stats: p50=19.067039ms, p90=19.085723ms, max=19.107321ms; kernel_model: matmul=0.528482 GFLOP (27.659 GFLOP/s @ duck_max), param_stream=0.139002G (7.275 Gparam/s @ duck_max), weight_stream=149.198 MiB (8.188 GB/s @ duck_max) [2026-04-08 08:54:38.458325 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=242, expert_tiles=759, avg_tile_batch=3.46, prepare=802.956µs, send=18.344939ms, judge_wait=225.995731ms, fetch=23.156558ms, reduce=137ns; duck time-ns stats: p50=197.756702ms, p90=198.094251ms, max=198.279463ms; kernel_model: matmul=7.222591 GFLOP (36.426 GFLOP/s @ duck_max), 
param_stream=1.044578G (5.268 Gparam/s @ duck_max), weight_stream=1121.197 MiB (5.929 GB/s @ duck_max) [2026-04-08 08:54:38.735305 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=752, avg_tile_batch=3.49, prepare=658.717µs, send=18.371257ms, judge_wait=219.013795ms, fetch=24.216263ms, reduce=20ns; duck time-ns stats: p50=191.164701ms, p90=191.457867ms, max=191.917004ms; kernel_model: matmul=7.222591 GFLOP (37.634 GFLOP/s @ duck_max), param_stream=1.034945G (5.393 Gparam/s @ duck_max), weight_stream=1110.857 MiB (6.069 GB/s @ duck_max) [2026-04-08 08:54:39.008930 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=752, avg_tile_batch=3.49, prepare=667.349µs, send=18.503316ms, judge_wait=218.114655ms, fetch=21.635097ms, reduce=136ns; duck time-ns stats: p50=191.214394ms, p90=191.644271ms, max=191.736162ms; kernel_model: matmul=7.222591 GFLOP (37.669 GFLOP/s @ duck_max), param_stream=1.034945G (5.398 Gparam/s @ duck_max), weight_stream=1110.857 MiB (6.075 GB/s @ duck_max) [2026-04-08 08:54:39.279672 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=751, avg_tile_batch=3.49, prepare=666.437µs, send=17.222869ms, judge_wait=217.533676ms, fetch=20.655617ms, reduce=102ns; duck time-ns stats: p50=190.938452ms, p90=191.480585ms, max=192.097503ms; kernel_model: matmul=7.222591 GFLOP (37.599 GFLOP/s @ duck_max), param_stream=1.033568G (5.380 Gparam/s @ duck_max), weight_stream=1109.380 MiB (6.056 GB/s @ duck_max) [2026-04-08 08:54:39.302765 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=79, expert_tiles=94, avg_tile_batch=2.04, prepare=60.724µs, send=1.305079ms, judge_wait=19.210816ms, fetch=1.48446ms, reduce=102ns; duck time-ns stats: p50=18.989013ms, p90=19.014063ms, max=19.033104ms; kernel_model: matmul=0.528482 GFLOP (27.766 
GFLOP/s @ duck_max), param_stream=0.129368G (6.797 Gparam/s @ duck_max), weight_stream=138.857 MiB (7.650 GB/s @ duck_max) [2026-04-08 08:54:39.603315 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=220, expert_tiles=753, avg_tile_batch=3.48, prepare=768.01µs, send=17.217815ms, judge_wait=228.477399ms, fetch=22.813832ms, reduce=134ns; duck time-ns stats: p50=201.393118ms, p90=201.845723ms, max=202.193408ms; kernel_model: matmul=7.222591 GFLOP (35.721 GFLOP/s @ duck_max), param_stream=1.036321G (5.125 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.769 GB/s @ duck_max) [2026-04-08 08:54:39.883219 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=219, expert_tiles=754, avg_tile_batch=3.48, prepare=666.353µs, send=18.549304ms, judge_wait=221.163353ms, fetch=24.787901ms, reduce=104ns; duck time-ns stats: p50=193.610777ms, p90=193.957288ms, max=194.41206ms; kernel_model: matmul=7.222591 GFLOP (37.151 GFLOP/s @ duck_max), param_stream=1.037697G (5.338 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.007 GB/s @ duck_max) [2026-04-08 08:54:40.163489 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=218, expert_tiles=745, avg_tile_batch=3.52, prepare=664.846µs, send=17.22406ms, judge_wait=223.514029ms, fetch=24.146126ms, reduce=136ns; duck time-ns stats: p50=196.048202ms, p90=196.510589ms, max=196.928465ms; kernel_model: matmul=7.222591 GFLOP (36.676 GFLOP/s @ duck_max), param_stream=1.025311G (5.207 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.860 GB/s @ duck_max) [2026-04-08 08:54:40.443911 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=216, expert_tiles=748, avg_tile_batch=3.51, prepare=664.947µs, send=18.502835ms, judge_wait=225.902663ms, fetch=20.634964ms, reduce=139ns; duck time-ns stats: p50=200.806466ms, p90=201.21472ms, max=201.305622ms; kernel_model: 
matmul=7.222591 GFLOP (35.879 GFLOP/s @ duck_max), param_stream=1.029439G (5.114 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.756 GB/s @ duck_max) [2026-04-08 08:54:40.465826 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=86, expert_tiles=95, avg_tile_batch=2.02, prepare=57.95µs, send=1.30253ms, judge_wait=18.072018ms, fetch=1.480642ms, reduce=135ns; duck time-ns stats: p50=17.849714ms, p90=17.885527ms, max=17.905888ms; kernel_model: matmul=0.528482 GFLOP (29.514 GFLOP/s @ duck_max), param_stream=0.130744G (7.302 Gparam/s @ duck_max), weight_stream=140.334 MiB (8.218 GB/s @ duck_max) [2026-04-08 08:54:40.767441 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=758, avg_tile_batch=3.46, prepare=751.106µs, send=17.236078ms, judge_wait=230.798037ms, fetch=21.638211ms, reduce=138ns; duck time-ns stats: p50=204.84728ms, p90=205.040245ms, max=205.342444ms; kernel_model: matmul=7.222591 GFLOP (35.173 GFLOP/s @ duck_max), param_stream=1.043202G (5.080 Gparam/s @ duck_max), weight_stream=1119.720 MiB (5.718 GB/s @ duck_max) [2026-04-08 08:54:41.045344 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=758, avg_tile_batch=3.46, prepare=667.11µs, send=17.223496ms, judge_wait=221.079564ms, fetch=24.246457ms, reduce=20ns; duck time-ns stats: p50=193.336302ms, p90=193.63005ms, max=193.705553ms; kernel_model: matmul=7.222591 GFLOP (37.286 GFLOP/s @ duck_max), param_stream=1.043202G (5.386 Gparam/s @ duck_max), weight_stream=1119.720 MiB (6.061 GB/s @ duck_max) [2026-04-08 08:54:41.322963 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=752, avg_tile_batch=3.49, prepare=667.236µs, send=17.214133ms, judge_wait=221.684335ms, fetch=23.315528ms, reduce=135ns; duck time-ns stats: p50=194.477824ms, p90=195.070013ms, max=195.150273ms; 
kernel_model: matmul=7.222591 GFLOP (37.010 GFLOP/s @ duck_max), param_stream=1.034945G (5.303 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.969 GB/s @ duck_max) [2026-04-08 08:54:41.603203 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=751, avg_tile_batch=3.49, prepare=664.668µs, send=18.505648ms, judge_wait=222.4836ms, fetch=23.884111ms, reduce=19ns; duck time-ns stats: p50=195.156629ms, p90=195.416998ms, max=196.008616ms; kernel_model: matmul=7.222591 GFLOP (36.848 GFLOP/s @ duck_max), param_stream=1.033568G (5.273 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.935 GB/s @ duck_max) [2026-04-08 08:54:41.628312 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=88, expert_tiles=97, avg_tile_batch=1.98, prepare=57.215µs, send=2.49926ms, judge_wait=20.057519ms, fetch=1.483074ms, reduce=136ns; duck time-ns stats: p50=19.822122ms, p90=19.851835ms, max=19.876795ms; kernel_model: matmul=0.528482 GFLOP (26.588 GFLOP/s @ duck_max), param_stream=0.133497G (6.716 Gparam/s @ duck_max), weight_stream=143.289 MiB (7.559 GB/s @ duck_max) [2026-04-08 08:54:41.920538 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=241, expert_tiles=750, avg_tile_batch=3.50, prepare=770.387µs, send=18.364137ms, judge_wait=219.995221ms, fetch=22.713231ms, reduce=135ns; duck time-ns stats: p50=191.695423ms, p90=192.073171ms, max=192.318081ms; kernel_model: matmul=7.222591 GFLOP (37.555 GFLOP/s @ duck_max), param_stream=1.032192G (5.367 Gparam/s @ duck_max), weight_stream=1107.903 MiB (6.041 GB/s @ duck_max) [2026-04-08 08:54:42.196715 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=755, avg_tile_batch=3.48, prepare=665.538µs, send=18.255437ms, judge_wait=221.733097ms, fetch=21.642285ms, reduce=133ns; duck time-ns stats: p50=195.809193ms, p90=196.179056ms, 
max=196.385228ms; kernel_model: matmul=7.222591 GFLOP (36.778 GFLOP/s @ duck_max), param_stream=1.039073G (5.291 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.955 GB/s @ duck_max) [2026-04-08 08:54:42.476940 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=755, avg_tile_batch=3.48, prepare=679.47µs, send=17.222502ms, judge_wait=224.627543ms, fetch=23.863191ms, reduce=22ns; duck time-ns stats: p50=196.728424ms, p90=197.080652ms, max=197.231471ms; kernel_model: matmul=7.222591 GFLOP (36.620 GFLOP/s @ duck_max), param_stream=1.039073G (5.268 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.929 GB/s @ duck_max) [2026-04-08 08:54:42.753211 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=754, avg_tile_batch=3.48, prepare=656.989µs, send=17.226139ms, judge_wait=222.861548ms, fetch=21.644297ms, reduce=140ns; duck time-ns stats: p50=196.662456ms, p90=197.052789ms, max=197.401856ms; kernel_model: matmul=7.222591 GFLOP (36.588 GFLOP/s @ duck_max), param_stream=1.037697G (5.257 Gparam/s @ duck_max), weight_stream=1113.811 MiB (5.916 GB/s @ duck_max) [2026-04-08 08:54:42.779678 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=94, expert_tiles=102, avg_tile_batch=1.88, prepare=57.794µs, send=2.650364ms, judge_wait=21.365677ms, fetch=1.466657ms, reduce=19ns; duck time-ns stats: p50=21.159051ms, p90=21.185217ms, max=21.219481ms; kernel_model: matmul=0.528482 GFLOP (24.906 GFLOP/s @ duck_max), param_stream=0.140378G (6.616 Gparam/s @ duck_max), weight_stream=150.675 MiB (7.446 GB/s @ duck_max) [2026-04-08 08:54:43.076150 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=750, avg_tile_batch=3.50, prepare=811.054µs, send=17.214463ms, judge_wait=226.08886ms, fetch=21.666444ms, reduce=20ns; duck time-ns stats: p50=199.809411ms, 
p90=200.123606ms, max=200.18508ms; kernel_model: matmul=7.222591 GFLOP (36.080 GFLOP/s @ duck_max), param_stream=1.032192G (5.156 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.803 GB/s @ duck_max) [2026-04-08 08:54:43.361694 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=752, avg_tile_batch=3.49, prepare=676.376µs, send=17.227012ms, judge_wait=230.862525ms, fetch=22.157531ms, reduce=20ns; duck time-ns stats: p50=202.299044ms, p90=202.610743ms, max=202.782049ms; kernel_model: matmul=7.222591 GFLOP (35.618 GFLOP/s @ duck_max), param_stream=1.034945G (5.104 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.744 GB/s @ duck_max) [2026-04-08 08:54:43.647962 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=224, expert_tiles=748, avg_tile_batch=3.51, prepare=677.742µs, send=17.228745ms, judge_wait=231.369136ms, fetch=22.358775ms, reduce=102ns; duck time-ns stats: p50=204.299261ms, p90=204.588425ms, max=204.6799ms; kernel_model: matmul=7.222591 GFLOP (35.287 GFLOP/s @ duck_max), param_stream=1.029439G (5.030 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.661 GB/s @ duck_max) [2026-04-08 08:54:43.930357 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=751, avg_tile_batch=3.49, prepare=664.351µs, send=18.570126ms, judge_wait=224.713ms, fetch=23.943431ms, reduce=21ns; duck time-ns stats: p50=197.289023ms, p90=197.555394ms, max=197.660351ms; kernel_model: matmul=7.222591 GFLOP (36.540 GFLOP/s @ duck_max), param_stream=1.033568G (5.229 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.885 GB/s @ duck_max) [2026-04-08 08:54:43.953066 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=79, expert_tiles=92, avg_tile_batch=2.09, prepare=55.396µs, send=1.45855ms, judge_wait=18.712223ms, fetch=1.475022ms, reduce=139ns; duck time-ns stats: 
p50=18.525449ms, p90=18.545265ms, max=18.565995ms; kernel_model: matmul=0.528482 GFLOP (28.465 GFLOP/s @ duck_max), param_stream=0.126616G (6.820 Gparam/s @ duck_max), weight_stream=135.903 MiB (7.676 GB/s @ duck_max) [2026-04-08 08:54:44.245456 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=753, avg_tile_batch=3.48, prepare=812.099µs, send=17.229468ms, judge_wait=220.931836ms, fetch=22.26972ms, reduce=20ns; duck time-ns stats: p50=192.226798ms, p90=192.472362ms, max=192.785025ms; kernel_model: matmul=7.222591 GFLOP (37.464 GFLOP/s @ duck_max), param_stream=1.036321G (5.376 Gparam/s @ duck_max), weight_stream=1112.334 MiB (6.050 GB/s @ duck_max) [2026-04-08 08:54:44.521038 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=746, avg_tile_batch=3.52, prepare=665.717µs, send=18.349189ms, judge_wait=220.097158ms, fetch=21.889258ms, reduce=20ns; duck time-ns stats: p50=191.522356ms, p90=191.753161ms, max=192.04601ms; kernel_model: matmul=7.222591 GFLOP (37.609 GFLOP/s @ duck_max), param_stream=1.026687G (5.346 Gparam/s @ duck_max), weight_stream=1101.994 MiB (6.017 GB/s @ duck_max) [2026-04-08 08:54:44.806602 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=752, avg_tile_batch=3.49, prepare=665.797µs, send=18.355665ms, judge_wait=230.078589ms, fetch=21.78358ms, reduce=101ns; duck time-ns stats: p50=201.438971ms, p90=201.673276ms, max=201.800855ms; kernel_model: matmul=7.222591 GFLOP (35.791 GFLOP/s @ duck_max), param_stream=1.034945G (5.129 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.772 GB/s @ duck_max) [2026-04-08 08:54:45.087130 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=214, expert_tiles=748, avg_tile_batch=3.51, prepare=661.59µs, send=17.224415ms, judge_wait=223.893878ms, fetch=24.184419ms, 
reduce=134ns; duck time-ns stats: p50=196.248846ms, p90=196.572487ms, max=197.033074ms; kernel_model: matmul=7.222591 GFLOP (36.657 GFLOP/s @ duck_max), param_stream=1.029439G (5.225 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.880 GB/s @ duck_max) [2026-04-08 08:54:45.110487 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=84, expert_tiles=98, avg_tile_batch=1.96, prepare=56.613µs, send=1.30219ms, judge_wait=19.50496ms, fetch=1.478904ms, reduce=134ns; duck time-ns stats: p50=19.259682ms, p90=19.292926ms, max=19.334547ms; kernel_model: matmul=0.528482 GFLOP (27.334 GFLOP/s @ duck_max), param_stream=0.134873G (6.976 Gparam/s @ duck_max), weight_stream=144.766 MiB (7.851 GB/s @ duck_max) [2026-04-08 08:54:45.406120 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=758, avg_tile_batch=3.46, prepare=769.691µs, send=18.313289ms, judge_wait=224.615949ms, fetch=21.661305ms, reduce=135ns; duck time-ns stats: p50=198.642432ms, p90=198.841066ms, max=199.127113ms; kernel_model: matmul=7.222591 GFLOP (36.271 GFLOP/s @ duck_max), param_stream=1.043202G (5.239 Gparam/s @ duck_max), weight_stream=1119.720 MiB (5.896 GB/s @ duck_max) [2026-04-08 08:54:45.683299 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=748, avg_tile_batch=3.51, prepare=661.298µs, send=17.223571ms, judge_wait=221.439521ms, fetch=24.080852ms, reduce=20ns; duck time-ns stats: p50=193.6012ms, p90=193.863321ms, max=194.029739ms; kernel_model: matmul=7.222591 GFLOP (37.224 GFLOP/s @ duck_max), param_stream=1.029439G (5.306 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.971 GB/s @ duck_max) [2026-04-08 08:54:45.957066 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=747, avg_tile_batch=3.51, prepare=665.028µs, send=17.223744ms, judge_wait=221.375037ms, 
fetch=20.670351ms, reduce=20ns; duck time-ns stats: p50=194.37054ms, p90=194.754016ms, max=195.09441ms; kernel_model: matmul=7.222591 GFLOP (37.021 GFLOP/s @ duck_max), param_stream=1.028063G (5.270 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.931 GB/s @ duck_max) [2026-04-08 08:54:46.230779 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=745, avg_tile_batch=3.52, prepare=591.333µs, send=17.213708ms, judge_wait=219.080851ms, fetch=23.082201ms, reduce=15ns; duck time-ns stats: p50=191.817018ms, p90=192.329394ms, max=192.396257ms; kernel_model: matmul=7.222591 GFLOP (37.540 GFLOP/s @ duck_max), param_stream=1.025311G (5.329 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.998 GB/s @ duck_max) [2026-04-08 08:54:46.256234 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=84, expert_tiles=95, avg_tile_batch=2.02, prepare=56.702µs, send=2.429252ms, judge_wait=20.545106ms, fetch=1.475552ms, reduce=135ns; duck time-ns stats: p50=20.340205ms, p90=20.371887ms, max=20.408372ms; kernel_model: matmul=0.528482 GFLOP (25.895 GFLOP/s @ duck_max), param_stream=0.130744G (6.406 Gparam/s @ duck_max), weight_stream=140.334 MiB (7.210 GB/s @ duck_max) [2026-04-08 08:54:46.549767 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=750, avg_tile_batch=3.50, prepare=753.613µs, send=17.228785ms, judge_wait=223.471881ms, fetch=22.297718ms, reduce=135ns; duck time-ns stats: p50=194.464597ms, p90=194.768417ms, max=194.885081ms; kernel_model: matmul=7.222591 GFLOP (37.061 GFLOP/s @ duck_max), param_stream=1.032192G (5.296 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.961 GB/s @ duck_max) [2026-04-08 08:54:46.839653 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=756, avg_tile_batch=3.47, prepare=669.899µs, send=18.540983ms, 
judge_wait=232.271315ms, fetch=24.694909ms, reduce=20ns; duck time-ns stats: p50=204.52467ms, p90=204.889994ms, max=205.252152ms; kernel_model: matmul=7.222591 GFLOP (35.189 GFLOP/s @ duck_max), param_stream=1.040450G (5.069 Gparam/s @ duck_max), weight_stream=1116.766 MiB (5.705 GB/s @ duck_max) [2026-04-08 08:54:47.111585 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=743, avg_tile_batch=3.53, prepare=663.744µs, send=18.52099ms, judge_wait=218.397934ms, fetch=20.626345ms, reduce=136ns; duck time-ns stats: p50=192.22645ms, p90=192.649053ms, max=192.922706ms; kernel_model: matmul=7.222591 GFLOP (37.438 GFLOP/s @ duck_max), param_stream=1.022558G (5.300 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.965 GB/s @ duck_max) [2026-04-08 08:54:47.390737 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=216, expert_tiles=755, avg_tile_batch=3.48, prepare=666.263µs, send=17.222594ms, judge_wait=225.165609ms, fetch=22.381565ms, reduce=19ns; duck time-ns stats: p50=196.768573ms, p90=196.931192ms, max=197.190174ms; kernel_model: matmul=7.222591 GFLOP (36.628 GFLOP/s @ duck_max), param_stream=1.039073G (5.269 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.931 GB/s @ duck_max) [2026-04-08 08:54:47.414873 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=90, expert_tiles=100, avg_tile_batch=1.92, prepare=56.665µs, send=2.648441ms, judge_wait=19.006912ms, fetch=1.478576ms, reduce=140ns; duck time-ns stats: p50=18.824425ms, p90=18.855349ms, max=18.868602ms; kernel_model: matmul=0.528482 GFLOP (28.009 GFLOP/s @ duck_max), param_stream=0.137626G (7.294 Gparam/s @ duck_max), weight_stream=147.720 MiB (8.209 GB/s @ duck_max) [2026-04-08 08:54:47.710390 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=757, avg_tile_batch=3.47, prepare=747.529µs, 
send=17.238242ms, judge_wait=223.997567ms, fetch=23.78451ms, reduce=19ns; duck time-ns stats: p50=195.985867ms, p90=196.189065ms, max=196.349455ms; kernel_model: matmul=7.222591 GFLOP (36.784 GFLOP/s @ duck_max), param_stream=1.041826G (5.306 Gparam/s @ duck_max), weight_stream=1118.243 MiB (5.972 GB/s @ duck_max) [2026-04-08 08:54:47.992144 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=753, avg_tile_batch=3.48, prepare=668.32µs, send=18.540948ms, judge_wait=227.176431ms, fetch=21.610417ms, reduce=20ns; duck time-ns stats: p50=201.959211ms, p90=202.275601ms, max=202.459329ms; kernel_model: matmul=7.222591 GFLOP (35.674 GFLOP/s @ duck_max), param_stream=1.036321G (5.119 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.761 GB/s @ duck_max) [2026-04-08 08:54:48.280876 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=755, avg_tile_batch=3.48, prepare=665.77µs, send=17.209739ms, judge_wait=236.449946ms, fetch=20.651443ms, reduce=136ns; duck time-ns stats: p50=214.469852ms, p90=214.807579ms, max=214.980675ms; kernel_model: matmul=7.222591 GFLOP (33.596 GFLOP/s @ duck_max), param_stream=1.039073G (4.833 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.440 GB/s @ duck_max) [2026-04-08 08:54:48.564981 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=218, expert_tiles=750, avg_tile_batch=3.50, prepare=673.824µs, send=17.216307ms, judge_wait=231.854956ms, fetch=20.618717ms, reduce=135ns; duck time-ns stats: p50=208.66298ms, p90=208.849747ms, max=209.204031ms; kernel_model: matmul=7.222591 GFLOP (34.524 GFLOP/s @ duck_max), param_stream=1.032192G (4.934 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.553 GB/s @ duck_max) [2026-04-08 08:54:48.589091 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=85, expert_tiles=95, 
avg_tile_batch=2.02, prepare=57.139µs, send=1.304791ms, judge_wait=20.33366ms, fetch=1.480928ms, reduce=105ns; duck time-ns stats: p50=20.07933ms, p90=20.106781ms, max=20.148697ms; kernel_model: matmul=0.528482 GFLOP (26.229 GFLOP/s @ duck_max), param_stream=0.130744G (6.489 Gparam/s @ duck_max), weight_stream=140.334 MiB (7.303 GB/s @ duck_max) [2026-04-08 08:54:48.897415 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=749, avg_tile_batch=3.50, prepare=756.082µs, send=17.231414ms, judge_wait=238.874946ms, fetch=21.632361ms, reduce=136ns; duck time-ns stats: p50=212.671188ms, p90=212.972708ms, max=213.430106ms; kernel_model: matmul=7.222591 GFLOP (33.841 GFLOP/s @ duck_max), param_stream=1.030816G (4.830 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.436 GB/s @ duck_max) [2026-04-08 08:54:49.172039 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=750, avg_tile_batch=3.50, prepare=665.142µs, send=17.224056ms, judge_wait=219.275979ms, fetch=23.672365ms, reduce=140ns; duck time-ns stats: p50=192.118228ms, p90=192.41843ms, max=192.746134ms; kernel_model: matmul=7.222591 GFLOP (37.472 GFLOP/s @ duck_max), param_stream=1.032192G (5.355 Gparam/s @ duck_max), weight_stream=1107.903 MiB (6.027 GB/s @ duck_max) [2026-04-08 08:54:49.454540 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=750, avg_tile_batch=3.50, prepare=662.783µs, send=18.358489ms, judge_wait=228.088687ms, fetch=21.617782ms, reduce=132ns; duck time-ns stats: p50=201.718274ms, p90=202.167287ms, max=202.328143ms; kernel_model: matmul=7.222591 GFLOP (35.697 GFLOP/s @ duck_max), param_stream=1.032192G (5.102 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.742 GB/s @ duck_max) [2026-04-08 08:54:49.737430 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, 
unique_experts=221, expert_tiles=743, avg_tile_batch=3.53, prepare=664.32µs, send=17.222734ms, judge_wait=228.472482ms, fetch=22.788243ms, reduce=140ns; duck time-ns stats: p50=201.445994ms, p90=201.808691ms, max=201.915899ms; kernel_model: matmul=7.222591 GFLOP (35.770 GFLOP/s @ duck_max), param_stream=1.022558G (5.064 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.700 GB/s @ duck_max) [2026-04-08 08:54:49.759561 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=78, expert_tiles=92, avg_tile_batch=2.09, prepare=57.335µs, send=1.301952ms, judge_wait=18.38536ms, fetch=1.468314ms, reduce=20ns; duck time-ns stats: p50=18.149144ms, p90=18.182536ms, max=18.208407ms; kernel_model: matmul=0.528482 GFLOP (29.024 GFLOP/s @ duck_max), param_stream=0.126616G (6.954 Gparam/s @ duck_max), weight_stream=135.903 MiB (7.826 GB/s @ duck_max) [2026-04-08 08:54:50.064992 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=758, avg_tile_batch=3.46, prepare=749.23µs, send=17.237538ms, judge_wait=236.976323ms, fetch=20.643474ms, reduce=136ns; duck time-ns stats: p50=214.121056ms, p90=214.427276ms, max=214.605436ms; kernel_model: matmul=7.222591 GFLOP (33.655 GFLOP/s @ duck_max), param_stream=1.043202G (4.861 Gparam/s @ duck_max), weight_stream=1119.720 MiB (5.471 GB/s @ duck_max) [2026-04-08 08:54:50.355702 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=219, expert_tiles=743, avg_tile_batch=3.53, prepare=660.034µs, send=17.210327ms, judge_wait=237.446064ms, fetch=21.606915ms, reduce=136ns; duck time-ns stats: p50=218.751945ms, p90=219.153844ms, max=219.317558ms; kernel_model: matmul=7.222591 GFLOP (32.932 GFLOP/s @ duck_max), param_stream=1.022558G (4.662 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.248 GB/s @ duck_max) [2026-04-08 08:54:50.632103 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, 
top_k=8, tasks=2624, unique_experts=220, expert_tiles=745, avg_tile_batch=3.52, prepare=659.809µs, send=17.222409ms, judge_wait=223.119324ms, fetch=21.651411ms, reduce=15ns; duck time-ns stats: p50=197.906123ms, p90=198.171193ms, max=198.68083ms; kernel_model: matmul=7.222591 GFLOP (36.353 GFLOP/s @ duck_max), param_stream=1.025311G (5.161 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.808 GB/s @ duck_max) [2026-04-08 08:54:50.913456 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=209, expert_tiles=744, avg_tile_batch=3.53, prepare=662.94µs, send=17.215159ms, judge_wait=227.958195ms, fetch=21.688672ms, reduce=137ns; duck time-ns stats: p50=206.028301ms, p90=206.590437ms, max=207.02541ms; kernel_model: matmul=7.222591 GFLOP (34.887 GFLOP/s @ duck_max), param_stream=1.023934G (4.946 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.567 GB/s @ duck_max) [2026-04-08 08:54:50.936622 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=89, expert_tiles=103, avg_tile_batch=1.86, prepare=56.923µs, send=1.303331ms, judge_wait=19.420568ms, fetch=1.473683ms, reduce=20ns; duck time-ns stats: p50=19.133769ms, p90=19.201939ms, max=19.243522ms; kernel_model: matmul=0.528482 GFLOP (27.463 GFLOP/s @ duck_max), param_stream=0.141754G (7.366 Gparam/s @ duck_max), weight_stream=152.152 MiB (8.291 GB/s @ duck_max) [2026-04-08 08:54:51.227320 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=237, expert_tiles=752, avg_tile_batch=3.49, prepare=762.241µs, send=17.213666ms, judge_wait=222.304387ms, fetch=20.629445ms, reduce=136ns; duck time-ns stats: p50=195.366074ms, p90=195.538417ms, max=195.841768ms; kernel_model: matmul=7.222591 GFLOP (36.880 GFLOP/s @ duck_max), param_stream=1.034945G (5.285 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.948 GB/s @ duck_max) [2026-04-08 08:54:51.502643 INFO fp8_moe_dpdk] MoE prefill forward (Rust): 
batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=747, avg_tile_batch=3.51, prepare=677.242µs, send=17.21493ms, judge_wait=223.096901ms, fetch=20.646247ms, reduce=20ns; duck time-ns stats: p50=198.128554ms, p90=198.562151ms, max=199.261627ms; kernel_model: matmul=7.222591 GFLOP (36.247 GFLOP/s @ duck_max), param_stream=1.028063G (5.159 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.807 GB/s @ duck_max) [2026-04-08 08:54:51.774534 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=757, avg_tile_batch=3.47, prepare=704.773µs, send=17.224791ms, judge_wait=218.561146ms, fetch=21.634033ms, reduce=136ns; duck time-ns stats: p50=193.467169ms, p90=193.720271ms, max=193.882923ms; kernel_model: matmul=7.222591 GFLOP (37.252 GFLOP/s @ duck_max), param_stream=1.041826G (5.373 Gparam/s @ duck_max), weight_stream=1118.243 MiB (6.048 GB/s @ duck_max) [2026-04-08 08:54:52.046385 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=222, expert_tiles=747, avg_tile_batch=3.51, prepare=668.206µs, send=17.210394ms, judge_wait=217.619366ms, fetch=22.617357ms, reduce=21ns; duck time-ns stats: p50=190.512325ms, p90=190.787867ms, max=191.200164ms; kernel_model: matmul=7.222591 GFLOP (37.775 GFLOP/s @ duck_max), param_stream=1.028063G (5.377 Gparam/s @ duck_max), weight_stream=1103.471 MiB (6.052 GB/s @ duck_max) [2026-04-08 08:54:52.070370 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=93, expert_tiles=103, avg_tile_batch=1.86, prepare=57.295µs, send=2.539995ms, judge_wait=19.005579ms, fetch=1.467365ms, reduce=20ns; duck time-ns stats: p50=18.792121ms, p90=18.844645ms, max=18.863905ms; kernel_model: matmul=0.528482 GFLOP (28.016 GFLOP/s @ duck_max), param_stream=0.141754G (7.515 Gparam/s @ duck_max), weight_stream=152.152 MiB (8.458 GB/s @ duck_max) [2026-04-08 08:54:52.364514 INFO fp8_moe_dpdk] MoE 
prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=757, avg_tile_batch=3.47, prepare=760.21µs, send=17.213126ms, judge_wait=222.475422ms, fetch=23.983049ms, reduce=20ns; duck time-ns stats: p50=194.428216ms, p90=194.864139ms, max=195.002792ms; kernel_model: matmul=7.222591 GFLOP (37.038 GFLOP/s @ duck_max), param_stream=1.041826G (5.343 Gparam/s @ duck_max), weight_stream=1118.243 MiB (6.013 GB/s @ duck_max) [2026-04-08 08:54:52.646633 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=750, avg_tile_batch=3.50, prepare=657.955µs, send=17.212169ms, judge_wait=226.069765ms, fetch=24.491876ms, reduce=20ns; duck time-ns stats: p50=198.657425ms, p90=198.982065ms, max=199.104286ms; kernel_model: matmul=7.222591 GFLOP (36.275 GFLOP/s @ duck_max), param_stream=1.032192G (5.184 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.835 GB/s @ duck_max) [2026-04-08 08:54:52.914352 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=222, expert_tiles=748, avg_tile_batch=3.51, prepare=663.428µs, send=17.223548ms, judge_wait=215.454966ms, fetch=20.660389ms, reduce=21ns; duck time-ns stats: p50=189.366936ms, p90=189.6611ms, max=189.850387ms; kernel_model: matmul=7.222591 GFLOP (38.044 GFLOP/s @ duck_max), param_stream=1.029439G (5.422 Gparam/s @ duck_max), weight_stream=1104.948 MiB (6.103 GB/s @ duck_max) [2026-04-08 08:54:53.208641 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=218, expert_tiles=754, avg_tile_batch=3.48, prepare=661.825µs, send=17.214563ms, judge_wait=242.002053ms, fetch=20.663199ms, reduce=22ns; duck time-ns stats: p50=217.202854ms, p90=217.453952ms, max=217.830475ms; kernel_model: matmul=7.222591 GFLOP (33.157 GFLOP/s @ duck_max), param_stream=1.037697G (4.764 Gparam/s @ duck_max), weight_stream=1113.811 MiB (5.362 GB/s @ duck_max) [2026-04-08 
08:54:53.232579 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=85, expert_tiles=100, avg_tile_batch=1.92, prepare=58.507µs, send=1.301917ms, judge_wait=20.159717ms, fetch=1.477766ms, reduce=104ns; duck time-ns stats: p50=19.938913ms, p90=19.987044ms, max=20.013428ms; kernel_model: matmul=0.528482 GFLOP (26.406 GFLOP/s @ duck_max), param_stream=0.137626G (6.877 Gparam/s @ duck_max), weight_stream=147.720 MiB (7.740 GB/s @ duck_max) [2026-04-08 08:54:53.529105 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=209, expert_tiles=743, avg_tile_batch=3.53, prepare=3.609571ms, send=18.378023ms, judge_wait=220.702136ms, fetch=20.668428ms, reduce=20ns; duck time-ns stats: p50=196.828068ms, p90=197.119671ms, max=197.210506ms; kernel_model: matmul=7.222591 GFLOP (36.624 GFLOP/s @ duck_max), param_stream=1.022558G (5.185 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.836 GB/s @ duck_max) [2026-04-08 08:54:53.804167 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=216, expert_tiles=743, avg_tile_batch=3.53, prepare=688.176µs, send=18.46601ms, judge_wait=219.672518ms, fetch=21.666832ms, reduce=144ns; duck time-ns stats: p50=194.09797ms, p90=194.535062ms, max=194.670855ms; kernel_model: matmul=7.222591 GFLOP (37.102 GFLOP/s @ duck_max), param_stream=1.022558G (5.253 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.912 GB/s @ duck_max) [2026-04-08 08:54:54.075017 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=213, expert_tiles=741, avg_tile_batch=3.54, prepare=667.039µs, send=18.324138ms, judge_wait=216.650218ms, fetch=20.647686ms, reduce=19ns; duck time-ns stats: p50=191.746813ms, p90=192.231383ms, max=192.911061ms; kernel_model: matmul=7.222591 GFLOP (37.440 GFLOP/s @ duck_max), param_stream=1.019806G (5.286 Gparam/s @ duck_max), weight_stream=1094.608 MiB (5.950 GB/s @ 
duck_max) [2026-04-08 08:54:54.345896 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=222, expert_tiles=745, avg_tile_batch=3.52, prepare=674.239µs, send=18.349625ms, judge_wait=216.617633ms, fetch=20.676296ms, reduce=19ns; duck time-ns stats: p50=190.84735ms, p90=191.142458ms, max=191.615736ms; kernel_model: matmul=7.222591 GFLOP (37.693 GFLOP/s @ duck_max), param_stream=1.025311G (5.351 Gparam/s @ duck_max), weight_stream=1100.517 MiB (6.022 GB/s @ duck_max) [2026-04-08 08:54:54.370905 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=101, expert_tiles=108, avg_tile_batch=1.78, prepare=58.012µs, send=1.301036ms, judge_wait=21.164506ms, fetch=1.477812ms, reduce=102ns; duck time-ns stats: p50=20.88612ms, p90=20.92744ms, max=20.992857ms; kernel_model: matmul=0.528482 GFLOP (25.174 GFLOP/s @ duck_max), param_stream=0.148636G (7.080 Gparam/s @ duck_max), weight_stream=159.538 MiB (7.969 GB/s @ duck_max) [2026-04-08 08:54:54.698942 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=748, avg_tile_batch=3.51, prepare=3.536859ms, send=17.237642ms, judge_wait=218.362514ms, fetch=21.619485ms, reduce=134ns; duck time-ns stats: p50=192.87689ms, p90=193.306561ms, max=193.721119ms; kernel_model: matmul=7.222591 GFLOP (37.283 GFLOP/s @ duck_max), param_stream=1.029439G (5.314 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.981 GB/s @ duck_max) [2026-04-08 08:54:54.970917 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=750, avg_tile_batch=3.50, prepare=659.431µs, send=18.345741ms, judge_wait=216.632681ms, fetch=21.64116ms, reduce=136ns; duck time-ns stats: p50=191.040299ms, p90=191.360519ms, max=191.456888ms; kernel_model: matmul=7.222591 GFLOP (37.724 GFLOP/s @ duck_max), param_stream=1.032192G (5.391 Gparam/s @ duck_max), weight_stream=1107.903 
MiB (6.068 GB/s @ duck_max) [2026-04-08 08:54:55.244373 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=756, avg_tile_batch=3.47, prepare=659.833µs, send=17.225219ms, judge_wait=219.019387ms, fetch=21.625255ms, reduce=105ns; duck time-ns stats: p50=192.761113ms, p90=193.020758ms, max=193.133078ms; kernel_model: matmul=7.222591 GFLOP (37.397 GFLOP/s @ duck_max), param_stream=1.040450G (5.387 Gparam/s @ duck_max), weight_stream=1116.766 MiB (6.063 GB/s @ duck_max) [2026-04-08 08:54:55.517696 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=745, avg_tile_batch=3.52, prepare=660.584µs, send=17.22309ms, judge_wait=219.027086ms, fetch=21.663684ms, reduce=139ns; duck time-ns stats: p50=192.465503ms, p90=192.895112ms, max=193.0687ms; kernel_model: matmul=7.222591 GFLOP (37.409 GFLOP/s @ duck_max), param_stream=1.025311G (5.311 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.977 GB/s @ duck_max) [2026-04-08 08:54:55.541533 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=23, top_k=8, tasks=184, unique_experts=104, expert_tiles=110, avg_tile_batch=1.67, prepare=56.12µs, send=1.250541ms, judge_wait=20.151783ms, fetch=1.409329ms, reduce=20ns; duck time-ns stats: p50=19.948251ms, p90=19.97278ms, max=19.991259ms; kernel_model: matmul=0.506462 GFLOP (25.334 GFLOP/s @ duck_max), param_stream=0.151388G (7.573 Gparam/s @ duck_max), weight_stream=162.492 MiB (8.523 GB/s @ duck_max) [2026-04-08 08:54:55.559755 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.106388ms; phases: prepare=5.417µs, send=138.117µs, judge_wait=830.134µs, fetch=94.647µs, reduce=20ns, writeback=387ns; duck time-ns stats: p50=743.903µs, p90=747.763µs, max=753.764µs; effective_read: activated_experts=8, params=0.011010G (14.607 Gparam/s @ duck_max), memory=11.818 MiB (16.440 GB/s @ duck_max), judge_gap=76.37µs, judge_ratio=1.101x [2026-04-08 08:54:56.290117 INFO 
fp8_moe_dpdk] MoE forward e2e time (Rust): 1.166603ms; phases: prepare=6.147µs, send=242.607µs, judge_wait=783.605µs, fetch=96.749µs, reduce=20ns, writeback=446ns; duck time-ns stats: p50=699.642µs, p90=703.678µs, max=707.263µs; effective_read: activated_experts=8, params=0.011010G (15.567 Gparam/s @ duck_max), memory=11.818 MiB (17.521 GB/s @ duck_max), judge_gap=76.342µs, judge_ratio=1.108x Token # 1: 1888.468ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=6640 prop=6640 olap pair=700.1ms serial=1290.6ms gain=590.6ms ratio=0.46 s0=611.4ms s1=679.2ms wait=0.2/44.8ms pred gate=device [2026-04-08 08:54:56.294106 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 970.576µs; phases: prepare=3.369µs, send=61.567µs, judge_wait=776.307µs, fetch=92.594µs, reduce=20ns, writeback=401ns; duck time-ns stats: p50=691.892µs, p90=698.021µs, max=702.072µs; effective_read: activated_experts=8, params=0.011010G (15.682 Gparam/s @ duck_max), memory=11.818 MiB (17.650 GB/s @ duck_max), judge_gap=74.235µs, judge_ratio=1.106x Token # 2: 3.842ms; value: next_token_ids=tensor([6640], device='cuda:0') mtp accept=1 prop=6640 top1=6640 accp=0.989 next=pair draft=1134 prop=7578 pred gate=device [2026-04-08 08:54:56.406786 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 981.157µs; phases: prepare=3.603µs, send=61.744µs, judge_wait=785.105µs, fetch=90.827µs, reduce=21ns, writeback=598ns; duck time-ns stats: p50=696.314µs, p90=704.814µs, max=707.212µs; effective_read: activated_experts=8, params=0.011010G (15.568 Gparam/s @ duck_max), memory=11.818 MiB (17.522 GB/s @ duck_max), judge_gap=77.893µs, judge_ratio=1.110x Token # 3: 112.755ms; value: next_token_ids=tensor([16289], device='cuda:0') mtp accept=0 prop=7578 top1=1134 accp=0.742 next=draft=36652 prop=36652 olap pair=107.5ms serial=190.0ms gain=82.6ms ratio=0.43 s0=4.5ms s1=185.6ms wait=0.1/52.0ms pred gate=device [2026-04-08 08:54:56.521171 INFO fp8_moe_dpdk] MoE forward e2e time 
(Rust): 1.006354ms; phases: prepare=3.597µs, send=61.126µs, judge_wait=811.772µs, fetch=93.622µs, reduce=20ns, writeback=451ns; duck time-ns stats: p50=703.333µs, p90=708.966µs, max=713.192µs; effective_read: activated_experts=8, params=0.011010G (15.438 Gparam/s @ duck_max), memory=11.818 MiB (17.375 GB/s @ duck_max), judge_gap=98.58µs, judge_ratio=1.138x Token # 4: 114.426ms; value: next_token_ids=tensor([36652], device='cuda:0') mtp accept=1 prop=36652 top1=36652 accp=0.998 next=draft=5680 prop=5680 olap pair=109.0ms serial=192.9ms gain=83.9ms ratio=0.43 s0=4.5ms s1=188.4ms wait=0.1/51.9ms pred gate=device [2026-04-08 08:54:56.525109 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 975.129µs; phases: prepare=3.09µs, send=62.207µs, judge_wait=781.85µs, fetch=91.864µs, reduce=20ns, writeback=452ns; duck time-ns stats: p50=695.902µs, p90=700.733µs, max=706.827µs; effective_read: activated_experts=8, params=0.011010G (15.577 Gparam/s @ duck_max), memory=11.818 MiB (17.531 GB/s @ duck_max), judge_gap=75.023µs, judge_ratio=1.106x Token # 5: 3.791ms; value: next_token_ids=tensor([5680], device='cuda:0') mtp accept=1 prop=5680 top1=5680 accp=0.996 next=pair draft=2823 prop=2823 pred gate=device [2026-04-08 08:54:56.640439 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 983.705µs; phases: prepare=3.546µs, send=61.778µs, judge_wait=790.052µs, fetch=91.771µs, reduce=20ns, writeback=366ns; duck time-ns stats: p50=700.88µs, p90=704.582µs, max=709.509µs; effective_read: activated_experts=8, params=0.011010G (15.518 Gparam/s @ duck_max), memory=11.818 MiB (17.465 GB/s @ duck_max), judge_gap=80.543µs, judge_ratio=1.114x Token # 6: 115.449ms; value: next_token_ids=tensor([2823], device='cuda:0') mtp accept=1 prop=2823 top1=2823 accp=1.000 next=draft=7790 prop=20418 olap pair=110.1ms serial=195.4ms gain=85.3ms ratio=0.44 s0=3.9ms s1=191.5ms wait=0.1/53.1ms pred gate=device [2026-04-08 08:54:56.644312 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 958.618µs; phases: 
prepare=3.288µs, send=60.806µs, judge_wait=766.403µs, fetch=91.305µs, reduce=21ns, writeback=476ns; duck time-ns stats: p50=683.766µs, p90=688.936µs, max=693.182µs; effective_read: activated_experts=8, params=0.011010G (15.883 Gparam/s @ duck_max), memory=11.818 MiB (17.877 GB/s @ duck_max), judge_gap=73.221µs, judge_ratio=1.106x Token # 7: 3.769ms; value: next_token_ids=tensor([20418], device='cuda:0') mtp accept=1 prop=20418 top1=20418 accp=0.180 next=pair draft=4043 prop=4043 pred gate=device [2026-04-08 08:54:56.759263 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 972.163µs; phases: prepare=3.873µs, send=61.75µs, judge_wait=777.546µs, fetch=92.612µs, reduce=20ns, writeback=326ns; duck time-ns stats: p50=694.152µs, p90=698.108µs, max=700.083µs; effective_read: activated_experts=8, params=0.011010G (15.727 Gparam/s @ duck_max), memory=11.818 MiB (17.700 GB/s @ duck_max), judge_gap=77.463µs, judge_ratio=1.111x Token # 8: 115.037ms; value: next_token_ids=tensor([4043], device='cuda:0') mtp accept=1 prop=4043 top1=4043 accp=1.000 next=draft=7790 prop=7790 olap pair=109.8ms serial=194.9ms gain=85.1ms ratio=0.44 s0=3.8ms s1=191.1ms wait=0.1/53.2ms pred gate=device [2026-04-08 08:54:56.763108 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 972.215µs; phases: prepare=3.243µs, send=60.623µs, judge_wait=781.337µs, fetch=90.473µs, reduce=20ns, writeback=455ns; duck time-ns stats: p50=698.705µs, p90=704.292µs, max=705.519µs; effective_read: activated_experts=8, params=0.011010G (15.606 Gparam/s @ duck_max), memory=11.818 MiB (17.564 GB/s @ duck_max), judge_gap=75.818µs, judge_ratio=1.107x Token # 9: 3.752ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=7790 top1=303 accp=0.069 next=pair draft=1134 prop=1134 pred gate=device Token # 10: 115.231ms; value: next_token_ids=tensor([1134], device='cuda:0') mtp accept=1 prop=1134 top1=1134 accp=0.728 next=draft=10539 prop=10539 olap pair=109.9ms serial=195.2ms gain=85.3ms ratio=0.44 s0=3.8ms s1=191.4ms 
wait=0.1/53.2ms pred gate=device Token # 11: 3.712ms; value: next_token_ids=tensor([10539], device='cuda:0') mtp accept=1 prop=10539 top1=10539 accp=1.000 next=pair draft=642 prop=642 pred gate=device Token # 12: 115.917ms; value: next_token_ids=tensor([642], device='cuda:0') mtp accept=1 prop=642 top1=642 accp=0.827 next=draft=1395 prop=1395 olap pair=110.6ms serial=196.6ms gain=85.9ms ratio=0.44 s0=3.8ms s1=192.8ms wait=0.1/53.2ms pred gate=device Token # 13: 3.798ms; value: next_token_ids=tensor([1395], device='cuda:0') mtp accept=1 prop=1395 top1=1395 accp=1.000 next=pair draft=44750 prop=44750 pred gate=device Token # 14: 115.969ms; value: next_token_ids=tensor([44750], device='cuda:0') mtp accept=1 prop=44750 top1=44750 accp=0.994 next=draft=17001 prop=17001 olap pair=110.7ms serial=196.6ms gain=85.9ms ratio=0.44 s0=3.8ms s1=192.7ms wait=0.1/53.2ms pred gate=device Token # 15: 3.827ms; value: next_token_ids=tensor([17001], device='cuda:0') mtp accept=1 prop=17001 top1=17001 accp=1.000 next=pair draft=978 prop=978 pred gate=device Token # 16: 115.298ms; value: next_token_ids=tensor([978], device='cuda:0') mtp accept=1 prop=978 top1=978 accp=1.000 next=draft=9691 prop=8974 olap pair=110.0ms serial=195.5ms gain=85.4ms ratio=0.44 s0=3.8ms s1=191.7ms wait=0.1/53.2ms pred gate=device Token # 17: 3.852ms; value: next_token_ids=tensor([1081], device='cuda:0') mtp accept=0 prop=8974 top1=9691 accp=0.566 next=pair draft=2935 prop=2935 pred gate=device Token # 18: 116.669ms; value: next_token_ids=tensor([2935], device='cuda:0') mtp accept=1 prop=2935 top1=2935 accp=1.000 next=draft=30757 prop=30757 olap pair=111.3ms serial=197.8ms gain=86.5ms ratio=0.44 s0=3.9ms s1=193.9ms wait=0.1/53.1ms pred gate=device Token # 19: 3.762ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=30757 top1=303 accp=0.204 next=pair draft=39595 prop=39595 pred gate=device Token # 20: 116.010ms; value: next_token_ids=tensor([72861], device='cuda:0') mtp accept=0 prop=39595 
top1=72861 accp=0.240 next=draft=118230 prop=118230 olap pair=110.7ms serial=196.8ms gain=86.1ms ratio=0.44 s0=3.7ms s1=193.1ms wait=0.1/53.3ms pred gate=device Token # 21: 116.587ms; value: next_token_ids=tensor([118230], device='cuda:0') mtp accept=1 prop=118230 top1=118230 accp=0.556 next=draft=5293 prop=5293 olap pair=111.1ms serial=197.5ms gain=86.4ms ratio=0.44 s0=3.8ms s1=193.7ms wait=0.1/53.2ms pred gate=device Token # 22: 3.832ms; value: next_token_ids=tensor([5293], device='cuda:0') mtp accept=1 prop=5293 top1=5293 accp=0.975 next=pair draft=2823 prop=2823 pred gate=device Token # 23: 116.314ms; value: next_token_ids=tensor([2823], device='cuda:0') mtp accept=1 prop=2823 top1=2823 accp=1.000 next=draft=18847 prop=18847 olap pair=111.0ms serial=197.2ms gain=86.3ms ratio=0.44 s0=3.8ms s1=193.5ms wait=0.1/53.2ms pred gate=device Token # 24: 3.752ms; value: next_token_ids=tensor([18847], device='cuda:0') mtp accept=1 prop=18847 top1=18847 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 25: 115.822ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=27620 prop=10539 olap pair=110.6ms serial=196.5ms gain=85.9ms ratio=0.44 s0=4.8ms s1=191.7ms wait=0.1/52.0ms pred gate=device Token # 26: 3.769ms; value: next_token_ids=tensor([10539], device='cuda:0') mtp accept=1 prop=10539 top1=556 accp=0.709 next=pair draft=8403 prop=17768 pred gate=device Token # 27: 115.471ms; value: next_token_ids=tensor([8403], device='cuda:0') mtp accept=0 prop=17768 top1=8403 accp=0.558 next=draft=60345 prop=556 olap pair=110.2ms serial=195.5ms gain=85.3ms ratio=0.44 s0=3.8ms s1=191.7ms wait=0.1/53.2ms pred gate=device Token # 28: 116.362ms; value: next_token_ids=tensor([61242], device='cuda:0') mtp accept=0 prop=556 top1=61242 accp=0.108 next=draft=3975 prop=3975 olap pair=110.9ms serial=197.0ms gain=86.2ms ratio=0.44 s0=3.8ms s1=193.3ms wait=0.1/53.3ms pred gate=device Token # 29: 116.606ms; value: 
next_token_ids=tensor([3975], device='cuda:0') mtp accept=1 prop=3975 top1=3975 accp=0.999 next=draft=13763 prop=13763 olap pair=111.2ms serial=197.6ms gain=86.4ms ratio=0.44 s0=4.0ms s1=193.6ms wait=0.1/52.9ms pred gate=device Token # 30: 3.772ms; value: next_token_ids=tensor([13763], device='cuda:0') mtp accept=1 prop=13763 top1=13763 accp=0.897 next=pair draft=11854 prop=11854 pred gate=device Token # 31: 116.151ms; value: next_token_ids=tensor([11854], device='cuda:0') mtp accept=1 prop=11854 top1=11854 accp=0.991 next=draft=303 prop=303 olap pair=110.7ms serial=196.7ms gain=85.9ms ratio=0.44 s0=4.0ms s1=192.7ms wait=0.1/53.0ms pred gate=device Token # 32: 3.750ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.998 next=pair draft=27620 prop=27620 pred gate=device Token # 33: 116.033ms; value: next_token_ids=tensor([27620], device='cuda:0') mtp accept=1 prop=27620 top1=27620 accp=0.996 next=draft=5680 prop=5680 olap pair=110.7ms serial=196.7ms gain=86.0ms ratio=0.44 s0=4.0ms s1=192.8ms wait=0.1/53.0ms pred gate=device Token # 34: 3.811ms; value: next_token_ids=tensor([5680], device='cuda:0') mtp accept=1 prop=5680 top1=5680 accp=1.000 next=pair draft=5149 prop=5149 pred gate=device Token # 35: 116.093ms; value: next_token_ids=tensor([5149], device='cuda:0') mtp accept=1 prop=5149 top1=5149 accp=1.000 next=draft=4382 prop=4382 olap pair=110.8ms serial=196.2ms gain=85.4ms ratio=0.44 s0=4.0ms s1=192.2ms wait=0.1/52.9ms pred gate=device Token # 36: 3.825ms; value: next_token_ids=tensor([4382], device='cuda:0') mtp accept=1 prop=4382 top1=4382 accp=0.899 next=pair draft=1555 prop=1555 pred gate=device Token # 37: 116.615ms; value: next_token_ids=tensor([1555], device='cuda:0') mtp accept=1 prop=1555 top1=642 accp=0.439 next=draft=112016 prop=48 olap pair=111.3ms serial=197.5ms gain=86.2ms ratio=0.44 s0=4.0ms s1=193.6ms wait=0.1/53.0ms pred gate=device Token # 38: 3.764ms; value: next_token_ids=tensor([112016], 
device='cuda:0') mtp accept=0 prop=48 top1=112016 accp=0.555 next=pair draft=24268 prop=24268 pred gate=device Token # 39: 116.180ms; value: next_token_ids=tensor([24268], device='cuda:0') mtp accept=1 prop=24268 top1=24268 accp=0.999 next=draft=13554 prop=13554 olap pair=110.8ms serial=196.7ms gain=85.9ms ratio=0.44 s0=4.1ms s1=192.6ms wait=0.1/52.8ms pred gate=device Token # 40: 3.857ms; value: next_token_ids=tensor([2971], device='cuda:0') mtp accept=0 prop=13554 top1=2971 accp=0.344 next=pair draft=13554 prop=13554 pred gate=device Token # 41: 116.470ms; value: next_token_ids=tensor([13554], device='cuda:0') mtp accept=1 prop=13554 top1=2053 accp=0.446 next=draft=2741 prop=2741 olap pair=111.1ms serial=197.4ms gain=86.3ms ratio=0.44 s0=4.0ms s1=193.4ms wait=0.1/53.0ms pred gate=device Token # 42: 3.876ms; value: next_token_ids=tensor([53091], device='cuda:0') mtp accept=0 prop=2741 top1=53091 accp=0.002 next=pair draft=4374 prop=4374 pred gate=device Token # 43: 116.571ms; value: next_token_ids=tensor([4374], device='cuda:0') mtp accept=1 prop=4374 top1=4374 accp=1.000 next=draft=1465 prop=1465 olap pair=111.2ms serial=197.6ms gain=86.4ms ratio=0.44 s0=3.9ms s1=193.8ms wait=0.1/53.2ms pred gate=device Token # 44: 3.797ms; value: next_token_ids=tensor([1465], device='cuda:0') mtp accept=1 prop=1465 top1=1465 accp=1.000 next=pair draft=13582 prop=13582 pred gate=device Token # 45: 116.096ms; value: next_token_ids=tensor([13582], device='cuda:0') mtp accept=1 prop=13582 top1=13582 accp=0.997 next=draft=21 prop=21 olap pair=110.8ms serial=196.9ms gain=86.1ms ratio=0.44 s0=3.8ms s1=193.1ms wait=0.1/53.2ms pred gate=device Token # 46: 3.752ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=pair draft=16 prop=16 pred gate=device Token # 47: 117.273ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=0.999 next=draft=20 prop=20 olap pair=111.8ms serial=199.0ms gain=87.2ms ratio=0.44 
s0=3.8ms s1=195.1ms wait=0.1/53.2ms pred gate=device Token # 48: 3.840ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=36101 prop=36101 pred gate=device Token # 49: 116.266ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=1.000 next=draft=13808 prop=13808 olap pair=110.8ms serial=197.0ms gain=86.1ms ratio=0.44 s0=3.8ms s1=193.1ms wait=0.1/53.2ms pred gate=device Token # 50: 3.867ms; value: next_token_ids=tensor([13808], device='cuda:0') mtp accept=1 prop=13808 top1=13808 accp=0.999 next=pair draft=2823 prop=2823 pred gate=device Token # 51: 116.954ms; value: next_token_ids=tensor([2823], device='cuda:0') mtp accept=1 prop=2823 top1=2823 accp=0.989 next=draft=7790 prop=7790 olap pair=111.6ms serial=198.4ms gain=86.8ms ratio=0.44 s0=4.3ms s1=194.1ms wait=0.1/52.5ms pred gate=device Token # 52: 3.777ms; value: next_token_ids=tensor([7790], device='cuda:0') mtp accept=1 prop=7790 top1=7790 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 53: 116.010ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=6451 prop=24153 olap pair=110.7ms serial=196.8ms gain=86.1ms ratio=0.44 s0=3.9ms s1=192.8ms wait=0.1/53.0ms pred gate=device Token # 54: 3.783ms; value: next_token_ids=tensor([6451], device='cuda:0') mtp accept=0 prop=24153 top1=6451 accp=0.580 next=pair draft=445 prop=445 pred gate=device Token # 55: 116.618ms; value: next_token_ids=tensor([642], device='cuda:0') mtp accept=0 prop=445 top1=642 accp=0.454 next=draft=27400 prop=27400 olap pair=111.3ms serial=197.9ms gain=86.5ms ratio=0.44 s0=4.6ms s1=193.2ms wait=0.1/52.4ms pred gate=device Token # 56: 117.140ms; value: next_token_ids=tensor([2290], device='cuda:0') mtp accept=0 prop=27400 top1=2290 accp=0.190 next=draft=11612 prop=2426 olap pair=111.8ms serial=198.9ms gain=87.1ms ratio=0.44 s0=3.8ms s1=195.1ms wait=0.1/53.3ms pred 
gate=device Token # 57: 116.324ms; value: next_token_ids=tensor([2426], device='cuda:0') mtp accept=1 prop=2426 top1=2426 accp=0.316 next=draft=23085 prop=23085 olap pair=110.9ms serial=197.3ms gain=86.4ms ratio=0.44 s0=3.9ms s1=193.4ms wait=0.1/53.0ms pred gate=device Token # 58: 3.917ms; value: next_token_ids=tensor([23085], device='cuda:0') mtp accept=1 prop=23085 top1=23085 accp=1.000 next=pair draft=58 prop=58 pred gate=device Token # 59: 116.728ms; value: next_token_ids=tensor([58], device='cuda:0') mtp accept=1 prop=58 top1=58 accp=1.000 next=draft=24091 prop=24091 olap pair=111.4ms serial=197.6ms gain=86.2ms ratio=0.44 s0=4.9ms s1=192.8ms wait=0.1/51.9ms pred gate=device Token # 60: 3.877ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=24091 top1=223 accp=0.105 next=pair draft=24091 prop=24091 pred gate=device Token # 61: 116.744ms; value: next_token_ids=tensor([24091], device='cuda:0') mtp accept=1 prop=24091 top1=24091 accp=1.000 next=draft=18 prop=18 olap pair=111.3ms serial=197.9ms gain=86.6ms ratio=0.44 s0=3.9ms s1=194.1ms wait=0.1/53.1ms pred gate=device Token # 62: 3.817ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=pair draft=548 prop=548 pred gate=device Token # 63: 116.644ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=1 prop=548 top1=548 accp=1.000 next=draft=2111 prop=2111 olap pair=111.2ms serial=197.6ms gain=86.4ms ratio=0.44 s0=3.8ms s1=193.8ms wait=0.1/53.1ms pred gate=device Token # 64: 3.795ms; value: next_token_ids=tensor([2111], device='cuda:0') mtp accept=1 prop=2111 top1=2111 accp=1.000 next=pair draft=2426 prop=2426 pred gate=device Token # 65: 116.738ms; value: next_token_ids=tensor([2426], device='cuda:0') mtp accept=1 prop=2426 top1=2426 accp=1.000 next=draft=94353 prop=94353 olap pair=111.2ms serial=197.5ms gain=86.3ms ratio=0.44 s0=3.9ms s1=193.6ms wait=0.1/53.0ms pred gate=device Token # 66: 3.871ms; value: 
next_token_ids=tensor([94353], device='cuda:0') mtp accept=1 prop=94353 top1=94353 accp=1.000 next=pair draft=471 prop=471 pred gate=device Token # 67: 118.161ms; value: next_token_ids=tensor([471], device='cuda:0') mtp accept=1 prop=471 top1=471 accp=1.000 next=draft=1457 prop=1457 olap pair=112.8ms serial=199.8ms gain=87.0ms ratio=0.44 s0=3.8ms s1=196.0ms wait=0.1/53.1ms pred gate=device Token # 68: 3.811ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=pair draft=844 prop=844 pred gate=device Token # 69: 116.265ms; value: next_token_ids=tensor([844], device='cuda:0') mtp accept=1 prop=844 top1=844 accp=0.836 next=draft=13880 prop=13880 olap pair=110.8ms serial=196.8ms gain=86.0ms ratio=0.44 s0=4.0ms s1=192.8ms wait=0.1/52.9ms pred gate=device Token # 70: 3.880ms; value: next_token_ids=tensor([13880], device='cuda:0') mtp accept=1 prop=13880 top1=13880 accp=0.987 next=pair draft=303 prop=303 pred gate=device Token # 71: 116.247ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.999 next=draft=6032 prop=6032 olap pair=110.9ms serial=197.1ms gain=86.2ms ratio=0.44 s0=4.1ms s1=193.0ms wait=0.1/52.9ms pred gate=device Token # 72: 3.792ms; value: next_token_ids=tensor([6032], device='cuda:0') mtp accept=1 prop=6032 top1=6032 accp=0.816 next=pair draft=5659 prop=5659 pred gate=device Token # 73: 116.619ms; value: next_token_ids=tensor([8738], device='cuda:0') mtp accept=0 prop=5659 top1=8738 accp=0.518 next=draft=429 prop=429 olap pair=111.2ms serial=197.9ms gain=86.6ms ratio=0.44 s0=3.9ms s1=193.9ms wait=0.1/52.9ms pred gate=device Token # 74: 115.896ms; value: next_token_ids=tensor([429], device='cuda:0') mtp accept=1 prop=429 top1=429 accp=0.999 next=draft=28769 prop=28769 olap pair=110.4ms serial=196.2ms gain=85.8ms ratio=0.44 s0=3.8ms s1=192.3ms wait=0.1/53.2ms pred gate=device Token # 75: 3.846ms; value: next_token_ids=tensor([28769], device='cuda:0') mtp accept=1 
prop=28769 top1=28769 accp=1.000 next=pair draft=36 prop=36 pred gate=device Token # 76: 116.406ms; value: next_token_ids=tensor([36], device='cuda:0') mtp accept=1 prop=36 top1=36 accp=1.000 next=draft=87459 prop=87459 olap pair=111.0ms serial=197.1ms gain=86.1ms ratio=0.44 s0=3.9ms s1=193.2ms wait=0.1/53.0ms pred gate=device Token # 77: 3.833ms; value: next_token_ids=tensor([10602], device='cuda:0') mtp accept=0 prop=87459 top1=10602 accp=0.015 next=pair draft=5209 prop=5209 pred gate=device Token # 78: 116.271ms; value: next_token_ids=tensor([5209], device='cuda:0') mtp accept=1 prop=5209 top1=5209 accp=1.000 next=draft=8842 prop=8842 olap pair=110.9ms serial=196.9ms gain=86.1ms ratio=0.44 s0=3.9ms s1=193.1ms wait=0.1/51.6ms pred gate=device Token # 79: 3.763ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 80: 117.381ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.749 next=draft=17839 prop=17839 olap pair=111.4ms serial=197.5ms gain=86.1ms ratio=0.44 s0=6.6ms s1=191.0ms wait=0.2/44.6ms pred gate=device Token # 81: 4.695ms; value: next_token_ids=tensor([10280], device='cuda:0') mtp accept=0 prop=17839 top1=1959 accp=0.022 next=pair draft=57135 prop=57135 pred gate=device Token # 82: 116.027ms; value: next_token_ids=tensor([57135], device='cuda:0') mtp accept=1 prop=57135 top1=57135 accp=0.912 next=draft=301 prop=301 olap pair=110.5ms serial=196.2ms gain=85.8ms ratio=0.44 s0=4.2ms s1=192.0ms wait=0.1/47.6ms pred gate=device Token # 83: 3.688ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=0.841 next=pair draft=3962 prop=478 pred gate=device Token # 84: 116.050ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.682 next=draft=8403 prop=7790 olap pair=110.9ms serial=197.0ms gain=86.2ms ratio=0.44 s0=4.4ms s1=192.6ms 
wait=0.1/47.3ms pred gate=device Token # 85: 3.701ms; value: next_token_ids=tensor([6640], device='cuda:0') mtp accept=0 prop=7790 top1=6640 accp=0.154 next=pair draft=5109 prop=7578 pred gate=device Token # 86: 115.623ms; value: next_token_ids=tensor([7578], device='cuda:0') mtp accept=1 prop=7578 top1=7578 accp=0.531 next=draft=3115 prop=6710 olap pair=110.4ms serial=196.0ms gain=85.7ms ratio=0.44 s0=3.8ms s1=192.2ms wait=0.1/48.0ms pred gate=device Token # 87: 3.768ms; value: next_token_ids=tensor([1395], device='cuda:0') mtp accept=0 prop=6710 top1=1395 accp=0.003 next=pair draft=44750 prop=44750 pred gate=device Token # 88: 116.773ms; value: next_token_ids=tensor([44750], device='cuda:0') mtp accept=1 prop=44750 top1=44750 accp=0.911 next=draft=17001 prop=17001 olap pair=111.3ms serial=197.9ms gain=86.5ms ratio=0.44 s0=3.9ms s1=193.9ms wait=0.1/47.8ms pred gate=device Token # 89: 3.747ms; value: next_token_ids=tensor([17001], device='cuda:0') mtp accept=1 prop=17001 top1=17001 accp=1.000 next=pair draft=978 prop=978 pred gate=device Token # 90: 115.871ms; value: next_token_ids=tensor([978], device='cuda:0') mtp accept=1 prop=978 top1=978 accp=1.000 next=draft=8974 prop=8974 olap pair=110.5ms serial=196.2ms gain=85.7ms ratio=0.44 s0=5.3ms s1=190.9ms wait=0.2/46.0ms pred gate=device Token # 91: 3.708ms; value: next_token_ids=tensor([8974], device='cuda:0') mtp accept=1 prop=8974 top1=8974 accp=1.000 next=pair draft=2554 prop=2554 pred gate=device Token # 92: 116.017ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=2554 top1=303 accp=0.422 next=draft=80276 prop=43913 olap pair=110.8ms serial=197.3ms gain=86.5ms ratio=0.44 s0=3.9ms s1=193.4ms wait=0.1/47.9ms pred gate=device Token # 93: 116.053ms; value: next_token_ids=tensor([43913], device='cuda:0') mtp accept=1 prop=43913 top1=80276 accp=0.687 next=draft=859 prop=859 olap pair=110.8ms serial=197.1ms gain=86.3ms ratio=0.44 s0=3.8ms s1=193.4ms wait=0.1/48.1ms pred gate=device Token # 94: 
3.711ms; value: next_token_ids=tensor([859], device='cuda:0') mtp accept=1 prop=859 top1=859 accp=0.834 next=pair draft=4754 prop=4754 pred gate=device Token # 95: 116.126ms; value: next_token_ids=tensor([4754], device='cuda:0') mtp accept=1 prop=4754 top1=4754 accp=0.921 next=draft=6710 prop=6710 olap pair=110.9ms serial=197.0ms gain=86.1ms ratio=0.44 s0=4.4ms s1=192.7ms wait=0.1/47.3ms pred gate=device Token # 96: 3.717ms; value: next_token_ids=tensor([4916], device='cuda:0') mtp accept=0 prop=6710 top1=9209 accp=0.050 next=pair draft=642 prop=642 pred gate=device Token # 97: 115.971ms; value: next_token_ids=tensor([642], device='cuda:0') mtp accept=1 prop=642 top1=642 accp=0.964 next=draft=2081 prop=17542 olap pair=110.7ms serial=196.2ms gain=85.5ms ratio=0.44 s0=4.0ms s1=192.2ms wait=0.1/47.9ms pred gate=device Token # 98: 3.760ms; value: next_token_ids=tensor([2081], device='cuda:0') mtp accept=0 prop=17542 top1=2081 accp=0.499 next=pair draft=613 prop=613 pred gate=device Token # 99: 116.325ms; value: next_token_ids=tensor([613], device='cuda:0') mtp accept=1 prop=613 top1=613 accp=0.760 next=draft=6469 prop=6469 olap pair=111.0ms serial=196.1ms gain=85.2ms ratio=0.43 s0=4.2ms s1=192.0ms wait=0.1/47.7ms pred gate=device Token # 100: 3.874ms; value: next_token_ids=tensor([58372], device='cuda:0') mtp accept=0 prop=6469 top1=58372 accp=0.027 next=pair draft=17542 prop=17542 pred gate=device Token # 101: 116.705ms; value: next_token_ids=tensor([17542], device='cuda:0') mtp accept=1 prop=17542 top1=17542 accp=0.917 next=draft=12008 prop=12008 olap pair=111.4ms serial=198.3ms gain=86.9ms ratio=0.44 s0=3.9ms s1=194.4ms wait=0.1/47.9ms pred gate=device Token # 102: 3.720ms; value: next_token_ids=tensor([12008], device='cuda:0') mtp accept=1 prop=12008 top1=12008 accp=0.995 next=pair draft=303 prop=303 pred gate=device Token # 103: 115.906ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.856 next=draft=589 prop=589 olap 
pair=110.7ms serial=196.7ms gain=86.0ms ratio=0.44 s0=4.2ms s1=192.5ms wait=0.1/47.5ms pred gate=device Token # 104: 3.695ms; value: next_token_ids=tensor([589], device='cuda:0') mtp accept=1 prop=589 top1=79184 accp=0.150 next=pair draft=59025 prop=59025 pred gate=device Token # 105: 116.826ms; value: next_token_ids=tensor([1395], device='cuda:0') mtp accept=0 prop=59025 top1=59025 accp=0.549 next=draft=8767 prop=8767 olap pair=111.6ms serial=198.0ms gain=86.4ms ratio=0.44 s0=4.1ms s1=193.9ms wait=0.1/47.6ms pred gate=device Token # 106: 116.941ms; value: next_token_ids=tensor([8767], device='cuda:0') mtp accept=1 prop=8767 top1=8767 accp=0.984 next=draft=21134 prop=21134 olap pair=111.5ms serial=198.6ms gain=87.0ms ratio=0.44 s0=3.8ms s1=194.7ms wait=0.1/48.0ms pred gate=device Token # 107: 3.706ms; value: next_token_ids=tensor([21134], device='cuda:0') mtp accept=1 prop=21134 top1=21134 accp=1.000 next=pair draft=3516 prop=3516 pred gate=device Token # 108: 116.147ms; value: next_token_ids=tensor([2823], device='cuda:0') mtp accept=0 prop=3516 top1=2823 accp=0.046 next=draft=121682 prop=7790 olap pair=110.9ms serial=197.2ms gain=86.3ms ratio=0.44 s0=3.9ms s1=193.3ms wait=0.1/48.0ms pred gate=device Token # 109: 115.809ms; value: next_token_ids=tensor([7790], device='cuda:0') mtp accept=1 prop=7790 top1=121682 accp=0.655 next=draft=90592 prop=90592 olap pair=110.5ms serial=196.5ms gain=86.0ms ratio=0.44 s0=3.9ms s1=192.6ms wait=0.1/47.9ms pred gate=device Token # 110: 3.796ms; value: next_token_ids=tensor([90592], device='cuda:0') mtp accept=1 prop=90592 top1=90592 accp=0.996 next=pair draft=10294 prop=10294 pred gate=device Token # 111: 116.484ms; value: next_token_ids=tensor([10294], device='cuda:0') mtp accept=1 prop=10294 top1=10294 accp=0.999 next=draft=320 prop=320 olap pair=111.2ms serial=197.8ms gain=86.6ms ratio=0.44 s0=3.9ms s1=193.9ms wait=0.1/47.9ms pred gate=device Token # 112: 3.705ms; value: next_token_ids=tensor([320], device='cuda:0') mtp 
accept=1 prop=320 top1=320 accp=1.000 next=pair draft=4534 prop=4534 pred gate=device Token # 113: 115.969ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=0 prop=4534 top1=1207 accp=0.257 next=draft=21713 prop=21713 olap pair=110.7ms serial=196.8ms gain=86.1ms ratio=0.44 s0=3.9ms s1=193.0ms wait=0.1/48.1ms pred gate=device Token # 114: 116.178ms; value: next_token_ids=tensor([2823], device='cuda:0') mtp accept=0 prop=21713 top1=2823 accp=0.031 next=draft=18847 prop=18847 olap pair=110.9ms serial=197.2ms gain=86.3ms ratio=0.44 s0=3.9ms s1=193.3ms wait=0.1/48.0ms pred gate=device Token # 115: 116.763ms; value: next_token_ids=tensor([18847], device='cuda:0') mtp accept=1 prop=18847 top1=18847 accp=1.000 next=draft=6788 prop=6788 olap pair=111.4ms serial=198.1ms gain=86.7ms ratio=0.44 s0=4.1ms s1=194.0ms wait=0.1/47.6ms pred gate=device Token # 116: 3.739ms; value: next_token_ids=tensor([6788], device='cuda:0') mtp accept=1 prop=6788 top1=6788 accp=0.996 next=pair draft=42562 prop=42562 pred gate=device Token # 117: 115.942ms; value: next_token_ids=tensor([42562], device='cuda:0') mtp accept=1 prop=42562 top1=42562 accp=1.000 next=draft=1654 prop=1654 olap pair=110.6ms serial=196.6ms gain=85.9ms ratio=0.44 s0=4.2ms s1=192.4ms wait=0.1/47.4ms pred gate=device Token # 118: 3.805ms; value: next_token_ids=tensor([1654], device='cuda:0') mtp accept=1 prop=1654 top1=1654 accp=0.873 next=pair draft=303 prop=303 pred gate=device Token # 119: 116.923ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=5802 prop=5802 olap pair=111.7ms serial=198.4ms gain=86.7ms ratio=0.44 s0=4.0ms s1=194.4ms wait=0.1/47.9ms pred gate=device Token # 120: 3.723ms; value: next_token_ids=tensor([5802], device='cuda:0') mtp accept=1 prop=5802 top1=5802 accp=0.880 next=pair draft=8283 prop=8283 pred gate=device Token # 121: 116.536ms; value: next_token_ids=tensor([10602], device='cuda:0') mtp accept=0 prop=8283 top1=10602 
accp=0.385 next=draft=410 prop=410 olap pair=111.3ms serial=197.6ms gain=86.3ms ratio=0.44 s0=4.3ms s1=193.3ms wait=0.1/47.3ms pred gate=device Token # 122: 116.403ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=0.999 next=draft=10909 prop=10909 olap pair=111.1ms serial=197.5ms gain=86.4ms ratio=0.44 s0=4.2ms s1=193.3ms wait=0.1/47.4ms pred gate=device Token # 123: 3.814ms; value: next_token_ids=tensor([12145], device='cuda:0') mtp accept=0 prop=10909 top1=35991 accp=0.204 next=pair draft=410 prop=410 pred gate=device Token # 124: 116.647ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=1.000 next=draft=15801 prop=35991 olap pair=111.3ms serial=197.7ms gain=86.5ms ratio=0.44 s0=4.1ms s1=193.6ms wait=0.1/47.4ms pred gate=device Token # 125: 3.730ms; value: next_token_ids=tensor([10861], device='cuda:0') mtp accept=0 prop=35991 top1=10861 accp=0.081 next=pair draft=3374 prop=3374 pred gate=device Token # 126: 116.350ms; value: next_token_ids=tensor([10909], device='cuda:0') mtp accept=0 prop=3374 top1=10909 accp=0.085 next=draft=12659 prop=12659 olap pair=111.0ms serial=197.1ms gain=86.2ms ratio=0.44 s0=4.1ms s1=193.0ms wait=0.1/47.6ms pred gate=device Token # 127: 116.503ms; value: next_token_ids=tensor([69674], device='cuda:0') mtp accept=0 prop=12659 top1=69674 accp=0.049 next=draft=18901 prop=18901 olap pair=111.1ms serial=197.8ms gain=86.7ms ratio=0.44 s0=3.8ms s1=194.0ms wait=0.1/48.1ms pred gate=device Token # 128: 116.871ms; value: next_token_ids=tensor([18901], device='cuda:0') mtp accept=1 prop=18901 top1=18901 accp=0.990 next=draft=478 prop=478 olap pair=110.6ms serial=196.7ms gain=86.1ms ratio=0.44 s0=3.8ms s1=192.9ms wait=0.1/48.2ms pred gate=device Token # 129: 4.611ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.998 next=pair draft=8403 prop=8403 pred gate=device Token # 130: 116.844ms; value: 
next_token_ids=tensor([531], device='cuda:0') mtp accept=0 prop=8403 top1=531 accp=0.112 next=draft=8403 prop=8403 olap pair=111.5ms serial=197.1ms gain=85.7ms ratio=0.43 s0=6.3ms s1=190.8ms wait=0.2/45.1ms pred gate=device Token # 131: 116.580ms; value: next_token_ids=tensor([8403], device='cuda:0') mtp accept=1 prop=8403 top1=8403 accp=1.000 next=draft=28638 prop=28638 olap pair=111.3ms serial=197.9ms gain=86.6ms ratio=0.44 s0=4.0ms s1=193.9ms wait=0.1/47.9ms pred gate=device Token # 132: 3.732ms; value: next_token_ids=tensor([7790], device='cuda:0') mtp accept=0 prop=28638 top1=7790 accp=0.045 next=pair draft=33315 prop=4572 pred gate=device Token # 133: 116.351ms; value: next_token_ids=tensor([4572], device='cuda:0') mtp accept=1 prop=4572 top1=4572 accp=0.472 next=draft=2554 prop=2554 olap pair=111.0ms serial=197.4ms gain=86.4ms ratio=0.44 s0=3.8ms s1=193.7ms wait=0.1/48.1ms pred gate=device Token # 134: 3.695ms; value: next_token_ids=tensor([2554], device='cuda:0') mtp accept=1 prop=2554 top1=2554 accp=1.000 next=pair draft=50584 prop=50584 pred gate=device Token # 135: 116.315ms; value: next_token_ids=tensor([1735], device='cuda:0') mtp accept=0 prop=50584 top1=1735 accp=0.069 next=draft=23116 prop=3486 olap pair=111.1ms serial=197.1ms gain=86.0ms ratio=0.44 s0=4.1ms s1=193.1ms wait=0.1/47.7ms pred gate=device Token # 136: 117.002ms; value: next_token_ids=tensor([450], device='cuda:0') mtp accept=0 prop=3486 top1=450 accp=0.210 next=draft=8563 prop=8563 olap pair=111.7ms serial=197.8ms gain=86.2ms ratio=0.44 s0=4.2ms s1=193.7ms wait=0.1/47.6ms pred gate=device Token # 137: 117.003ms; value: next_token_ids=tensor([8563], device='cuda:0') mtp accept=1 prop=8563 top1=8563 accp=1.000 next=draft=4498 prop=4498 olap pair=111.6ms serial=197.8ms gain=86.3ms ratio=0.44 s0=4.2ms s1=193.6ms wait=0.1/47.7ms pred gate=device Token # 138: 3.767ms; value: next_token_ids=tensor([10861], device='cuda:0') mtp accept=0 prop=4498 top1=4498 accp=0.557 next=pair draft=4498 
prop=4498 pred gate=device Token # 139: 116.024ms; value: next_token_ids=tensor([4498], device='cuda:0') mtp accept=1 prop=4498 top1=4498 accp=0.987 next=draft=303 prop=303 olap pair=110.7ms serial=196.8ms gain=86.2ms ratio=0.44 s0=3.8ms s1=193.0ms wait=0.1/48.5ms pred gate=device Token # 140: 3.763ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=4272 prop=4272 pred gate=device Token # 141: 116.433ms; value: next_token_ids=tensor([4272], device='cuda:0') mtp accept=1 prop=4272 top1=4272 accp=0.991 next=draft=3486 prop=3486 olap pair=111.2ms serial=197.8ms gain=86.6ms ratio=0.44 s0=3.8ms s1=194.0ms wait=0.1/48.4ms pred gate=device Token # 142: 3.710ms; value: next_token_ids=tensor([63017], device='cuda:0') mtp accept=0 prop=3486 top1=1255 accp=0.046 next=pair draft=3257 prop=3257 pred gate=device Token # 143: 116.967ms; value: next_token_ids=tensor([3486], device='cuda:0') mtp accept=0 prop=3257 top1=3257 accp=0.900 next=draft=1078 prop=1078 olap pair=111.7ms serial=198.7ms gain=87.0ms ratio=0.44 s0=3.9ms s1=194.8ms wait=0.1/48.3ms pred gate=device Token # 144: 116.838ms; value: next_token_ids=tensor([1078], device='cuda:0') mtp accept=1 prop=1078 top1=1078 accp=1.000 next=draft=27520 prop=27520 olap pair=111.5ms serial=198.4ms gain=86.8ms ratio=0.44 s0=4.0ms s1=194.4ms wait=0.1/48.2ms pred gate=device Token # 145: 3.691ms; value: next_token_ids=tensor([27520], device='cuda:0') mtp accept=1 prop=27520 top1=27520 accp=0.983 next=pair draft=55779 prop=55779 pred gate=device Token # 146: 116.245ms; value: next_token_ids=tensor([1255], device='cuda:0') mtp accept=0 prop=55779 top1=1255 accp=0.001 next=draft=112016 prop=112016 olap pair=111.0ms serial=197.4ms gain=86.4ms ratio=0.44 s0=4.1ms s1=193.3ms wait=0.1/47.9ms pred gate=device Token # 147: 116.173ms; value: next_token_ids=tensor([112016], device='cuda:0') mtp accept=1 prop=112016 top1=112016 accp=1.000 next=draft=24268 prop=24268 olap pair=110.7ms 
serial=197.1ms gain=86.4ms ratio=0.44 s0=3.8ms s1=193.3ms wait=0.1/48.5ms pred gate=device Token # 148: 3.732ms; value: next_token_ids=tensor([24268], device='cuda:0') mtp accept=1 prop=24268 top1=24268 accp=1.000 next=pair draft=2971 prop=2971 pred gate=device Token # 149: 115.999ms; value: next_token_ids=tensor([2971], device='cuda:0') mtp accept=1 prop=2971 top1=2971 accp=0.998 next=draft=55779 prop=55779 olap pair=110.8ms serial=197.0ms gain=86.2ms ratio=0.44 s0=3.7ms s1=193.3ms wait=0.1/48.6ms pred gate=device Token # 150: 3.728ms; value: next_token_ids=tensor([55779], device='cuda:0') mtp accept=1 prop=55779 top1=1958 accp=0.520 next=pair draft=320 prop=320 pred gate=device Token # 151: 116.910ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=320 top1=303 accp=0.513 next=draft=4755 prop=4755 olap pair=111.7ms serial=198.8ms gain=87.1ms ratio=0.44 s0=3.8ms s1=195.0ms wait=0.1/48.5ms pred gate=device Token # 152: 116.363ms; value: next_token_ids=tensor([1759], device='cuda:0') mtp accept=0 prop=4755 top1=1759 accp=0.412 next=draft=84964 prop=84964 olap pair=111.0ms serial=197.1ms gain=86.1ms ratio=0.44 s0=4.4ms s1=192.8ms wait=0.1/47.3ms pred gate=device Token # 153: 117.063ms; value: next_token_ids=tensor([84964], device='cuda:0') mtp accept=1 prop=84964 top1=84964 accp=0.867 next=draft=4382 prop=4382 olap pair=111.7ms serial=199.0ms gain=87.2ms ratio=0.44 s0=4.0ms s1=195.0ms wait=0.1/48.3ms pred gate=device Token # 154: 3.747ms; value: next_token_ids=tensor([2823], device='cuda:0') mtp accept=0 prop=4382 top1=2823 accp=0.176 next=pair draft=7185 prop=7185 pred gate=device Token # 155: 117.018ms; value: next_token_ids=tensor([7185], device='cuda:0') mtp accept=1 prop=7185 top1=7185 accp=0.433 next=draft=24958 prop=24958 olap pair=111.7ms serial=197.2ms gain=85.5ms ratio=0.43 s0=4.2ms s1=193.0ms wait=0.1/47.9ms pred gate=device Token # 156: 3.721ms; value: next_token_ids=tensor([24958], device='cuda:0') mtp accept=1 prop=24958 
top1=24958 accp=0.509 next=pair draft=320 prop=320 pred gate=device Token # 157: 116.621ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=0 prop=320 top1=548 accp=0.033 next=draft=4339 prop=4339 olap pair=111.3ms serial=196.5ms gain=85.2ms ratio=0.43 s0=4.2ms s1=192.3ms wait=0.1/48.2ms pred gate=device Token # 158: 117.595ms; value: next_token_ids=tensor([4354], device='cuda:0') mtp accept=0 prop=4339 top1=4354 accp=0.094 next=draft=320 prop=320 olap pair=112.2ms serial=199.9ms gain=87.7ms ratio=0.44 s0=4.0ms s1=195.9ms wait=0.1/48.3ms pred gate=device Token # 159: 117.189ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.981 next=draft=17839 prop=17839 olap pair=111.8ms serial=199.1ms gain=87.3ms ratio=0.44 s0=3.9ms s1=195.2ms wait=0.1/48.5ms pred gate=device Token # 160: 3.746ms; value: next_token_ids=tensor([1959], device='cuda:0') mtp accept=0 prop=17839 top1=1959 accp=0.028 next=pair draft=4572 prop=4572 pred gate=device Token # 161: 116.595ms; value: next_token_ids=tensor([55443], device='cuda:0') mtp accept=0 prop=4572 top1=55443 accp=0.055 next=draft=4572 prop=4572 olap pair=111.3ms serial=198.1ms gain=86.8ms ratio=0.44 s0=3.8ms s1=194.3ms wait=0.1/48.5ms pred gate=device Token # 162: 116.767ms; value: next_token_ids=tensor([18143], device='cuda:0') mtp accept=0 prop=4572 top1=18143 accp=0.326 next=draft=1263 prop=1263 olap pair=111.4ms serial=198.3ms gain=86.9ms ratio=0.44 s0=3.8ms s1=194.4ms wait=0.1/48.6ms pred gate=device Token # 163: 116.001ms; value: next_token_ids=tensor([86964], device='cuda:0') mtp accept=0 prop=1263 top1=86964 accp=0.405 next=draft=303 prop=303 olap pair=110.6ms serial=196.7ms gain=86.1ms ratio=0.44 s0=3.8ms s1=193.0ms wait=0.1/48.5ms pred gate=device Token # 164: 116.030ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=34071 prop=34071 olap pair=110.7ms serial=196.9ms gain=86.1ms ratio=0.44 s0=5.1ms s1=191.8ms 
wait=0.1/46.9ms pred gate=device Token # 165: 3.759ms; value: next_token_ids=tensor([34071], device='cuda:0') mtp accept=1 prop=34071 top1=34071 accp=0.997 next=pair draft=18901 prop=18901 pred gate=device Token # 166: 116.769ms; value: next_token_ids=tensor([6710], device='cuda:0') mtp accept=0 prop=18901 top1=6710 accp=0.356 next=draft=1959 prop=1959 olap pair=111.0ms serial=196.4ms gain=85.4ms ratio=0.43 s0=6.5ms s1=189.9ms wait=0.2/45.3ms pred gate=device Token # 167: 117.473ms; value: next_token_ids=tensor([1959], device='cuda:0') mtp accept=1 prop=1959 top1=1959 accp=0.678 next=draft=4572 prop=4572 olap pair=111.3ms serial=196.9ms gain=85.6ms ratio=0.43 s0=7.7ms s1=189.2ms wait=0.2/43.6ms pred gate=device Token # 168: 4.623ms; value: next_token_ids=tensor([4572], device='cuda:0') mtp accept=1 prop=4572 top1=4572 accp=0.999 next=pair draft=478 prop=478 pred gate=device Token # 169: 117.366ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=draft=7831 prop=7831 olap pair=111.2ms serial=196.8ms gain=85.6ms ratio=0.43 s0=8.4ms s1=188.4ms wait=0.2/42.8ms pred gate=device Token # 170: 4.650ms; value: next_token_ids=tensor([4631], device='cuda:0') mtp accept=0 prop=7831 top1=4631 accp=0.036 next=pair draft=4477 prop=8974 pred gate=device Token # 171: 117.455ms; value: next_token_ids=tensor([8974], device='cuda:0') mtp accept=1 prop=8974 top1=8974 accp=0.406 next=draft=1263 prop=1263 olap pair=111.2ms serial=196.4ms gain=85.2ms ratio=0.43 s0=6.6ms s1=189.9ms wait=0.2/45.1ms pred gate=device Token # 172: 4.579ms; value: next_token_ids=tensor([1263], device='cuda:0') mtp accept=1 prop=1263 top1=1263 accp=0.955 next=pair draft=1395 prop=1395 pred gate=device Token # 173: 116.910ms; value: next_token_ids=tensor([7590], device='cuda:0') mtp accept=0 prop=1395 top1=7590 accp=0.272 next=draft=45242 prop=45242 olap pair=110.7ms serial=195.8ms gain=85.1ms ratio=0.43 s0=5.5ms s1=190.3ms wait=0.1/46.4ms pred gate=device Token 
# 174: 116.856ms; value: next_token_ids=tensor([45242], device='cuda:0') mtp accept=1 prop=45242 top1=45242 accp=1.000 next=draft=4377 prop=4377 olap pair=111.0ms serial=194.7ms gain=83.7ms ratio=0.43 s0=7.1ms s1=187.5ms wait=0.2/44.4ms pred gate=device Token # 175: 3.772ms; value: next_token_ids=tensor([4377], device='cuda:0') mtp accept=1 prop=4377 top1=4377 accp=0.614 next=pair draft=3745 prop=3745 pred gate=device Token # 176: 116.526ms; value: next_token_ids=tensor([1169], device='cuda:0') mtp accept=0 prop=3745 top1=1169 accp=0.037 next=draft=320 prop=320 olap pair=111.3ms serial=197.1ms gain=85.9ms ratio=0.44 s0=4.1ms s1=193.0ms wait=0.1/48.3ms pred gate=device Token # 177: 116.283ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.997 next=draft=7882 prop=7882 olap pair=110.8ms serial=197.1ms gain=86.3ms ratio=0.44 s0=3.8ms s1=193.3ms wait=0.1/48.5ms pred gate=device Token # 178: 3.718ms; value: next_token_ids=tensor([7882], device='cuda:0') mtp accept=1 prop=7882 top1=7882 accp=1.000 next=pair draft=428 prop=428 pred gate=device Token # 179: 116.799ms; value: next_token_ids=tensor([428], device='cuda:0') mtp accept=1 prop=428 top1=428 accp=0.999 next=draft=31826 prop=31826 olap pair=111.6ms serial=198.3ms gain=86.7ms ratio=0.44 s0=4.3ms s1=194.0ms wait=0.1/47.8ms pred gate=device Token # 180: 3.726ms; value: next_token_ids=tensor([31826], device='cuda:0') mtp accept=1 prop=31826 top1=31826 accp=0.991 next=pair draft=3968 prop=3968 pred gate=device Token # 181: 116.652ms; value: next_token_ids=tensor([3968], device='cuda:0') mtp accept=1 prop=3968 top1=3968 accp=0.999 next=draft=18452 prop=18452 olap pair=111.4ms serial=198.1ms gain=86.8ms ratio=0.44 s0=3.9ms s1=194.2ms wait=0.1/48.4ms pred gate=device Token # 182: 3.731ms; value: next_token_ids=tensor([18452], device='cuda:0') mtp accept=1 prop=18452 top1=18452 accp=1.000 next=pair draft=1316 prop=1316 pred gate=device Token # 183: 116.081ms; value: 
next_token_ids=tensor([1316], device='cuda:0') mtp accept=1 prop=1316 top1=1316 accp=1.000 next=draft=3613 prop=3613 olap pair=110.8ms serial=197.1ms gain=86.3ms ratio=0.44 s0=3.8ms s1=193.3ms wait=0.1/48.5ms pred gate=device Token # 184: 3.794ms; value: next_token_ids=tensor([3613], device='cuda:0') mtp accept=1 prop=3613 top1=3613 accp=1.000 next=pair draft=970 prop=970 pred gate=device Token # 185: 115.900ms; value: next_token_ids=tensor([970], device='cuda:0') mtp accept=1 prop=970 top1=970 accp=1.000 next=draft=867 prop=867 olap pair=110.7ms serial=196.9ms gain=86.2ms ratio=0.44 s0=4.0ms s1=192.9ms wait=0.1/48.2ms pred gate=device Token # 186: 3.778ms; value: next_token_ids=tensor([867], device='cuda:0') mtp accept=1 prop=867 top1=867 accp=1.000 next=pair draft=2055 prop=2055 pred gate=device Token # 187: 116.008ms; value: next_token_ids=tensor([2055], device='cuda:0') mtp accept=1 prop=2055 top1=2055 accp=1.000 next=draft=76148 prop=76148 olap pair=110.7ms serial=196.1ms gain=85.4ms ratio=0.44 s0=6.9ms s1=189.2ms wait=0.2/44.6ms pred gate=device Token # 188: 3.705ms; value: next_token_ids=tensor([76148], device='cuda:0') mtp accept=1 prop=76148 top1=76148 accp=1.000 next=pair draft=430 prop=430 pred gate=device Token # 189: 117.027ms; value: next_token_ids=tensor([430], device='cuda:0') mtp accept=1 prop=430 top1=430 accp=0.636 next=draft=1263 prop=1263 olap pair=111.8ms serial=196.0ms gain=84.2ms ratio=0.43 s0=6.0ms s1=190.0ms wait=0.2/45.7ms pred gate=device Token # 190: 3.716ms; value: next_token_ids=tensor([3279], device='cuda:0') mtp accept=0 prop=1263 top1=26965 accp=0.421 next=pair draft=18937 prop=18937 pred gate=device Token # 191: 116.350ms; value: next_token_ids=tensor([38879], device='cuda:0') mtp accept=0 prop=18937 top1=38879 accp=0.243 next=draft=303 prop=303 olap pair=111.0ms serial=196.1ms gain=85.0ms ratio=0.43 s0=5.1ms s1=190.9ms wait=0.1/46.9ms pred gate=device Token # 192: 117.302ms; value: next_token_ids=tensor([1263], device='cuda:0') 
mtp accept=0 prop=303 top1=1263 accp=0.222 next=draft=1395 prop=1395 olap pair=112.0ms serial=198.6ms gain=86.6ms ratio=0.44 s0=6.5ms s1=192.2ms wait=0.2/45.3ms pred gate=device Token # 193: 116.239ms; value: next_token_ids=tensor([1395], device='cuda:0') mtp accept=1 prop=1395 top1=1395 accp=0.926 next=draft=49590 prop=55767 olap pair=110.9ms serial=197.4ms gain=86.5ms ratio=0.44 s0=3.8ms s1=193.7ms wait=0.1/48.5ms pred gate=device Token # 194: 3.780ms; value: next_token_ids=tensor([49590], device='cuda:0') mtp accept=0 prop=55767 top1=49590 accp=0.716 next=pair draft=3745 prop=3745 pred gate=device Token # 195: 116.915ms; value: next_token_ids=tensor([7156], device='cuda:0') mtp accept=0 prop=3745 top1=7156 accp=0.002 next=draft=17675 prop=17675 olap pair=111.6ms serial=198.7ms gain=87.0ms ratio=0.44 s0=3.8ms s1=194.9ms wait=0.1/48.7ms pred gate=device Token # 196: 116.990ms; value: next_token_ids=tensor([52120], device='cuda:0') mtp accept=0 prop=17675 top1=52120 accp=0.002 next=draft=8563 prop=8563 olap pair=111.7ms serial=199.0ms gain=87.3ms ratio=0.44 s0=3.7ms s1=195.3ms wait=0.1/48.6ms pred gate=device Token # 197: 116.542ms; value: next_token_ids=tensor([8563], device='cuda:0') mtp accept=1 prop=8563 top1=8563 accp=1.000 next=draft=4498 prop=4498 olap pair=111.1ms serial=196.4ms gain=85.3ms ratio=0.43 s0=4.9ms s1=191.4ms wait=0.1/47.1ms pred gate=device Token # 198: 3.755ms; value: next_token_ids=tensor([4498], device='cuda:0') mtp accept=1 prop=4498 top1=4498 accp=1.000 next=pair draft=8720 prop=8720 pred gate=device Token # 199: 117.294ms; value: next_token_ids=tensor([8720], device='cuda:0') mtp accept=1 prop=8720 top1=8720 accp=1.000 next=draft=4706 prop=4706 olap pair=111.9ms serial=198.7ms gain=86.8ms ratio=0.44 s0=4.8ms s1=193.9ms wait=0.1/47.2ms pred gate=device Token # 200: 3.747ms; value: next_token_ids=tensor([4706], device='cuda:0') mtp accept=1 prop=4706 top1=4706 accp=1.000 next=pair draft=430 prop=430 pred gate=device Token # 201: 116.719ms; 
value: next_token_ids=tensor([430], device='cuda:0') mtp accept=1 prop=430 top1=430 accp=1.000 next=draft=1959 prop=1959 olap pair=111.4ms serial=196.8ms gain=85.4ms ratio=0.43 s0=4.5ms s1=192.4ms wait=0.1/47.8ms pred gate=device Token # 202: 3.821ms; value: next_token_ids=tensor([3279], device='cuda:0') mtp accept=0 prop=1959 top1=3279 accp=0.491 next=pair draft=18937 prop=18937 pred gate=device Token # 203: 116.415ms; value: next_token_ids=tensor([844], device='cuda:0') mtp accept=0 prop=18937 top1=844 accp=0.001 next=draft=24153 prop=24153 olap pair=111.1ms serial=196.9ms gain=85.8ms ratio=0.44 s0=4.7ms s1=192.2ms wait=0.1/47.5ms pred gate=device Token # 204: 116.984ms; value: next_token_ids=tensor([24153], device='cuda:0') mtp accept=1 prop=24153 top1=24153 accp=1.000 next=draft=9308 prop=1263 olap pair=111.6ms serial=196.6ms gain=85.0ms ratio=0.43 s0=6.6ms s1=190.0ms wait=0.2/44.6ms pred gate=device Token # 205: 3.731ms; value: next_token_ids=tensor([9308], device='cuda:0') mtp accept=0 prop=1263 top1=9308 accp=0.560 next=pair draft=1395 prop=1395 pred gate=device Token # 206: 117.133ms; value: next_token_ids=tensor([23220], device='cuda:0') mtp accept=0 prop=1395 top1=1395 accp=0.702 next=draft=6268 prop=6268 olap pair=111.9ms serial=197.6ms gain=85.7ms ratio=0.43 s0=5.0ms s1=192.6ms wait=0.1/47.2ms pred gate=device Token # 207: 117.796ms; value: next_token_ids=tensor([978], device='cuda:0') mtp accept=0 prop=6268 top1=978 accp=0.002 next=draft=2843 prop=2843 olap pair=111.7ms serial=197.5ms gain=85.8ms ratio=0.43 s0=6.9ms s1=190.5ms wait=0.2/44.7ms pred gate=device Token # 208: 118.186ms; value: next_token_ids=tensor([2843], device='cuda:0') mtp accept=1 prop=2843 top1=2843 accp=1.000 next=draft=6788 prop=3745 olap pair=111.8ms serial=197.7ms gain=85.9ms ratio=0.43 s0=8.4ms s1=189.3ms wait=0.2/42.9ms pred gate=device Token # 209: 4.378ms; value: next_token_ids=tensor([6788], device='cuda:0') mtp accept=0 prop=3745 top1=6788 accp=0.571 next=pair draft=478 
prop=478 pred gate=device Token # 210: 117.200ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.999 next=draft=2823 prop=2823 olap pair=111.1ms serial=196.7ms gain=85.7ms ratio=0.44 s0=7.0ms s1=189.7ms wait=0.2/44.5ms pred gate=device Token # 211: 4.563ms; value: next_token_ids=tensor([2823], device='cuda:0') mtp accept=1 prop=2823 top1=2823 accp=0.999 next=pair draft=18847 prop=18847 pred gate=device Token # 212: 116.503ms; value: next_token_ids=tensor([18847], device='cuda:0') mtp accept=1 prop=18847 top1=18847 accp=0.999 next=draft=3202 prop=3202 olap pair=111.0ms serial=197.4ms gain=86.4ms ratio=0.44 s0=4.2ms s1=193.2ms wait=0.1/48.2ms pred gate=device Token # 213: 3.711ms; value: next_token_ids=tensor([3343], device='cuda:0') mtp accept=0 prop=3202 top1=3343 accp=0.499 next=pair draft=2554 prop=2554 pred gate=device Token # 214: 116.124ms; value: next_token_ids=tensor([589], device='cuda:0') mtp accept=0 prop=2554 top1=589 accp=0.043 next=draft=84602 prop=84602 olap pair=110.7ms serial=196.9ms gain=86.2ms ratio=0.44 s0=3.8ms s1=193.1ms wait=0.1/48.5ms pred gate=device Token # 215: 117.288ms; value: next_token_ids=tensor([5433], device='cuda:0') mtp accept=0 prop=84602 top1=5433 accp=0.082 next=draft=15685 prop=15685 olap pair=111.9ms serial=198.3ms gain=86.4ms ratio=0.44 s0=4.0ms s1=194.3ms wait=0.1/48.2ms pred gate=device Token # 216: 116.617ms; value: next_token_ids=tensor([15685], device='cuda:0') mtp accept=1 prop=15685 top1=15685 accp=1.000 next=draft=303 prop=303 olap pair=111.2ms serial=198.0ms gain=86.8ms ratio=0.44 s0=3.8ms s1=194.2ms wait=0.1/48.5ms pred gate=device Token # 217: 3.716ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=2916 prop=2916 pred gate=device Token # 218: 116.312ms; value: next_token_ids=tensor([5802], device='cuda:0') mtp accept=0 prop=2916 top1=5802 accp=0.290 next=draft=8283 prop=8283 olap pair=111.1ms serial=197.0ms 
gain=86.0ms ratio=0.44 s0=5.7ms s1=191.3ms wait=0.1/45.9ms pred gate=device Token # 219: 116.513ms; value: next_token_ids=tensor([8283], device='cuda:0') mtp accept=1 prop=8283 top1=8283 accp=0.891 next=draft=410 prop=410 olap pair=111.2ms serial=197.9ms gain=86.7ms ratio=0.44 s0=3.8ms s1=194.1ms wait=0.1/48.5ms pred gate=device Token # 220: 3.729ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=1.000 next=pair draft=10602 prop=10602 pred gate=device Token # 221: 116.720ms; value: next_token_ids=tensor([48442], device='cuda:0') mtp accept=0 prop=10602 top1=48442 accp=0.043 next=draft=410 prop=410 olap pair=111.5ms serial=198.0ms gain=86.5ms ratio=0.44 s0=4.8ms s1=193.2ms wait=0.1/47.2ms pred gate=device Token # 222: 116.776ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=0.994 next=draft=35991 prop=35991 olap pair=111.5ms serial=197.2ms gain=85.7ms ratio=0.43 s0=6.8ms s1=190.4ms wait=0.2/44.8ms pred gate=device Token # 223: 3.720ms; value: next_token_ids=tensor([10861], device='cuda:0') mtp accept=0 prop=35991 top1=10861 accp=0.089 next=pair draft=10909 prop=10909 pred gate=device Token # 224: 116.863ms; value: next_token_ids=tensor([10909], device='cuda:0') mtp accept=1 prop=10909 top1=10909 accp=0.999 next=draft=42562 prop=42562 olap pair=111.5ms serial=197.0ms gain=85.6ms ratio=0.43 s0=7.1ms s1=190.0ms wait=0.2/44.6ms pred gate=device Token # 225: 3.772ms; value: next_token_ids=tensor([42562], device='cuda:0') mtp accept=1 prop=42562 top1=42562 accp=0.953 next=pair draft=76767 prop=76767 pred gate=device Token # 226: 115.914ms; value: next_token_ids=tensor([3800], device='cuda:0') mtp accept=0 prop=76767 top1=3800 accp=0.092 next=draft=320 prop=320 olap pair=110.7ms serial=196.8ms gain=86.1ms ratio=0.44 s0=4.1ms s1=192.7ms wait=0.1/47.9ms pred gate=device Token # 227: 117.002ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.999 
next=draft=53091 prop=53091 olap pair=111.6ms serial=198.1ms gain=86.5ms ratio=0.44 s0=4.2ms s1=193.9ms wait=0.1/47.7ms pred gate=device Token # 228: 3.719ms; value: next_token_ids=tensor([111671], device='cuda:0') mtp accept=0 prop=53091 top1=13511 accp=0.057 next=pair draft=90974 prop=90974 pred gate=device Token # 229: 119.416ms; value: next_token_ids=tensor([90974], device='cuda:0') mtp accept=1 prop=90974 top1=90974 accp=0.701 next=draft=1121 prop=1121 olap pair=113.3ms serial=200.7ms gain=87.4ms ratio=0.44 s0=5.6ms s1=195.1ms wait=0.1/45.9ms pred gate=device Token # 230: 4.758ms; value: next_token_ids=tensor([856], device='cuda:0') mtp accept=0 prop=1121 top1=856 accp=0.424 next=pair draft=16 prop=16 pred gate=device Token # 231: 117.020ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=14882 prop=14882 olap pair=111.6ms serial=197.6ms gain=86.0ms ratio=0.44 s0=7.8ms s1=189.8ms wait=0.2/43.5ms pred gate=device Token # 232: 3.699ms; value: next_token_ids=tensor([14882], device='cuda:0') mtp accept=1 prop=14882 top1=14882 accp=1.000 next=pair draft=28828 prop=28828 pred gate=device Token # 233: 116.260ms; value: next_token_ids=tensor([28828], device='cuda:0') mtp accept=1 prop=28828 top1=28828 accp=1.000 next=draft=2283 prop=2283 olap pair=110.8ms serial=196.6ms gain=85.8ms ratio=0.44 s0=6.1ms s1=190.6ms wait=0.2/45.6ms pred gate=device Token # 234: 3.754ms; value: next_token_ids=tensor([2283], device='cuda:0') mtp accept=1 prop=2283 top1=2283 accp=1.000 next=pair draft=410 prop=410 pred gate=device Token # 235: 115.881ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=0.995 next=draft=17136 prop=17136 olap pair=110.6ms serial=196.5ms gain=85.9ms ratio=0.44 s0=5.2ms s1=191.3ms wait=0.1/46.9ms pred gate=device Token # 236: 3.760ms; value: next_token_ids=tensor([17136], device='cuda:0') mtp accept=1 prop=17136 top1=17136 accp=1.000 next=pair draft=14811 prop=14811 
pred gate=device Token # 237: 116.102ms; value: next_token_ids=tensor([14811], device='cuda:0') mtp accept=1 prop=14811 top1=14811 accp=1.000 next=draft=223 prop=223 olap pair=110.8ms serial=196.5ms gain=85.7ms ratio=0.44 s0=5.6ms s1=190.9ms wait=0.2/46.2ms pred gate=device Token # 238: 3.810ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=18 prop=18 pred gate=device Token # 239: 117.053ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=16 prop=16 olap pair=111.8ms serial=198.2ms gain=86.4ms ratio=0.44 s0=4.1ms s1=194.1ms wait=0.1/48.0ms pred gate=device Token # 240: 3.781ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 241: 116.644ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=11274 prop=11274 olap pair=111.0ms serial=195.8ms gain=84.8ms ratio=0.43 s0=5.2ms s1=190.6ms wait=0.2/47.0ms pred gate=device Token # 242: 3.775ms; value: next_token_ids=tensor([11274], device='cuda:0') mtp accept=1 prop=11274 top1=11274 accp=0.982 next=pair draft=410 prop=410 pred gate=device Token # 243: 117.614ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=0.867 next=draft=8835 prop=8835 olap pair=111.6ms serial=197.1ms gain=85.6ms ratio=0.43 s0=8.0ms s1=189.1ms wait=0.2/43.3ms pred gate=device Token # 244: 4.655ms; value: next_token_ids=tensor([8835], device='cuda:0') mtp accept=1 prop=8835 top1=8835 accp=0.999 next=pair draft=4894 prop=4894 pred gate=device Token # 245: 117.844ms; value: next_token_ids=tensor([4894], device='cuda:0') mtp accept=1 prop=4894 top1=4894 accp=1.000 next=draft=223 prop=223 olap pair=111.5ms serial=197.1ms gain=85.6ms ratio=0.43 s0=4.8ms s1=192.3ms wait=0.1/47.5ms pred gate=device Token # 246: 4.664ms; value: 
next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=2738 prop=2738 pred gate=device Token # 247: 116.420ms; value: next_token_ids=tensor([2738], device='cuda:0') mtp accept=1 prop=2738 top1=2738 accp=1.000 next=draft=7016 prop=7016 olap pair=111.0ms serial=196.2ms gain=85.2ms ratio=0.43 s0=8.6ms s1=187.6ms wait=0.2/42.7ms pred gate=device Token # 248: 3.803ms; value: next_token_ids=tensor([7016], device='cuda:0') mtp accept=1 prop=7016 top1=7016 accp=1.000 next=pair draft=11274 prop=11274 pred gate=device Token # 249: 116.975ms; value: next_token_ids=tensor([11274], device='cuda:0') mtp accept=1 prop=11274 top1=11274 accp=1.000 next=draft=2554 prop=2554 olap pair=111.7ms serial=197.2ms gain=85.5ms ratio=0.43 s0=6.9ms s1=190.3ms wait=0.2/44.7ms pred gate=device Token # 250: 3.738ms; value: next_token_ids=tensor([2554], device='cuda:0') mtp accept=1 prop=2554 top1=2554 accp=0.978 next=pair draft=2920 prop=2920 pred gate=device Token # 251: 116.511ms; value: next_token_ids=tensor([2920], device='cuda:0') mtp accept=1 prop=2920 top1=2920 accp=0.991 next=draft=69674 prop=69674 olap pair=111.2ms serial=197.1ms gain=86.0ms ratio=0.44 s0=4.2ms s1=193.0ms wait=0.1/47.9ms pred gate=device Token # 252: 3.851ms; value: next_token_ids=tensor([69674], device='cuda:0') mtp accept=1 prop=69674 top1=69674 accp=0.999 next=pair draft=32627 prop=32627 pred gate=device Token # 253: 116.053ms; value: next_token_ids=tensor([1532], device='cuda:0') mtp accept=0 prop=32627 top1=1532 accp=0.149 next=draft=1295 prop=1295 olap pair=110.8ms serial=196.0ms gain=85.3ms ratio=0.43 s0=6.1ms s1=190.0ms wait=0.2/45.6ms pred gate=device Token # 254: 116.135ms; value: next_token_ids=tensor([1295], device='cuda:0') mtp accept=1 prop=1295 top1=1295 accp=0.996 next=draft=18901 prop=18901 olap pair=110.6ms serial=196.1ms gain=85.4ms ratio=0.44 s0=7.3ms s1=188.7ms wait=0.2/44.2ms pred gate=device Token # 255: 3.771ms; value: next_token_ids=tensor([18901], 
device='cuda:0') mtp accept=1 prop=18901 top1=18901 accp=1.000 next=pair draft=478 prop=478 pred gate=device Token # 256: 117.068ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.996 next=draft=6451 prop=69440 olap pair=111.1ms serial=196.5ms gain=85.4ms ratio=0.43 s0=7.7ms s1=188.8ms wait=0.2/43.7ms pred gate=device Token # 257: 4.618ms; value: next_token_ids=tensor([6451], device='cuda:0') mtp accept=0 prop=69440 top1=6451 accp=0.328 next=pair draft=642 prop=642 pred gate=device Token # 258: 116.400ms; value: next_token_ids=tensor([1255], device='cuda:0') mtp accept=0 prop=642 top1=1255 accp=0.005 next=draft=112016 prop=112016 olap pair=111.0ms serial=197.0ms gain=86.0ms ratio=0.44 s0=5.9ms s1=191.1ms wait=0.2/46.0ms pred gate=device Token # 259: 117.595ms; value: next_token_ids=tensor([112016], device='cuda:0') mtp accept=1 prop=112016 top1=112016 accp=0.995 next=draft=24268 prop=24268 olap pair=112.2ms serial=199.2ms gain=87.0ms ratio=0.44 s0=4.0ms s1=195.2ms wait=0.1/48.0ms pred gate=device Token # 260: 3.725ms; value: next_token_ids=tensor([24268], device='cuda:0') mtp accept=1 prop=24268 top1=24268 accp=1.000 next=pair draft=3257 prop=3257 pred gate=device Token # 261: 116.637ms; value: next_token_ids=tensor([3257], device='cuda:0') mtp accept=1 prop=3257 top1=3257 accp=0.956 next=draft=3486 prop=3486 olap pair=111.3ms serial=197.1ms gain=85.8ms ratio=0.44 s0=8.4ms s1=188.7ms wait=0.2/42.8ms pred gate=device Token # 262: 3.757ms; value: next_token_ids=tensor([3486], device='cuda:0') mtp accept=1 prop=3486 top1=3486 accp=0.998 next=pair draft=1078 prop=1078 pred gate=device Token # 263: 116.641ms; value: next_token_ids=tensor([1078], device='cuda:0') mtp accept=1 prop=1078 top1=1078 accp=0.993 next=draft=22067 prop=22067 olap pair=111.3ms serial=196.3ms gain=85.0ms ratio=0.43 s0=4.7ms s1=191.6ms wait=0.1/47.5ms pred gate=device Token # 264: 3.779ms; value: next_token_ids=tensor([1959], device='cuda:0') mtp accept=0 
prop=22067 top1=1959 accp=0.119 next=pair draft=3343 prop=3343 pred gate=device Token # 265: 116.027ms; value: next_token_ids=tensor([11416], device='cuda:0') mtp accept=0 prop=3343 top1=11416 accp=0.003 next=draft=86964 prop=86964 olap pair=110.8ms serial=197.1ms gain=86.3ms ratio=0.44 s0=3.9ms s1=193.2ms wait=0.1/48.5ms pred gate=device Token # 266: 117.760ms; value: next_token_ids=tensor([12210], device='cuda:0') mtp accept=0 prop=86964 top1=12210 accp=0.701 next=draft=303 prop=303 olap pair=111.6ms serial=198.3ms gain=86.7ms ratio=0.44 s0=4.7ms s1=193.6ms wait=0.1/47.3ms pred gate=device Token # 267: 116.948ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=1700 prop=1700 olap pair=111.2ms serial=196.9ms gain=85.7ms ratio=0.44 s0=7.1ms s1=189.8ms wait=0.2/44.5ms pred gate=device Token # 268: 3.752ms; value: next_token_ids=tensor([1700], device='cuda:0') mtp accept=1 prop=1700 top1=1700 accp=0.996 next=pair draft=19897 prop=7606 pred gate=device Token # 269: 116.398ms; value: next_token_ids=tensor([7790], device='cuda:0') mtp accept=0 prop=7606 top1=7790 accp=0.057 next=draft=23220 prop=23220 olap pair=111.1ms serial=197.2ms gain=86.1ms ratio=0.44 s0=4.1ms s1=193.1ms wait=0.1/48.3ms pred gate=device Token # 270: 117.475ms; value: next_token_ids=tensor([450], device='cuda:0') mtp accept=0 prop=23220 top1=450 accp=0.110 next=draft=7606 prop=7606 olap pair=111.3ms serial=197.6ms gain=86.3ms ratio=0.44 s0=6.3ms s1=191.3ms wait=0.2/45.3ms pred gate=device Token # 271: 117.754ms; value: next_token_ids=tensor([7606], device='cuda:0') mtp accept=1 prop=7606 top1=7606 accp=1.000 next=draft=946 prop=946 olap pair=111.4ms serial=197.5ms gain=86.1ms ratio=0.44 s0=5.7ms s1=191.7ms wait=0.2/46.0ms pred gate=device Token # 272: 4.669ms; value: next_token_ids=tensor([946], device='cuda:0') mtp accept=1 prop=946 top1=946 accp=0.982 next=pair draft=320 prop=320 pred gate=device Token # 273: 116.713ms; value: 
next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=2099 prop=2099 olap pair=111.3ms serial=196.5ms gain=85.1ms ratio=0.43 s0=7.7ms s1=188.7ms wait=0.2/43.8ms pred gate=device Token # 274: 3.715ms; value: next_token_ids=tensor([5802], device='cuda:0') mtp accept=0 prop=2099 top1=5802 accp=0.003 next=pair draft=48 prop=48 pred gate=device Token # 275: 116.584ms; value: next_token_ids=tensor([89267], device='cuda:0') mtp accept=0 prop=48 top1=89267 accp=0.001 next=draft=3523 prop=3523 olap pair=111.2ms serial=197.2ms gain=86.0ms ratio=0.44 s0=4.2ms s1=193.1ms wait=0.1/48.0ms pred gate=device Token # 276: 116.700ms; value: next_token_ids=tensor([14169], device='cuda:0') mtp accept=0 prop=3523 top1=115564 accp=0.002 next=draft=410 prop=410 olap pair=111.4ms serial=197.5ms gain=86.1ms ratio=0.44 s0=5.4ms s1=192.1ms wait=0.1/46.6ms pred gate=device Token # 277: 116.738ms; value: next_token_ids=tensor([109936], device='cuda:0') mtp accept=0 prop=410 top1=109936 accp=0.000 next=draft=728 prop=728 olap pair=111.4ms serial=197.0ms gain=85.6ms ratio=0.43 s0=4.2ms s1=192.7ms wait=0.1/47.8ms pred gate=device Token # 278: 117.016ms; value: next_token_ids=tensor([9308], device='cuda:0') mtp accept=0 prop=728 top1=9308 accp=0.000 next=draft=1395 prop=1395 olap pair=111.7ms serial=196.8ms gain=85.1ms ratio=0.43 s0=4.3ms s1=192.5ms wait=0.1/48.1ms pred gate=device Token # 279: 117.439ms; value: next_token_ids=tensor([1395], device='cuda:0') mtp accept=1 prop=1395 top1=1395 accp=1.000 next=draft=29790 prop=29790 olap pair=112.0ms serial=198.2ms gain=86.2ms ratio=0.44 s0=7.2ms s1=191.0ms wait=0.2/44.4ms pred gate=device Token # 280: 3.809ms; value: next_token_ids=tensor([29790], device='cuda:0') mtp accept=1 prop=29790 top1=29790 accp=1.000 next=pair draft=3745 prop=3745 pred gate=device Token # 281: 116.992ms; value: next_token_ids=tensor([1169], device='cuda:0') mtp accept=0 prop=3745 top1=1169 accp=0.190 next=draft=320 prop=320 olap 
pair=111.6ms serial=197.6ms gain=86.0ms ratio=0.44 s0=4.1ms s1=193.5ms wait=0.1/48.3ms pred gate=device Token # 282: 116.534ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=0 prop=320 top1=478 accp=0.523 next=draft=17534 prop=17534 olap pair=111.3ms serial=197.8ms gain=86.5ms ratio=0.44 s0=4.0ms s1=193.8ms wait=0.1/48.1ms pred gate=device Token # 283: 115.820ms; value: next_token_ids=tensor([14475], device='cuda:0') mtp accept=0 prop=17534 top1=14475 accp=0.004 next=draft=38 prop=38 olap pair=110.5ms serial=195.0ms gain=84.5ms ratio=0.43 s0=4.2ms s1=190.8ms wait=0.1/48.1ms pred gate=device Token # 284: 117.479ms; value: next_token_ids=tensor([38], device='cuda:0') mtp accept=1 prop=38 top1=38 accp=1.000 next=draft=35991 prop=35991 olap pair=112.1ms serial=197.0ms gain=84.8ms ratio=0.43 s0=6.7ms s1=190.2ms wait=0.2/44.8ms pred gate=device Token # 285: 3.756ms; value: next_token_ids=tensor([35991], device='cuda:0') mtp accept=1 prop=35991 top1=35991 accp=0.677 next=pair draft=301 prop=301 pred gate=device Token # 286: 116.308ms; value: next_token_ids=tensor([72417], device='cuda:0') mtp accept=0 prop=301 top1=72417 accp=0.330 next=draft=3343 prop=3343 olap pair=111.1ms serial=197.6ms gain=86.4ms ratio=0.44 s0=4.0ms s1=193.6ms wait=0.1/48.3ms pred gate=device Token # 287: 116.517ms; value: next_token_ids=tensor([3343], device='cuda:0') mtp accept=1 prop=3343 top1=3343 accp=0.784 next=draft=10280 prop=39802 olap pair=111.1ms serial=197.0ms gain=85.8ms ratio=0.44 s0=4.6ms s1=192.4ms wait=0.1/47.5ms pred gate=device Token # 288: 3.787ms; value: next_token_ids=tensor([39802], device='cuda:0') mtp accept=1 prop=39802 top1=39802 accp=0.344 next=pair draft=303 prop=303 pred gate=device Token # 289: 115.279ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=2386 prop=2386 olap pair=110.1ms serial=195.4ms gain=85.3ms ratio=0.44 s0=5.8ms s1=189.6ms wait=0.2/46.1ms pred gate=device Token # 290: 3.697ms; 
value: next_token_ids=tensor([2386], device='cuda:0') mtp accept=1 prop=2386 top1=589 accp=0.766 next=pair draft=1555 prop=1555 pred gate=device Token # 291: 117.616ms; value: next_token_ids=tensor([17165], device='cuda:0') mtp accept=0 prop=1555 top1=17165 accp=0.352 next=draft=63492 prop=63492 olap pair=111.6ms serial=198.1ms gain=86.4ms ratio=0.44 s0=6.0ms s1=192.0ms wait=0.2/45.8ms pred gate=device Token # 292: 116.405ms; value: next_token_ids=tensor([6977], device='cuda:0') mtp accept=0 prop=63492 top1=6977 accp=0.013 next=draft=6656 prop=6656 olap pair=110.9ms serial=196.9ms gain=86.0ms ratio=0.44 s0=5.7ms s1=191.2ms wait=0.2/46.2ms pred gate=device Token # 293: 117.090ms; value: next_token_ids=tensor([6656], device='cuda:0') mtp accept=1 prop=6656 top1=6656 accp=0.798 next=draft=3461 prop=26182 olap pair=111.7ms serial=198.1ms gain=86.4ms ratio=0.44 s0=3.9ms s1=194.2ms wait=0.1/48.4ms pred gate=device Token # 294: 3.722ms; value: next_token_ids=tensor([2684], device='cuda:0') mtp accept=0 prop=26182 top1=3461 accp=0.612 next=pair draft=751 prop=751 pred gate=device Token # 295: 116.625ms; value: next_token_ids=tensor([31304], device='cuda:0') mtp accept=0 prop=751 top1=12072 accp=0.084 next=draft=303 prop=303 olap pair=111.3ms serial=197.7ms gain=86.4ms ratio=0.44 s0=3.9ms s1=193.8ms wait=0.1/48.5ms pred gate=device Token # 296: 117.107ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.830 next=draft=3515 prop=3515 olap pair=111.0ms serial=196.3ms gain=85.3ms ratio=0.43 s0=8.0ms s1=188.3ms wait=0.2/43.3ms pred gate=device Token # 297: 4.571ms; value: next_token_ids=tensor([90738], device='cuda:0') mtp accept=0 prop=3515 top1=90738 accp=0.077 next=pair draft=548 prop=548 pred gate=device Token # 298: 116.445ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=1 prop=548 top1=548 accp=0.991 next=draft=24268 prop=24268 olap pair=110.9ms serial=196.4ms gain=85.4ms ratio=0.44 s0=7.3ms s1=189.0ms 
wait=0.2/44.2ms pred gate=device Token # 299: 3.751ms; value: next_token_ids=tensor([41354], device='cuda:0') mtp accept=0 prop=24268 top1=41354 accp=0.003 next=pair draft=17349 prop=17349 pred gate=device Token # 300: 116.230ms; value: next_token_ids=tensor([17349], device='cuda:0') mtp accept=1 prop=17349 top1=17349 accp=0.962 next=draft=12411 prop=6668 olap pair=110.9ms serial=196.6ms gain=85.7ms ratio=0.44 s0=6.4ms s1=190.2ms wait=0.2/45.0ms pred gate=device Token # 301: 3.783ms; value: next_token_ids=tensor([6668], device='cuda:0') mtp accept=1 prop=6668 top1=6668 accp=0.289 next=pair draft=1710 prop=1710 pred gate=device Token # 302: 115.960ms; value: next_token_ids=tensor([1710], device='cuda:0') mtp accept=1 prop=1710 top1=1710 accp=1.000 next=draft=320 prop=320 olap pair=110.7ms serial=196.0ms gain=85.3ms ratio=0.44 s0=4.2ms s1=191.8ms wait=0.1/47.7ms pred gate=device Token # 303: 3.761ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.909 next=pair draft=556 prop=556 pred gate=device Token # 304: 117.287ms; value: next_token_ids=tensor([5802], device='cuda:0') mtp accept=0 prop=556 top1=5802 accp=0.007 next=draft=4958 prop=4958 olap pair=111.3ms serial=197.1ms gain=85.7ms ratio=0.44 s0=7.0ms s1=190.1ms wait=0.2/44.4ms pred gate=device Token # 305: 116.652ms; value: next_token_ids=tensor([70359], device='cuda:0') mtp accept=0 prop=4958 top1=70359 accp=0.003 next=draft=548 prop=548 olap pair=111.1ms serial=196.9ms gain=85.8ms ratio=0.44 s0=7.5ms s1=189.4ms wait=0.2/43.9ms pred gate=device Token # 306: 116.550ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=0 prop=548 top1=4339 accp=0.346 next=draft=410 prop=410 olap pair=111.2ms serial=197.8ms gain=86.6ms ratio=0.44 s0=4.1ms s1=193.6ms wait=0.1/48.2ms pred gate=device Token # 307: 117.392ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=0.824 next=draft=23590 prop=23590 olap pair=111.2ms serial=197.4ms 
gain=86.2ms ratio=0.44 s0=5.1ms s1=192.3ms wait=0.1/47.1ms pred gate=device Token # 308: 4.719ms; value: next_token_ids=tensor([23590], device='cuda:0') mtp accept=1 prop=23590 top1=23590 accp=1.000 next=pair draft=4339 prop=4339 pred gate=device Token # 309: 116.763ms; value: next_token_ids=tensor([8266], device='cuda:0') mtp accept=0 prop=4339 top1=7157 accp=0.026 next=draft=13669 prop=50656 olap pair=111.3ms serial=196.5ms gain=85.2ms ratio=0.43 s0=8.8ms s1=187.7ms wait=0.2/42.5ms pred gate=device Token # 310: 116.238ms; value: next_token_ids=tensor([97464], device='cuda:0') mtp accept=0 prop=50656 top1=97464 accp=0.001 next=draft=69674 prop=13669 olap pair=111.0ms serial=197.4ms gain=86.4ms ratio=0.44 s0=3.9ms s1=193.5ms wait=0.1/48.3ms pred gate=device Token # 311: 116.043ms; value: next_token_ids=tensor([12659], device='cuda:0') mtp accept=0 prop=13669 top1=12659 accp=0.129 next=draft=18901 prop=18901 olap pair=110.7ms serial=195.5ms gain=84.8ms ratio=0.43 s0=4.2ms s1=191.3ms wait=0.1/48.0ms pred gate=device Token # 312: 117.784ms; value: next_token_ids=tensor([3486], device='cuda:0') mtp accept=0 prop=18901 top1=3486 accp=0.299 next=draft=8844 prop=8844 olap pair=111.5ms serial=197.0ms gain=85.6ms ratio=0.43 s0=5.8ms s1=191.2ms wait=0.1/45.9ms pred gate=device Token # 313: 117.705ms; value: next_token_ids=tensor([8844], device='cuda:0') mtp accept=1 prop=8844 top1=8844 accp=0.999 next=draft=478 prop=478 olap pair=111.5ms serial=197.0ms gain=85.5ms ratio=0.43 s0=8.7ms s1=188.3ms wait=0.2/42.5ms pred gate=device Token # 314: 3.780ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=pair draft=17534 prop=17534 pred gate=device Token # 315: 116.009ms; value: next_token_ids=tensor([11458], device='cuda:0') mtp accept=0 prop=17534 top1=17534 accp=0.505 next=draft=6710 prop=34071 olap pair=110.8ms serial=196.3ms gain=85.5ms ratio=0.44 s0=5.6ms s1=190.8ms wait=0.2/46.2ms pred gate=device Token # 316: 117.630ms; 
value: next_token_ids=tensor([1700], device='cuda:0') mtp accept=0 prop=34071 top1=445 accp=0.230 next=draft=7790 prop=7790 olap pair=111.6ms serial=195.0ms gain=83.4ms ratio=0.43 s0=8.6ms s1=186.5ms wait=0.2/42.8ms pred gate=device Token # 317: 116.412ms; value: next_token_ids=tensor([1913], device='cuda:0') mtp accept=0 prop=7790 top1=1913 accp=0.105 next=draft=30472 prop=30472 olap pair=111.0ms serial=196.7ms gain=85.7ms ratio=0.44 s0=3.9ms s1=192.8ms wait=0.1/48.5ms pred gate=device Token # 318: 115.705ms; value: next_token_ids=tensor([30472], device='cuda:0') mtp accept=1 prop=30472 top1=30472 accp=1.000 next=draft=3154 prop=3154 olap pair=110.3ms serial=196.2ms gain=85.9ms ratio=0.44 s0=4.2ms s1=191.9ms wait=0.1/47.7ms pred gate=device Token # 319: 3.822ms; value: next_token_ids=tensor([3154], device='cuda:0') mtp accept=1 prop=3154 top1=3154 accp=0.984 next=pair draft=3414 prop=3414 pred gate=device Token # 320: 116.041ms; value: next_token_ids=tensor([3414], device='cuda:0') mtp accept=1 prop=3414 top1=3414 accp=1.000 next=draft=89032 prop=89032 olap pair=110.8ms serial=196.3ms gain=85.6ms ratio=0.44 s0=4.5ms s1=191.8ms wait=0.1/47.7ms pred gate=device Token # 321: 4.342ms; value: next_token_ids=tensor([89032], device='cuda:0') mtp accept=1 prop=89032 top1=89032 accp=0.983 next=pair draft=6715 prop=6715 pred gate=device Token # 322: 116.164ms; value: next_token_ids=tensor([41996], device='cuda:0') mtp accept=0 prop=6715 top1=41996 accp=0.064 next=draft=2823 prop=2823 olap pair=110.0ms serial=193.9ms gain=83.9ms ratio=0.43 s0=8.9ms s1=185.1ms wait=0.2/42.4ms pred gate=device Token # 323: 117.032ms; value: next_token_ids=tensor([49590], device='cuda:0') mtp accept=0 prop=2823 top1=49590 accp=0.035 next=draft=24629 prop=24629 olap pair=110.7ms serial=196.2ms gain=85.5ms ratio=0.44 s0=6.4ms s1=189.8ms wait=0.2/45.3ms pred gate=device Token # 324: 115.993ms; value: next_token_ids=tensor([24629], device='cuda:0') mtp accept=1 prop=24629 top1=24629 accp=1.000 
next=draft=51957 prop=51957 olap pair=110.5ms serial=195.8ms gain=85.3ms ratio=0.44 s0=4.1ms s1=191.7ms wait=0.1/48.1ms pred gate=device Token # 325: 3.718ms; value: next_token_ids=tensor([55443], device='cuda:0') mtp accept=0 prop=51957 top1=55443 accp=0.096 next=pair draft=303 prop=303 pred gate=device Token # 326: 116.559ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.992 next=draft=36108 prop=36108 olap pair=111.3ms serial=197.6ms gain=86.3ms ratio=0.44 s0=5.9ms s1=191.7ms wait=0.2/45.9ms pred gate=device Token # 327: 3.685ms; value: next_token_ids=tensor([15495], device='cuda:0') mtp accept=0 prop=36108 top1=15495 accp=0.079 next=pair draft=76326 prop=76326 pred gate=device Token # 328: 116.390ms; value: next_token_ids=tensor([4068], device='cuda:0') mtp accept=0 prop=76326 top1=4068 accp=0.024 next=draft=74766 prop=74766 olap pair=111.1ms serial=197.6ms gain=86.4ms ratio=0.44 s0=3.9ms s1=193.6ms wait=0.1/48.3ms pred gate=device Token # 329: 117.325ms; value: next_token_ids=tensor([74766], device='cuda:0') mtp accept=1 prop=74766 top1=74766 accp=1.000 next=draft=24629 prop=24629 olap pair=111.2ms serial=197.4ms gain=86.3ms ratio=0.44 s0=5.1ms s1=192.3ms wait=0.1/46.7ms pred gate=device Token # 330: 4.737ms; value: next_token_ids=tensor([24629], device='cuda:0') mtp accept=1 prop=24629 top1=24629 accp=0.999 next=pair draft=7018 prop=7018 pred gate=device Token # 331: 116.152ms; value: next_token_ids=tensor([7018], device='cuda:0') mtp accept=1 prop=7018 top1=7018 accp=1.000 next=draft=320 prop=320 olap pair=110.6ms serial=196.2ms gain=85.5ms ratio=0.44 s0=4.8ms s1=191.4ms wait=0.1/47.4ms pred gate=device Token # 332: 3.687ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=1263 prop=6451 pred gate=device Token # 333: 116.638ms; value: next_token_ids=tensor([1263], device='cuda:0') mtp accept=0 prop=6451 top1=1263 accp=0.507 next=draft=9456 prop=9456 
olap pair=111.3ms serial=197.5ms gain=86.2ms ratio=0.44 s0=4.0ms s1=193.5ms wait=0.1/48.3ms pred gate=device Token # 334: 116.994ms; value: next_token_ids=tensor([1107], device='cuda:0') mtp accept=0 prop=9456 top1=1107 accp=0.076 next=draft=3745 prop=3745 olap pair=111.6ms serial=197.4ms gain=85.8ms ratio=0.43 s0=4.3ms s1=193.1ms wait=0.1/47.8ms pred gate=device Token # 335: 116.184ms; value: next_token_ids=tensor([3745], device='cuda:0') mtp accept=1 prop=3745 top1=3745 accp=0.995 next=draft=1395 prop=1395 olap pair=110.8ms serial=196.9ms gain=86.1ms ratio=0.44 s0=4.1ms s1=192.8ms wait=0.1/48.2ms pred gate=device Token # 336: 3.736ms; value: next_token_ids=tensor([9511], device='cuda:0') mtp accept=0 prop=1395 top1=9511 accp=0.009 next=pair draft=5776 prop=5776 pred gate=device Token # 337: 116.292ms; value: next_token_ids=tensor([4055], device='cuda:0') mtp accept=0 prop=5776 top1=4055 accp=0.267 next=draft=303 prop=303 olap pair=110.9ms serial=197.2ms gain=86.3ms ratio=0.44 s0=4.0ms s1=193.1ms wait=0.1/48.3ms pred gate=device Token # 338: 115.884ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=1700 prop=1700 olap pair=110.4ms serial=196.4ms gain=86.0ms ratio=0.44 s0=3.9ms s1=192.5ms wait=0.1/48.4ms pred gate=device Token # 339: 3.728ms; value: next_token_ids=tensor([1700], device='cuda:0') mtp accept=1 prop=1700 top1=1700 accp=1.000 next=pair draft=14041 prop=14041 pred gate=device Token # 340: 116.345ms; value: next_token_ids=tensor([14041], device='cuda:0') mtp accept=1 prop=14041 top1=14041 accp=0.833 next=draft=1395 prop=1395 olap pair=111.1ms serial=197.6ms gain=86.6ms ratio=0.44 s0=3.9ms s1=193.8ms wait=0.1/48.4ms pred gate=device Token # 341: 3.732ms; value: next_token_ids=tensor([1395], device='cuda:0') mtp accept=1 prop=1395 top1=1395 accp=1.000 next=pair draft=95512 prop=95512 pred gate=device Token # 342: 115.775ms; value: next_token_ids=tensor([95512], device='cuda:0') mtp accept=1 
prop=95512 top1=95512 accp=0.885 next=draft=303 prop=303 olap pair=110.4ms serial=196.4ms gain=86.0ms ratio=0.44 s0=3.8ms s1=192.6ms wait=0.1/48.6ms pred gate=device Token # 343: 3.788ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.989 next=pair draft=1207 prop=1207 pred gate=device Token # 344: 115.732ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=1 prop=1207 top1=1207 accp=0.986 next=draft=38451 prop=4916 olap pair=110.5ms serial=196.7ms gain=86.1ms ratio=0.44 s0=3.8ms s1=192.9ms wait=0.1/48.6ms pred gate=device Token # 345: 3.716ms; value: next_token_ids=tensor([4916], device='cuda:0') mtp accept=1 prop=4916 top1=4916 accp=0.080 next=pair draft=7462 prop=7462 pred gate=device Token # 346: 116.316ms; value: next_token_ids=tensor([7462], device='cuda:0') mtp accept=1 prop=7462 top1=7462 accp=1.000 next=draft=28638 prop=28638 olap pair=111.0ms serial=197.0ms gain=85.9ms ratio=0.44 s0=4.3ms s1=192.7ms wait=0.1/47.8ms pred gate=device Token # 347: 3.793ms; value: next_token_ids=tensor([1532], device='cuda:0') mtp accept=0 prop=28638 top1=1532 accp=0.222 next=pair draft=1100 prop=1100 pred gate=device Token # 348: 117.504ms; value: next_token_ids=tensor([1100], device='cuda:0') mtp accept=1 prop=1100 top1=1100 accp=1.000 next=draft=478 prop=478 olap pair=111.4ms serial=197.0ms gain=85.6ms ratio=0.43 s0=6.4ms s1=190.6ms wait=0.2/45.1ms pred gate=device Token # 349: 4.676ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.882 next=pair draft=853 prop=853 pred gate=device Token # 350: 117.289ms; value: next_token_ids=tensor([853], device='cuda:0') mtp accept=1 prop=853 top1=853 accp=0.986 next=draft=303 prop=303 olap pair=111.1ms serial=196.3ms gain=85.3ms ratio=0.43 s0=8.4ms s1=188.0ms wait=0.2/43.1ms pred gate=device Token # 351: 4.604ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=531 
prop=531 pred gate=device Token # 352: 116.701ms; value: next_token_ids=tensor([531], device='cuda:0') mtp accept=1 prop=531 top1=531 accp=0.958 next=draft=13820 prop=13820 olap pair=111.2ms serial=197.2ms gain=86.0ms ratio=0.44 s0=5.0ms s1=192.3ms wait=0.1/47.0ms pred gate=device Token # 353: 3.732ms; value: next_token_ids=tensor([13820], device='cuda:0') mtp accept=1 prop=13820 top1=13820 accp=0.956 next=pair draft=2516 prop=2516 pred gate=device Token # 354: 116.626ms; value: next_token_ids=tensor([2516], device='cuda:0') mtp accept=1 prop=2516 top1=2516 accp=1.000 next=draft=3391 prop=3391 olap pair=111.3ms serial=195.8ms gain=84.5ms ratio=0.43 s0=4.2ms s1=191.6ms wait=0.1/48.3ms pred gate=device Token # 355: 3.807ms; value: next_token_ids=tensor([3391], device='cuda:0') mtp accept=1 prop=3391 top1=3391 accp=1.000 next=pair draft=125724 prop=125724 pred gate=device Token # 356: 116.452ms; value: next_token_ids=tensor([125724], device='cuda:0') mtp accept=1 prop=125724 top1=125724 accp=1.000 next=draft=320 prop=320 olap pair=111.2ms serial=196.6ms gain=85.4ms ratio=0.43 s0=4.8ms s1=191.8ms wait=0.1/47.2ms pred gate=device Token # 357: 3.800ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.998 next=pair draft=6710 prop=6710 pred gate=device Token # 358: 117.896ms; value: next_token_ids=tensor([90080], device='cuda:0') mtp accept=0 prop=6710 top1=90080 accp=0.061 next=draft=2823 prop=2823 olap pair=111.8ms serial=198.0ms gain=86.1ms ratio=0.43 s0=8.7ms s1=189.3ms wait=0.2/42.4ms pred gate=device Token # 359: 116.947ms; value: next_token_ids=tensor([3599], device='cuda:0') mtp accept=0 prop=2823 top1=3599 accp=0.002 next=draft=2823 prop=2823 olap pair=111.4ms serial=197.5ms gain=86.2ms ratio=0.44 s0=5.9ms s1=191.6ms wait=0.2/45.8ms pred gate=device Token # 360: 116.565ms; value: next_token_ids=tensor([2823], device='cuda:0') mtp accept=1 prop=2823 top1=2823 accp=1.000 next=draft=3975 prop=3975 olap pair=111.2ms 
serial=197.8ms gain=86.6ms ratio=0.44 s0=3.8ms s1=194.0ms wait=0.1/48.5ms pred gate=device Token # 361: 3.722ms; value: next_token_ids=tensor([18847], device='cuda:0') mtp accept=0 prop=3975 top1=18847 accp=0.107 next=pair draft=303 prop=303 pred gate=device Token # 362: 116.479ms; value: next_token_ids=tensor([19916], device='cuda:0') mtp accept=0 prop=303 top1=19916 accp=0.157 next=draft=303 prop=303 olap pair=111.1ms serial=197.6ms gain=86.5ms ratio=0.44 s0=3.8ms s1=193.7ms wait=0.1/48.5ms pred gate=device Token # 363: 116.575ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=1700 prop=1700 olap pair=111.3ms serial=196.6ms gain=85.4ms ratio=0.43 s0=4.1ms s1=192.5ms wait=0.1/48.3ms pred gate=device Token # 364: 3.716ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=0 prop=1700 top1=1207 accp=0.012 next=pair draft=1700 prop=1700 pred gate=device Token # 365: 116.434ms; value: next_token_ids=tensor([1700], device='cuda:0') mtp accept=1 prop=1700 top1=1700 accp=0.999 next=draft=7831 prop=7831 olap pair=111.1ms serial=197.7ms gain=86.6ms ratio=0.44 s0=3.8ms s1=193.9ms wait=0.1/48.6ms pred gate=device Token # 366: 3.765ms; value: next_token_ids=tensor([7831], device='cuda:0') mtp accept=1 prop=7831 top1=7831 accp=0.763 next=pair draft=1395 prop=1395 pred gate=device Token # 367: 116.562ms; value: next_token_ids=tensor([1395], device='cuda:0') mtp accept=1 prop=1395 top1=1395 accp=0.992 next=draft=4377 prop=44750 olap pair=111.3ms serial=196.6ms gain=85.3ms ratio=0.43 s0=4.2ms s1=192.3ms wait=0.1/47.8ms pred gate=device Token # 368: 3.751ms; value: next_token_ids=tensor([44750], device='cuda:0') mtp accept=1 prop=44750 top1=44750 accp=0.688 next=pair draft=410 prop=410 pred gate=device Token # 369: 116.747ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=0.999 next=draft=1395 prop=1395 olap pair=111.4ms serial=197.8ms gain=86.4ms ratio=0.44 s0=5.1ms 
s1=192.8ms wait=0.1/47.2ms pred gate=device Token # 370: 3.811ms; value: next_token_ids=tensor([1395], device='cuda:0') mtp accept=1 prop=1395 top1=1395 accp=1.000 next=pair draft=49590 prop=49590 pred gate=device Token # 371: 117.807ms; value: next_token_ids=tensor([49590], device='cuda:0') mtp accept=1 prop=49590 top1=49590 accp=1.000 next=draft=303 prop=303 olap pair=112.5ms serial=198.1ms gain=85.7ms ratio=0.43 s0=4.2ms s1=193.9ms wait=0.1/48.0ms pred gate=device Token # 372: 3.700ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=1700 prop=1700 pred gate=device Token # 373: 116.751ms; value: next_token_ids=tensor([23220], device='cuda:0') mtp accept=0 prop=1700 top1=23220 accp=0.036 next=draft=7606 prop=7606 olap pair=111.5ms serial=198.2ms gain=86.7ms ratio=0.44 s0=4.5ms s1=193.7ms wait=0.1/47.7ms pred gate=device Token # 374: 116.660ms; value: next_token_ids=tensor([21134], device='cuda:0') mtp accept=0 prop=7606 top1=21134 accp=0.104 next=draft=2823 prop=2823 olap pair=111.3ms serial=198.1ms gain=86.8ms ratio=0.44 s0=3.8ms s1=194.3ms wait=0.1/48.6ms pred gate=device Token # 375: 116.462ms; value: next_token_ids=tensor([2823], device='cuda:0') mtp accept=1 prop=2823 top1=2823 accp=0.895 next=draft=7790 prop=7790 olap pair=111.1ms serial=197.4ms gain=86.4ms ratio=0.44 s0=4.1ms s1=193.3ms wait=0.1/48.0ms pred gate=device Token # 376: 3.762ms; value: next_token_ids=tensor([7790], device='cuda:0') mtp accept=1 prop=7790 top1=7790 accp=1.000 next=pair draft=100314 prop=100314 pred gate=device Token # 377: 116.324ms; value: next_token_ids=tensor([39567], device='cuda:0') mtp accept=0 prop=100314 top1=39567 accp=0.012 next=draft=320 prop=320 olap pair=111.0ms serial=197.2ms gain=86.2ms ratio=0.44 s0=4.5ms s1=192.7ms wait=0.1/47.6ms pred gate=device Token # 378: 116.430ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=18912 prop=18912 olap 
pair=111.1ms serial=197.7ms gain=86.6ms ratio=0.44 s0=4.0ms s1=193.7ms wait=0.1/48.2ms pred gate=device Token # 379: 3.742ms; value: next_token_ids=tensor([18912], device='cuda:0') mtp accept=1 prop=18912 top1=18912 accp=0.978 next=pair draft=18901 prop=6710 pred gate=device Token # 380: 115.937ms; value: next_token_ids=tensor([4754], device='cuda:0') mtp accept=0 prop=6710 top1=4754 accp=0.092 next=draft=6710 prop=6710 olap pair=110.7ms serial=196.7ms gain=86.0ms ratio=0.44 s0=3.9ms s1=192.8ms wait=0.1/48.3ms pred gate=device Token # 381: 116.618ms; value: next_token_ids=tensor([4916], device='cuda:0') mtp accept=0 prop=6710 top1=4916 accp=0.133 next=draft=44148 prop=44148 olap pair=111.3ms serial=197.6ms gain=86.3ms ratio=0.44 s0=5.6ms s1=192.1ms wait=0.1/46.5ms pred gate=device Token # 382: 116.133ms; value: next_token_ids=tensor([28269], device='cuda:0') mtp accept=0 prop=44148 top1=28269 accp=0.021 next=draft=40880 prop=40880 olap pair=110.8ms serial=197.2ms gain=86.4ms ratio=0.44 s0=3.8ms s1=193.5ms wait=0.1/48.6ms pred gate=device Token # 383: 116.003ms; value: next_token_ids=tensor([8649], device='cuda:0') mtp accept=0 prop=40880 top1=8649 accp=0.055 next=draft=303 prop=303 olap pair=110.6ms serial=195.7ms gain=85.0ms ratio=0.43 s0=7.2ms s1=188.5ms wait=0.2/44.5ms pred gate=device Token # 384: 116.266ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=6710 prop=6710 olap pair=111.0ms serial=197.0ms gain=86.1ms ratio=0.44 s0=5.1ms s1=191.9ms wait=0.1/46.9ms pred gate=device Token # 385: 3.742ms; value: next_token_ids=tensor([89538], device='cuda:0') mtp accept=0 prop=6710 top1=6710 accp=0.330 next=pair draft=766 prop=766 pred gate=device Token # 386: 116.514ms; value: next_token_ids=tensor([766], device='cuda:0') mtp accept=1 prop=766 top1=766 accp=1.000 next=draft=28638 prop=28638 olap pair=111.1ms serial=196.2ms gain=85.1ms ratio=0.43 s0=5.1ms s1=191.1ms wait=0.1/47.1ms pred gate=device Token # 387: 
3.758ms; value: next_token_ids=tensor([28638], device='cuda:0') mtp accept=1 prop=28638 top1=28638 accp=0.963 next=pair draft=24629 prop=20522 pred gate=device Token # 388: 116.202ms; value: next_token_ids=tensor([24629], device='cuda:0') mtp accept=0 prop=20522 top1=24629 accp=0.737 next=draft=62870 prop=3975 olap pair=110.9ms serial=197.4ms gain=86.5ms ratio=0.44 s0=3.8ms s1=193.6ms wait=0.1/48.6ms pred gate=device Token # 389: 115.954ms; value: next_token_ids=tensor([3975], device='cuda:0') mtp accept=1 prop=3975 top1=62870 accp=0.769 next=draft=320 prop=320 olap pair=110.6ms serial=196.7ms gain=86.1ms ratio=0.44 s0=3.8ms s1=192.9ms wait=0.1/48.6ms pred gate=device Token # 390: 3.718ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.992 next=pair draft=128799 prop=128799 pred gate=device Token # 391: 116.487ms; value: next_token_ids=tensor([128799], device='cuda:0') mtp accept=1 prop=128799 top1=128799 accp=1.000 next=draft=5 prop=5 olap pair=111.3ms serial=197.1ms gain=85.9ms ratio=0.44 s0=5.8ms s1=191.3ms wait=0.2/46.2ms pred gate=device Token # 392: 3.715ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=0 prop=5 top1=666 accp=0.004 next=pair draft=378 prop=378 pred gate=device Token # 393: 115.670ms; value: next_token_ids=tensor([378], device='cuda:0') mtp accept=1 prop=378 top1=378 accp=1.000 next=draft=24268 prop=24268 olap pair=110.4ms serial=196.4ms gain=86.0ms ratio=0.44 s0=3.9ms s1=192.6ms wait=0.1/48.4ms pred gate=device Token # 394: 3.726ms; value: next_token_ids=tensor([24268], device='cuda:0') mtp accept=1 prop=24268 top1=24268 accp=1.000 next=pair draft=1131 prop=1131 pred gate=device Token # 395: 116.028ms; value: next_token_ids=tensor([1131], device='cuda:0') mtp accept=1 prop=1131 top1=1131 accp=1.000 next=draft=2670 prop=2670 olap pair=110.8ms serial=197.0ms gain=86.2ms ratio=0.44 s0=4.0ms s1=193.0ms wait=0.1/48.3ms pred gate=device Token # 396: 3.679ms; value: 
next_token_ids=tensor([642], device='cuda:0') mtp accept=0 prop=2670 top1=2670 accp=0.627 next=pair draft=768 prop=768 pred gate=device Token # 397: 115.849ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.996 next=draft=1255 prop=112016 olap pair=110.6ms serial=196.7ms gain=86.1ms ratio=0.44 s0=3.9ms s1=192.9ms wait=0.1/48.4ms pred gate=device Token # 398: 3.729ms; value: next_token_ids=tensor([47507], device='cuda:0') mtp accept=0 prop=112016 top1=47507 accp=0.227 next=pair draft=112016 prop=112016 pred gate=device Token # 399: 116.958ms; value: next_token_ids=tensor([428], device='cuda:0') mtp accept=0 prop=112016 top1=428 accp=0.269 next=draft=112016 prop=112016 olap pair=111.6ms serial=198.7ms gain=87.0ms ratio=0.44 s0=3.9ms s1=194.8ms wait=0.1/48.4ms pred gate=device Token # 400: 117.779ms; value: next_token_ids=tensor([112016], device='cuda:0') mtp accept=1 prop=112016 top1=112016 accp=1.000 next=draft=24268 prop=24268 olap pair=111.6ms serial=198.1ms gain=86.6ms ratio=0.44 s0=5.0ms s1=193.2ms wait=0.1/47.2ms pred gate=device Token # 401: 3.951ms; value: next_token_ids=tensor([24268], device='cuda:0') mtp accept=1 prop=24268 top1=24268 accp=1.000 next=pair draft=430 prop=430 pred gate=device Token # 402: 116.705ms; value: next_token_ids=tensor([430], device='cuda:0') mtp accept=1 prop=430 top1=430 accp=1.000 next=draft=2053 prop=2053 olap pair=111.4ms serial=197.4ms gain=85.9ms ratio=0.44 s0=7.1ms s1=190.3ms wait=0.2/44.3ms pred gate=device Token # 403: 3.742ms; value: next_token_ids=tensor([2053], device='cuda:0') mtp accept=1 prop=2053 top1=2053 accp=0.997 next=pair draft=44032 prop=44032 pred gate=device Token # 404: 116.147ms; value: next_token_ids=tensor([44032], device='cuda:0') mtp accept=1 prop=44032 top1=44032 accp=0.992 next=draft=2741 prop=2741 olap pair=110.9ms serial=195.7ms gain=84.9ms ratio=0.43 s0=4.9ms s1=190.9ms wait=0.2/47.0ms pred gate=device Token # 405: 3.770ms; value: 
next_token_ids=tensor([984], device='cuda:0') mtp accept=0 prop=2741 top1=984 accp=0.056 next=pair draft=2529 prop=2529 pred gate=device Token # 406: 117.218ms; value: next_token_ids=tensor([4793], device='cuda:0') mtp accept=0 prop=2529 top1=4793 accp=0.003 next=draft=22651 prop=22651 olap pair=111.9ms serial=198.6ms gain=86.7ms ratio=0.44 s0=4.1ms s1=194.5ms wait=0.1/48.0ms pred gate=device Token # 407: 118.303ms; value: next_token_ids=tensor([22651], device='cuda:0') mtp accept=1 prop=22651 top1=22651 accp=0.895 next=draft=4374 prop=4374 olap pair=112.9ms serial=199.6ms gain=86.7ms ratio=0.43 s0=4.2ms s1=195.4ms wait=0.1/48.0ms pred gate=device Token # 408: 3.809ms; value: next_token_ids=tensor([4374], device='cuda:0') mtp accept=1 prop=4374 top1=4374 accp=1.000 next=pair draft=1465 prop=1465 pred gate=device Token # 409: 117.777ms; value: next_token_ids=tensor([1465], device='cuda:0') mtp accept=1 prop=1465 top1=1465 accp=1.000 next=draft=13582 prop=13582 olap pair=112.4ms serial=199.1ms gain=86.7ms ratio=0.44 s0=6.0ms s1=193.1ms wait=0.2/45.9ms pred gate=device Token # 410: 3.782ms; value: next_token_ids=tensor([13582], device='cuda:0') mtp accept=1 prop=13582 top1=13582 accp=1.000 next=pair draft=21 prop=21 pred gate=device Token # 411: 116.652ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=draft=16 prop=16 olap pair=111.4ms serial=197.8ms gain=86.4ms ratio=0.44 s0=4.3ms s1=193.5ms wait=0.1/47.6ms pred gate=device Token # 412: 3.852ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 413: 117.356ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=223 prop=223 olap pair=112.1ms serial=197.8ms gain=85.7ms ratio=0.43 s0=4.3ms s1=193.6ms wait=0.1/48.1ms pred gate=device Token # 414: 3.843ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 
prop=223 top1=223 accp=1.000 next=pair draft=36101 prop=36101 pred gate=device Token # 415: 118.320ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=1.000 next=draft=2971 prop=2971 olap pair=112.2ms serial=198.1ms gain=85.9ms ratio=0.43 s0=6.4ms s1=191.7ms wait=0.2/45.5ms pred gate=device Token # 416: 4.632ms; value: next_token_ids=tensor([2971], device='cuda:0') mtp accept=1 prop=2971 top1=2971 accp=1.000 next=pair draft=5866 prop=5866 pred gate=device Token # 417: 117.656ms; value: next_token_ids=tensor([5367], device='cuda:0') mtp accept=0 prop=5866 top1=5367 accp=0.167 next=draft=31826 prop=31826 olap pair=111.5ms serial=196.9ms gain=85.4ms ratio=0.43 s0=8.8ms s1=188.1ms wait=0.2/42.3ms pred gate=device Token # 418: 117.274ms; value: next_token_ids=tensor([1959], device='cuda:0') mtp accept=0 prop=31826 top1=1959 accp=0.160 next=draft=2971 prop=2971 olap pair=111.8ms serial=197.9ms gain=86.1ms ratio=0.44 s0=4.5ms s1=193.4ms wait=0.1/47.8ms pred gate=device Token # 419: 116.809ms; value: next_token_ids=tensor([2971], device='cuda:0') mtp accept=1 prop=2971 top1=2971 accp=0.785 next=draft=18317 prop=18317 olap pair=111.4ms serial=197.3ms gain=85.9ms ratio=0.44 s0=7.1ms s1=190.2ms wait=0.2/44.2ms pred gate=device Token # 420: 3.728ms; value: next_token_ids=tensor([18317], device='cuda:0') mtp accept=1 prop=18317 top1=18317 accp=1.000 next=pair draft=3968 prop=3968 pred gate=device Token # 421: 116.544ms; value: next_token_ids=tensor([3968], device='cuda:0') mtp accept=1 prop=3968 top1=3968 accp=0.744 next=draft=18452 prop=18452 olap pair=111.2ms serial=197.4ms gain=86.2ms ratio=0.44 s0=5.9ms s1=191.5ms wait=0.2/46.0ms pred gate=device Token # 422: 3.772ms; value: next_token_ids=tensor([18452], device='cuda:0') mtp accept=1 prop=18452 top1=18452 accp=1.000 next=pair draft=1316 prop=1316 pred gate=device Token # 423: 118.395ms; value: next_token_ids=tensor([1316], device='cuda:0') mtp accept=1 prop=1316 top1=1316 
accp=1.000 next=draft=3613 prop=3613 olap pair=112.3ms serial=199.3ms gain=87.0ms ratio=0.44 s0=6.3ms s1=192.9ms wait=0.2/45.4ms pred gate=device Token # 424: 4.602ms; value: next_token_ids=tensor([3613], device='cuda:0') mtp accept=1 prop=3613 top1=3613 accp=1.000 next=pair draft=867 prop=867 pred gate=device Token # 425: 117.009ms; value: next_token_ids=tensor([970], device='cuda:0') mtp accept=0 prop=867 top1=970 accp=0.041 next=draft=59708 prop=59708 olap pair=111.6ms serial=197.8ms gain=86.2ms ratio=0.44 s0=4.5ms s1=193.2ms wait=0.1/47.8ms pred gate=device Token # 426: 116.553ms; value: next_token_ids=tensor([867], device='cuda:0') mtp accept=0 prop=59708 top1=867 accp=0.166 next=draft=13332 prop=16320 olap pair=111.2ms serial=197.8ms gain=86.6ms ratio=0.44 s0=4.0ms s1=193.9ms wait=0.1/48.3ms pred gate=device Token # 427: 116.549ms; value: next_token_ids=tensor([16320], device='cuda:0') mtp accept=1 prop=16320 top1=13332 accp=0.695 next=draft=303 prop=303 olap pair=111.2ms serial=197.9ms gain=86.7ms ratio=0.44 s0=3.8ms s1=194.1ms wait=0.1/48.6ms pred gate=device Token # 428: 3.782ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=1877 prop=1877 pred gate=device Token # 429: 116.782ms; value: next_token_ids=tensor([1877], device='cuda:0') mtp accept=1 prop=1877 top1=1877 accp=1.000 next=draft=34221 prop=34221 olap pair=111.5ms serial=197.8ms gain=86.3ms ratio=0.44 s0=4.1ms s1=193.7ms wait=0.1/48.1ms pred gate=device Token # 430: 3.734ms; value: next_token_ids=tensor([34221], device='cuda:0') mtp accept=1 prop=34221 top1=34221 accp=0.999 next=pair draft=1057 prop=1057 pred gate=device Token # 431: 116.967ms; value: next_token_ids=tensor([428], device='cuda:0') mtp accept=0 prop=1057 top1=428 accp=0.002 next=draft=112016 prop=112016 olap pair=111.6ms serial=197.2ms gain=85.5ms ratio=0.43 s0=4.2ms s1=193.0ms wait=0.1/48.1ms pred gate=device Token # 432: 117.713ms; value: 
next_token_ids=tensor([112016], device='cuda:0') mtp accept=1 prop=112016 top1=112016 accp=1.000 next=draft=24268 prop=24268 olap pair=112.3ms serial=199.5ms gain=87.1ms ratio=0.44 s0=4.2ms s1=195.3ms wait=0.1/47.7ms pred gate=device Token # 433: 3.697ms; value: next_token_ids=tensor([24268], device='cuda:0') mtp accept=1 prop=24268 top1=24268 accp=1.000 next=pair draft=26663 prop=26663 pred gate=device Token # 434: 116.089ms; value: next_token_ids=tensor([26663], device='cuda:0') mtp accept=1 prop=26663 top1=26663 accp=1.000 next=draft=114131 prop=114131 olap pair=110.8ms serial=197.0ms gain=86.1ms ratio=0.44 s0=5.1ms s1=191.9ms wait=0.1/46.7ms pred gate=device Token # 435: 3.792ms; value: next_token_ids=tensor([32652], device='cuda:0') mtp accept=0 prop=114131 top1=32652 accp=0.129 next=pair draft=6221 prop=6221 pred gate=device Token # 436: 116.651ms; value: next_token_ids=tensor([6221], device='cuda:0') mtp accept=1 prop=6221 top1=6221 accp=1.000 next=draft=2073 prop=2073 olap pair=111.4ms serial=197.9ms gain=86.6ms ratio=0.44 s0=4.3ms s1=193.6ms wait=0.1/47.4ms pred gate=device Token # 437: 3.754ms; value: next_token_ids=tensor([2073], device='cuda:0') mtp accept=1 prop=2073 top1=2073 accp=1.000 next=pair draft=3515 prop=3515 pred gate=device Token # 438: 117.087ms; value: next_token_ids=tensor([3515], device='cuda:0') mtp accept=1 prop=3515 top1=3515 accp=0.999 next=draft=30305 prop=4427 olap pair=111.4ms serial=196.5ms gain=85.1ms ratio=0.43 s0=7.6ms s1=188.9ms wait=0.2/43.9ms pred gate=device Token # 439: 3.821ms; value: next_token_ids=tensor([38304], device='cuda:0') mtp accept=0 prop=4427 top1=30305 accp=0.438 next=pair draft=71836 prop=71836 pred gate=device Token # 440: 115.839ms; value: next_token_ids=tensor([71836], device='cuda:0') mtp accept=1 prop=71836 top1=71836 accp=1.000 next=draft=2823 prop=2823 olap pair=110.5ms serial=196.4ms gain=85.9ms ratio=0.44 s0=4.0ms s1=192.4ms wait=0.1/48.2ms pred gate=device Token # 441: 3.769ms; value: 
next_token_ids=tensor([2823], device='cuda:0') mtp accept=1 prop=2823 top1=2823 accp=0.954 next=pair draft=12543 prop=12543 pred gate=device Token # 442: 116.755ms; value: next_token_ids=tensor([88618], device='cuda:0') mtp accept=0 prop=12543 top1=12543 accp=0.763 next=draft=303 prop=303 olap pair=111.5ms serial=198.4ms gain=86.9ms ratio=0.44 s0=3.8ms s1=194.6ms wait=0.1/48.6ms pred gate=device Token # 443: 116.590ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.979 next=draft=8563 prop=8563 olap pair=111.2ms serial=197.9ms gain=86.6ms ratio=0.44 s0=4.0ms s1=193.9ms wait=0.1/48.2ms pred gate=device Token # 444: 3.764ms; value: next_token_ids=tensor([8563], device='cuda:0') mtp accept=1 prop=8563 top1=8563 accp=0.857 next=pair draft=10277 prop=10277 pred gate=device Token # 445: 117.123ms; value: next_token_ids=tensor([10277], device='cuda:0') mtp accept=1 prop=10277 top1=10277 accp=0.950 next=draft=54906 prop=54906 olap pair=111.8ms serial=197.1ms gain=85.2ms ratio=0.43 s0=6.5ms s1=190.6ms wait=0.2/45.2ms pred gate=device Token # 446: 3.748ms; value: next_token_ids=tensor([54906], device='cuda:0') mtp accept=1 prop=54906 top1=54906 accp=1.000 next=pair draft=116863 prop=116863 pred gate=device Token # 447: 116.245ms; value: next_token_ids=tensor([116863], device='cuda:0') mtp accept=1 prop=116863 top1=116863 accp=0.999 next=draft=36101 prop=36101 olap pair=111.0ms serial=197.1ms gain=86.1ms ratio=0.44 s0=4.4ms s1=192.7ms wait=0.1/47.5ms pred gate=device Token # 448: 3.787ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=0.517 next=pair draft=3500 prop=3500 pred gate=device Token # 449: 117.199ms; value: next_token_ids=tensor([3500], device='cuda:0') mtp accept=1 prop=3500 top1=3500 accp=0.999 next=draft=303 prop=303 olap pair=111.8ms serial=198.1ms gain=86.3ms ratio=0.44 s0=5.6ms s1=192.6ms wait=0.1/46.4ms pred gate=device Token # 450: 3.794ms; value: 
next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.794 next=pair draft=1380 prop=1380 pred gate=device Token # 451: 117.175ms; value: next_token_ids=tensor([1380], device='cuda:0') mtp accept=1 prop=1380 top1=1380 accp=0.873 next=draft=6032 prop=6032 olap pair=111.1ms serial=196.1ms gain=85.0ms ratio=0.43 s0=5.7ms s1=190.4ms wait=0.1/46.4ms pred gate=device Token # 452: 4.560ms; value: next_token_ids=tensor([6032], device='cuda:0') mtp accept=1 prop=6032 top1=6032 accp=0.903 next=pair draft=580 prop=580 pred gate=device Token # 453: 116.450ms; value: next_token_ids=tensor([580], device='cuda:0') mtp accept=1 prop=580 top1=580 accp=0.501 next=draft=984 prop=984 olap pair=111.0ms serial=196.5ms gain=85.5ms ratio=0.43 s0=8.0ms s1=188.5ms wait=0.2/43.2ms pred gate=device Token # 454: 3.793ms; value: next_token_ids=tensor([653], device='cuda:0') mtp accept=0 prop=984 top1=653 accp=0.028 next=pair draft=8093 prop=8093 pred gate=device Token # 455: 117.535ms; value: next_token_ids=tensor([8093], device='cuda:0') mtp accept=1 prop=8093 top1=8093 accp=1.000 next=draft=6742 prop=6742 olap pair=111.3ms serial=197.4ms gain=86.0ms ratio=0.44 s0=5.5ms s1=191.9ms wait=0.1/46.5ms pred gate=device Token # 456: 4.403ms; value: next_token_ids=tensor([6742], device='cuda:0') mtp accept=1 prop=6742 top1=6742 accp=1.000 next=pair draft=8738 prop=8738 pred gate=device Token # 457: 116.247ms; value: next_token_ids=tensor([8738], device='cuda:0') mtp accept=1 prop=8738 top1=8738 accp=1.000 next=draft=429 prop=429 olap pair=111.0ms serial=196.0ms gain=85.0ms ratio=0.43 s0=6.3ms s1=189.7ms wait=0.2/45.3ms pred gate=device Token # 458: 3.768ms; value: next_token_ids=tensor([429], device='cuda:0') mtp accept=1 prop=429 top1=429 accp=1.000 next=pair draft=30869 prop=30869 pred gate=device Token # 459: 116.473ms; value: next_token_ids=tensor([30869], device='cuda:0') mtp accept=1 prop=30869 top1=30869 accp=1.000 next=draft=22651 prop=22651 olap pair=111.2ms 
serial=197.7ms gain=86.6ms ratio=0.44 s0=3.8ms s1=193.9ms wait=0.1/48.6ms pred gate=device Token # 460: 3.838ms; value: next_token_ids=tensor([22651], device='cuda:0') mtp accept=1 prop=22651 top1=22651 accp=0.998 next=pair draft=4374 prop=4374 pred gate=device Token # 461: 117.441ms; value: next_token_ids=tensor([4374], device='cuda:0') mtp accept=1 prop=4374 top1=4374 accp=1.000 next=draft=1465 prop=1465 olap pair=111.3ms serial=197.7ms gain=86.3ms ratio=0.44 s0=4.3ms s1=193.4ms wait=0.1/48.1ms pred gate=device Token # 462: 4.757ms; value: next_token_ids=tensor([1465], device='cuda:0') mtp accept=1 prop=1465 top1=1465 accp=1.000 next=pair draft=13582 prop=13582 pred gate=device Token # 463: 116.276ms; value: next_token_ids=tensor([13582], device='cuda:0') mtp accept=1 prop=13582 top1=13582 accp=1.000 next=draft=21 prop=21 olap pair=110.9ms serial=196.7ms gain=85.9ms ratio=0.44 s0=5.8ms s1=191.0ms wait=0.2/46.2ms pred gate=device Token # 464: 3.775ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=pair draft=16 prop=16 pred gate=device Token # 465: 116.419ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=20 prop=20 olap pair=110.3ms serial=195.1ms gain=84.8ms ratio=0.43 s0=7.4ms s1=187.7ms wait=0.2/43.9ms pred gate=device Token # 466: 4.586ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=478 prop=478 pred gate=device Token # 467: 116.554ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=478 top1=223 accp=0.020 next=draft=8842 prop=8842 olap pair=110.2ms serial=195.1ms gain=84.8ms ratio=0.43 s0=7.1ms s1=187.9ms wait=0.2/44.6ms pred gate=device Token # 468: 117.549ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=1.000 next=draft=478 prop=478 olap pair=112.0ms serial=197.9ms gain=85.9ms ratio=0.43 s0=6.3ms s1=191.7ms 
wait=0.2/45.7ms pred gate=device Token # 469: 3.736ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=pair draft=666 prop=666 pred gate=device Token # 470: 116.361ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=7346 prop=7346 olap pair=111.0ms serial=197.3ms gain=86.3ms ratio=0.44 s0=3.8ms s1=193.4ms wait=0.1/48.6ms pred gate=device Token # 471: 3.740ms; value: next_token_ids=tensor([1735], device='cuda:0') mtp accept=0 prop=7346 top1=1735 accp=0.278 next=pair draft=938 prop=938 pred gate=device Token # 472: 117.236ms; value: next_token_ids=tensor([938], device='cuda:0') mtp accept=1 prop=938 top1=938 accp=1.000 next=draft=8563 prop=8563 olap pair=111.9ms serial=198.3ms gain=86.4ms ratio=0.44 s0=5.9ms s1=192.4ms wait=0.2/45.9ms pred gate=device Token # 473: 3.761ms; value: next_token_ids=tensor([8563], device='cuda:0') mtp accept=1 prop=8563 top1=8563 accp=1.000 next=pair draft=11049 prop=11049 pred gate=device Token # 474: 117.670ms; value: next_token_ids=tensor([11049], device='cuda:0') mtp accept=1 prop=11049 top1=11049 accp=1.000 next=draft=5866 prop=5866 olap pair=111.6ms serial=197.6ms gain=86.0ms ratio=0.44 s0=7.5ms s1=190.1ms wait=0.2/44.0ms pred gate=device Token # 475: 4.601ms; value: next_token_ids=tensor([5866], device='cuda:0') mtp accept=1 prop=5866 top1=5866 accp=1.000 next=pair draft=12145 prop=12145 pred gate=device Token # 476: 117.343ms; value: next_token_ids=tensor([2541], device='cuda:0') mtp accept=0 prop=12145 top1=2541 accp=0.401 next=draft=223 prop=223 olap pair=111.6ms serial=197.5ms gain=85.9ms ratio=0.43 s0=8.3ms s1=189.2ms wait=0.2/42.8ms pred gate=device Token # 477: 116.800ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=0 prop=223 top1=223 accp=0.813 next=draft=19 prop=19 olap pair=111.4ms serial=197.4ms gain=86.0ms ratio=0.44 s0=4.1ms s1=193.3ms wait=0.1/48.1ms pred gate=device Token # 478: 
116.925ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=2863 prop=2863 olap pair=111.6ms serial=197.2ms gain=85.7ms ratio=0.43 s0=5.7ms s1=191.5ms wait=0.1/46.4ms pred gate=device Token # 479: 3.774ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=2863 top1=223 accp=0.438 next=pair draft=2353 prop=2353 pred gate=device Token # 480: 116.065ms; value: next_token_ids=tensor([2353], device='cuda:0') mtp accept=1 prop=2353 top1=2353 accp=0.996 next=draft=26348 prop=26348 olap pair=110.7ms serial=196.8ms gain=86.1ms ratio=0.44 s0=3.9ms s1=192.9ms wait=0.1/48.4ms pred gate=device Token # 481: 3.782ms; value: next_token_ids=tensor([26348], device='cuda:0') mtp accept=1 prop=26348 top1=26348 accp=1.000 next=pair draft=58 prop=58 pred gate=device Token # 482: 117.385ms; value: next_token_ids=tensor([58], device='cuda:0') mtp accept=1 prop=58 top1=58 accp=1.000 next=draft=223 prop=223 olap pair=112.1ms serial=198.9ms gain=86.8ms ratio=0.44 s0=4.7ms s1=194.2ms wait=0.1/47.4ms pred gate=device Token # 483: 3.790ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=24091 prop=24091 pred gate=device Token # 484: 116.947ms; value: next_token_ids=tensor([24091], device='cuda:0') mtp accept=1 prop=24091 top1=24091 accp=1.000 next=draft=18 prop=18 olap pair=111.7ms serial=198.0ms gain=86.3ms ratio=0.44 s0=4.1ms s1=193.9ms wait=0.1/48.2ms pred gate=device Token # 485: 3.684ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=pair draft=666 prop=666 pred gate=device Token # 486: 116.379ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=0 prop=666 top1=940 accp=0.453 next=draft=223 prop=223 olap pair=111.1ms serial=197.4ms gain=86.3ms ratio=0.44 s0=4.8ms s1=192.6ms wait=0.1/47.4ms pred gate=device Token # 487: 116.587ms; value: next_token_ids=tensor([223], device='cuda:0') mtp 
accept=1 prop=223 top1=223 accp=1.000 next=draft=2111 prop=2111 olap pair=111.3ms serial=198.0ms gain=86.8ms ratio=0.44 s0=3.8ms s1=194.2ms wait=0.1/48.6ms pred gate=device Token # 488: 3.737ms; value: next_token_ids=tensor([2111], device='cuda:0') mtp accept=1 prop=2111 top1=2111 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 489: 116.744ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=2426 prop=2426 olap pair=111.4ms serial=198.4ms gain=86.9ms ratio=0.44 s0=3.9ms s1=194.5ms wait=0.1/48.4ms pred gate=device Token # 490: 3.698ms; value: next_token_ids=tensor([2426], device='cuda:0') mtp accept=1 prop=2426 top1=2426 accp=1.000 next=pair draft=38775 prop=38775 pred gate=device Token # 491: 117.127ms; value: next_token_ids=tensor([38775], device='cuda:0') mtp accept=1 prop=38775 top1=38775 accp=1.000 next=draft=471 prop=471 olap pair=111.1ms serial=196.7ms gain=85.6ms ratio=0.44 s0=6.9ms s1=189.8ms wait=0.2/44.7ms pred gate=device Token # 492: 3.834ms; value: next_token_ids=tensor([471], device='cuda:0') mtp accept=1 prop=471 top1=471 accp=1.000 next=pair draft=1457 prop=1457 pred gate=device Token # 493: 116.596ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=draft=666 prop=223 olap pair=111.3ms serial=197.5ms gain=86.2ms ratio=0.44 s0=5.1ms s1=192.4ms wait=0.1/47.0ms pred gate=device Token # 494: 3.755ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.653 next=pair draft=844 prop=844 pred gate=device Token # 495: 115.643ms; value: next_token_ids=tensor([844], device='cuda:0') mtp accept=1 prop=844 top1=844 accp=0.733 next=draft=41727 prop=41727 olap pair=110.4ms serial=196.4ms gain=86.0ms ratio=0.44 s0=3.8ms s1=192.6ms wait=0.1/48.5ms pred gate=device Token # 496: 3.767ms; value: next_token_ids=tensor([41727], device='cuda:0') mtp accept=1 prop=41727 top1=41727 accp=0.935 next=pair 
draft=666 prop=666 pred gate=device Token # 497: 116.280ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=303 prop=303 olap pair=111.0ms serial=197.4ms gain=86.4ms ratio=0.44 s0=3.8ms s1=193.6ms wait=0.1/48.7ms pred gate=device Token # 498: 3.736ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=8738 prop=8738 pred gate=device Token # 499: 115.912ms; value: next_token_ids=tensor([8738], device='cuda:0') mtp accept=1 prop=8738 top1=8738 accp=0.997 next=draft=2619 prop=2619 olap pair=110.6ms serial=196.7ms gain=86.1ms ratio=0.44 s0=3.8ms s1=192.9ms wait=0.1/48.6ms pred gate=device Token # 500: 3.770ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.982 next=pair draft=53091 prop=53091 pred gate=device Token # 501: 117.010ms; value: next_token_ids=tensor([53091], device='cuda:0') mtp accept=1 prop=53091 top1=53091 accp=0.995 next=draft=4374 prop=4374 olap pair=111.7ms serial=198.9ms gain=87.2ms ratio=0.44 s0=3.8ms s1=195.1ms wait=0.1/48.6ms pred gate=device Token # 502: 3.823ms; value: next_token_ids=tensor([4374], device='cuda:0') mtp accept=1 prop=4374 top1=4374 accp=1.000 next=pair draft=1465 prop=1465 pred gate=device Token # 503: 116.525ms; value: next_token_ids=tensor([1465], device='cuda:0') mtp accept=1 prop=1465 top1=1465 accp=1.000 next=draft=13582 prop=13582 olap pair=111.3ms serial=198.0ms gain=86.7ms ratio=0.44 s0=3.8ms s1=194.2ms wait=0.1/48.6ms pred gate=device Token # 504: 3.789ms; value: next_token_ids=tensor([13582], device='cuda:0') mtp accept=1 prop=13582 top1=13582 accp=1.000 next=pair draft=21 prop=21 pred gate=device Token # 505: 116.271ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=draft=16 prop=16 olap pair=111.1ms serial=197.6ms gain=86.5ms ratio=0.44 s0=3.8ms s1=193.8ms wait=0.1/48.6ms pred gate=device Token # 506: 
3.815ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 507: 116.570ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=1237 prop=1237 olap pair=111.3ms serial=197.9ms gain=86.6ms ratio=0.44 s0=3.8ms s1=194.1ms wait=0.1/48.4ms pred gate=device Token # 508: 3.758ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=0.993 next=pair draft=984 prop=28769 pred gate=device Token # 509: 117.189ms; value: next_token_ids=tensor([28769], device='cuda:0') mtp accept=1 prop=28769 top1=28769 accp=0.390 next=draft=36 prop=36 olap pair=111.8ms serial=197.5ms gain=85.7ms ratio=0.43 s0=4.2ms s1=193.3ms wait=0.1/48.0ms pred gate=device Token # 510: 3.804ms; value: next_token_ids=tensor([36], device='cuda:0') mtp accept=1 prop=36 top1=36 accp=0.999 next=pair draft=223 prop=223 pred gate=device Token # 511: 117.318ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=10602 prop=10602 olap pair=111.9ms serial=197.2ms gain=85.2ms ratio=0.43 s0=4.4ms s1=192.8ms wait=0.1/47.9ms pred gate=device Token # 512: 3.784ms; value: next_token_ids=tensor([10602], device='cuda:0') mtp accept=1 prop=10602 top1=10602 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 513: 116.813ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=12519 prop=12519 olap pair=111.3ms serial=197.2ms gain=85.9ms ratio=0.44 s0=4.2ms s1=193.0ms wait=0.1/48.2ms pred gate=device Token # 514: 3.778ms; value: next_token_ids=tensor([12519], device='cuda:0') mtp accept=1 prop=12519 top1=3910 accp=0.438 next=pair draft=373 prop=373 pred gate=device Token # 515: 117.767ms; value: next_token_ids=tensor([373], device='cuda:0') mtp accept=1 prop=373 top1=373 accp=1.000 next=draft=8835 prop=8835 olap pair=111.7ms 
serial=197.4ms gain=85.7ms ratio=0.43 s0=8.1ms s1=189.4ms wait=0.2/43.3ms pred gate=device Token # 516: 4.192ms; value: next_token_ids=tensor([8835], device='cuda:0') mtp accept=1 prop=8835 top1=8835 accp=1.000 next=pair draft=2578 prop=223 pred gate=device Token # 517: 117.161ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.465 next=draft=38229 prop=38229 olap pair=111.8ms serial=197.6ms gain=85.8ms ratio=0.43 s0=6.7ms s1=190.9ms wait=0.2/45.3ms pred gate=device Token # 518: 3.760ms; value: next_token_ids=tensor([15158], device='cuda:0') mtp accept=0 prop=38229 top1=15158 accp=0.137 next=pair draft=1564 prop=1564 pred gate=device Token # 519: 117.149ms; value: next_token_ids=tensor([1564], device='cuda:0') mtp accept=1 prop=1564 top1=1564 accp=1.000 next=draft=1227 prop=1227 olap pair=111.8ms serial=198.1ms gain=86.3ms ratio=0.44 s0=5.0ms s1=193.2ms wait=0.1/47.4ms pred gate=device Token # 520: 3.745ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=1.000 next=pair draft=666 prop=666 pred gate=device Token # 521: 118.428ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=320 prop=320 olap pair=112.3ms serial=198.6ms gain=86.3ms ratio=0.43 s0=4.3ms s1=194.3ms wait=0.1/48.3ms pred gate=device Token # 522: 3.890ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=85566 prop=85566 pred gate=device Token # 523: 117.345ms; value: next_token_ids=tensor([85566], device='cuda:0') mtp accept=1 prop=85566 top1=85566 accp=1.000 next=draft=5075 prop=5075 olap pair=112.1ms serial=198.6ms gain=86.5ms ratio=0.44 s0=4.2ms s1=194.5ms wait=0.1/48.2ms pred gate=device Token # 524: 3.720ms; value: next_token_ids=tensor([43080], device='cuda:0') mtp accept=0 prop=5075 top1=43080 accp=0.001 next=pair draft=2619 prop=2619 pred gate=device Token # 525: 117.031ms; value: 
next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=2619 top1=223 accp=0.676 next=draft=5769 prop=5769 olap pair=111.7ms serial=198.7ms gain=87.0ms ratio=0.44 s0=4.1ms s1=194.6ms wait=0.1/48.2ms pred gate=device Token # 526: 116.284ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=draft=22 prop=22 olap pair=110.8ms serial=196.8ms gain=86.0ms ratio=0.44 s0=4.0ms s1=192.9ms wait=0.1/48.4ms pred gate=device Token # 527: 3.739ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=pair draft=17840 prop=17840 pred gate=device Token # 528: 117.865ms; value: next_token_ids=tensor([17840], device='cuda:0') mtp accept=1 prop=17840 top1=17840 accp=0.982 next=draft=223 prop=223 olap pair=111.8ms serial=198.4ms gain=86.5ms ratio=0.44 s0=4.8ms s1=193.6ms wait=0.1/47.5ms pred gate=device Token # 529: 4.219ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=14171 prop=14171 pred gate=device Token # 530: 118.089ms; value: next_token_ids=tensor([14171], device='cuda:0') mtp accept=1 prop=14171 top1=14171 accp=0.979 next=draft=6533 prop=6533 olap pair=111.9ms serial=197.9ms gain=86.0ms ratio=0.43 s0=8.7ms s1=189.2ms wait=0.2/42.4ms pred gate=device Token # 531: 4.589ms; value: next_token_ids=tensor([6533], device='cuda:0') mtp accept=1 prop=6533 top1=6533 accp=0.590 next=pair draft=525 prop=525 pred gate=device Token # 532: 116.665ms; value: next_token_ids=tensor([525], device='cuda:0') mtp accept=1 prop=525 top1=525 accp=1.000 next=draft=1237 prop=1237 olap pair=111.2ms serial=197.8ms gain=86.6ms ratio=0.44 s0=4.1ms s1=193.7ms wait=0.1/48.3ms pred gate=device Token # 533: 3.744ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=36132 prop=36132 pred gate=device Token # 534: 116.360ms; value: next_token_ids=tensor([15150], device='cuda:0') mtp 
accept=0 prop=36132 top1=15150 accp=0.387 next=draft=4055 prop=4055 olap pair=111.1ms serial=197.5ms gain=86.4ms ratio=0.44 s0=3.8ms s1=193.7ms wait=0.1/48.6ms pred gate=device Token # 535: 117.605ms; value: next_token_ids=tensor([4055], device='cuda:0') mtp accept=1 prop=4055 top1=4055 accp=1.000 next=draft=2284 prop=2284 olap pair=111.5ms serial=197.7ms gain=86.2ms ratio=0.44 s0=5.4ms s1=192.3ms wait=0.1/46.5ms pred gate=device Token # 536: 4.648ms; value: next_token_ids=tensor([2284], device='cuda:0') mtp accept=1 prop=2284 top1=2284 accp=0.550 next=pair draft=7163 prop=7163 pred gate=device Token # 537: 117.286ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.976 next=draft=27521 prop=27521 olap pair=111.8ms serial=197.3ms gain=85.4ms ratio=0.43 s0=5.9ms s1=191.3ms wait=0.2/45.7ms pred gate=device Token # 538: 3.734ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 539: 116.788ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=draft=223 prop=223 olap pair=111.5ms serial=198.0ms gain=86.5ms ratio=0.44 s0=5.0ms s1=193.0ms wait=0.1/46.9ms pred gate=device Token # 540: 3.847ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=0 prop=223 top1=389 accp=0.019 next=pair draft=1703 prop=1703 pred gate=device Token # 541: 118.062ms; value: next_token_ids=tensor([1703], device='cuda:0') mtp accept=1 prop=1703 top1=1703 accp=1.000 next=draft=996 prop=996 olap pair=111.8ms serial=197.9ms gain=86.1ms ratio=0.44 s0=5.4ms s1=192.5ms wait=0.1/46.8ms pred gate=device Token # 542: 4.745ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=pair draft=3467 prop=3467 pred gate=device Token # 543: 116.904ms; value: next_token_ids=tensor([3467], device='cuda:0') mtp accept=1 prop=3467 top1=3467 accp=1.000 
next=draft=4231 prop=1148 olap pair=111.5ms serial=198.2ms gain=86.6ms ratio=0.44 s0=5.1ms s1=193.1ms wait=0.1/46.8ms pred gate=device Token # 544: 3.782ms; value: next_token_ids=tensor([4231], device='cuda:0') mtp accept=0 prop=1148 top1=4231 accp=0.956 next=pair draft=9047 prop=9047 pred gate=device Token # 545: 117.803ms; value: next_token_ids=tensor([66], device='cuda:0') mtp accept=0 prop=9047 top1=66 accp=0.016 next=draft=9047 prop=9047 olap pair=111.5ms serial=194.8ms gain=83.2ms ratio=0.43 s0=8.6ms s1=186.1ms wait=0.2/42.5ms pred gate=device Token # 546: 117.466ms; value: next_token_ids=tensor([9047], device='cuda:0') mtp accept=1 prop=9047 top1=9047 accp=1.000 next=draft=36666 prop=36666 olap pair=111.9ms serial=198.3ms gain=86.4ms ratio=0.44 s0=7.0ms s1=191.3ms wait=0.2/44.6ms pred gate=device Token # 547: 3.804ms; value: next_token_ids=tensor([36666], device='cuda:0') mtp accept=1 prop=36666 top1=36666 accp=1.000 next=pair draft=45834 prop=45834 pred gate=device Token # 548: 116.435ms; value: next_token_ids=tensor([45834], device='cuda:0') mtp accept=1 prop=45834 top1=45834 accp=1.000 next=draft=31 prop=31 olap pair=111.2ms serial=197.9ms gain=86.7ms ratio=0.44 s0=3.8ms s1=194.0ms wait=0.1/48.6ms pred gate=device Token # 549: 3.928ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=0.999 next=pair draft=11154 prop=11154 pred gate=device Token # 550: 117.998ms; value: next_token_ids=tensor([11154], device='cuda:0') mtp accept=1 prop=11154 top1=11154 accp=1.000 next=draft=26 prop=26 olap pair=111.8ms serial=197.9ms gain=86.0ms ratio=0.43 s0=6.3ms s1=191.6ms wait=0.2/45.6ms pred gate=device Token # 551: 4.632ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=pair draft=66 prop=66 pred gate=device Token # 552: 117.467ms; value: next_token_ids=tensor([66], device='cuda:0') mtp accept=1 prop=66 top1=66 accp=1.000 next=draft=126691 prop=126691 olap pair=111.3ms 
serial=196.7ms gain=85.4ms ratio=0.43 s0=6.9ms s1=189.8ms wait=0.2/44.6ms pred gate=device Token # 553: 4.702ms; value: next_token_ids=tensor([7417], device='cuda:0') mtp accept=0 prop=126691 top1=7417 accp=0.258 next=pair draft=10861 prop=10861 pred gate=device Token # 554: 118.841ms; value: next_token_ids=tensor([10861], device='cuda:0') mtp accept=1 prop=10861 top1=10861 accp=0.995 next=draft=9270 prop=5843 olap pair=112.5ms serial=196.6ms gain=84.2ms ratio=0.43 s0=8.7ms s1=188.0ms wait=0.2/42.4ms pred gate=device Token # 555: 6.178ms; value: next_token_ids=tensor([9270], device='cuda:0') mtp accept=0 prop=5843 top1=9270 accp=0.605 next=pair draft=5189 prop=5189 pred gate=device Token # 556: 117.687ms; value: next_token_ids=tensor([5189], device='cuda:0') mtp accept=1 prop=5189 top1=5189 accp=1.000 next=draft=94 prop=94 olap pair=112.3ms serial=198.1ms gain=85.9ms ratio=0.43 s0=5.7ms s1=192.5ms wait=0.1/45.9ms pred gate=device Token # 557: 3.766ms; value: next_token_ids=tensor([94], device='cuda:0') mtp accept=1 prop=94 top1=94 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 558: 116.777ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=10909 prop=10909 olap pair=111.5ms serial=197.7ms gain=86.2ms ratio=0.44 s0=6.7ms s1=190.9ms wait=0.2/44.9ms pred gate=device Token # 559: 3.741ms; value: next_token_ids=tensor([10909], device='cuda:0') mtp accept=1 prop=10909 top1=10909 accp=1.000 next=pair draft=369 prop=369 pred gate=device Token # 560: 116.598ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=0.999 next=draft=223 prop=223 olap pair=111.3ms serial=197.5ms gain=86.2ms ratio=0.44 s0=4.2ms s1=193.3ms wait=0.1/48.1ms pred gate=device Token # 561: 3.852ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=24292 prop=24292 pred gate=device Token # 562: 117.754ms; value: 
next_token_ids=tensor([24292], device='cuda:0') mtp accept=1 prop=24292 top1=24292 accp=1.000 next=draft=7640 prop=7640 olap pair=112.6ms serial=199.7ms gain=87.1ms ratio=0.44 s0=4.1ms s1=195.5ms wait=0.1/48.1ms pred gate=device Token # 563: 3.732ms; value: next_token_ids=tensor([7640], device='cuda:0') mtp accept=1 prop=7640 top1=7640 accp=1.000 next=pair draft=94 prop=94 pred gate=device Token # 564: 117.101ms; value: next_token_ids=tensor([94], device='cuda:0') mtp accept=1 prop=94 top1=94 accp=1.000 next=draft=1313 prop=1313 olap pair=111.0ms serial=196.2ms gain=85.1ms ratio=0.43 s0=7.3ms s1=188.9ms wait=0.2/44.4ms pred gate=device Token # 565: 4.646ms; value: next_token_ids=tensor([1313], device='cuda:0') mtp accept=1 prop=1313 top1=1313 accp=1.000 next=pair draft=5013 prop=5013 pred gate=device Token # 566: 116.853ms; value: next_token_ids=tensor([5013], device='cuda:0') mtp accept=1 prop=5013 top1=5013 accp=1.000 next=draft=369 prop=369 olap pair=111.3ms serial=197.5ms gain=86.2ms ratio=0.44 s0=5.0ms s1=192.5ms wait=0.1/46.6ms pred gate=device Token # 567: 3.815ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=1.000 next=pair draft=1313 prop=1313 pred gate=device Token # 568: 117.856ms; value: next_token_ids=tensor([1313], device='cuda:0') mtp accept=1 prop=1313 top1=1313 accp=1.000 next=draft=5013 prop=5013 olap pair=112.1ms serial=198.9ms gain=86.8ms ratio=0.44 s0=5.5ms s1=193.4ms wait=0.1/46.1ms pred gate=device Token # 569: 3.849ms; value: next_token_ids=tensor([5013], device='cuda:0') mtp accept=1 prop=5013 top1=5013 accp=1.000 next=pair draft=7640 prop=7640 pred gate=device Token # 570: 117.712ms; value: next_token_ids=tensor([7640], device='cuda:0') mtp accept=1 prop=7640 top1=7640 accp=1.000 next=draft=94 prop=94 olap pair=111.6ms serial=197.7ms gain=86.1ms ratio=0.44 s0=4.5ms s1=193.2ms wait=0.1/47.4ms pred gate=device Token # 571: 4.561ms; value: next_token_ids=tensor([94], device='cuda:0') mtp accept=1 
prop=94 top1=94 accp=1.000 next=pair draft=2619 prop=2619 pred gate=device Token # 572: 117.348ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=draft=93130 prop=93130 olap pair=111.8ms serial=197.2ms gain=85.4ms ratio=0.43 s0=5.7ms s1=191.5ms wait=0.1/46.4ms pred gate=device Token # 573: 3.775ms; value: next_token_ids=tensor([93130], device='cuda:0') mtp accept=1 prop=93130 top1=93130 accp=1.000 next=pair draft=90974 prop=90974 pred gate=device Token # 574: 116.692ms; value: next_token_ids=tensor([90974], device='cuda:0') mtp accept=1 prop=90974 top1=90974 accp=1.000 next=draft=666 prop=666 olap pair=111.4ms serial=197.9ms gain=86.5ms ratio=0.44 s0=3.9ms s1=194.0ms wait=0.1/48.3ms pred gate=device Token # 575: 3.753ms; value: next_token_ids=tensor([1121], device='cuda:0') mtp accept=0 prop=666 top1=1121 accp=0.021 next=pair draft=666 prop=666 pred gate=device Token # 576: 117.177ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=369 prop=369 olap pair=111.8ms serial=199.0ms gain=87.3ms ratio=0.44 s0=3.8ms s1=195.3ms wait=0.1/48.6ms pred gate=device Token # 577: 3.713ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=1.000 next=pair draft=2619 prop=2619 pred gate=device Token # 578: 116.707ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=2619 top1=223 accp=0.159 next=draft=856 prop=856 olap pair=110.6ms serial=195.6ms gain=85.0ms ratio=0.43 s0=6.5ms s1=189.2ms wait=0.2/45.4ms pred gate=device Token # 579: 116.509ms; value: next_token_ids=tensor([856], device='cuda:0') mtp accept=1 prop=856 top1=856 accp=1.000 next=draft=16 prop=16 olap pair=110.9ms serial=197.0ms gain=86.0ms ratio=0.44 s0=4.9ms s1=192.0ms wait=0.1/47.4ms pred gate=device Token # 580: 3.763ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=pair draft=14882 prop=14882 
pred gate=device Token # 581: 116.967ms; value: next_token_ids=tensor([14882], device='cuda:0') mtp accept=1 prop=14882 top1=14882 accp=1.000 next=draft=28828 prop=17840 olap pair=111.7ms serial=198.8ms gain=87.1ms ratio=0.44 s0=3.8ms s1=195.0ms wait=0.1/48.6ms pred gate=device Token # 582: 3.738ms; value: next_token_ids=tensor([17840], device='cuda:0') mtp accept=1 prop=17840 top1=17840 accp=0.366 next=pair draft=2283 prop=2283 pred gate=device Token # 583: 116.952ms; value: next_token_ids=tensor([17], device='cuda:0') mtp accept=0 prop=2283 top1=17 accp=0.001 next=draft=11274 prop=11274 olap pair=111.4ms serial=196.5ms gain=85.1ms ratio=0.43 s0=8.1ms s1=188.4ms wait=0.2/43.3ms pred gate=device Token # 584: 117.701ms; value: next_token_ids=tensor([11274], device='cuda:0') mtp accept=1 prop=11274 top1=11274 accp=1.000 next=draft=7640 prop=7640 olap pair=111.6ms serial=196.1ms gain=84.5ms ratio=0.43 s0=6.9ms s1=189.2ms wait=0.2/45.1ms pred gate=device Token # 585: 4.660ms; value: next_token_ids=tensor([7640], device='cuda:0') mtp accept=1 prop=7640 top1=7640 accp=1.000 next=pair draft=94 prop=94 pred gate=device Token # 586: 117.916ms; value: next_token_ids=tensor([94], device='cuda:0') mtp accept=1 prop=94 top1=94 accp=1.000 next=draft=2619 prop=2619 olap pair=111.7ms serial=196.7ms gain=85.0ms ratio=0.43 s0=8.9ms s1=187.8ms wait=0.2/42.3ms pred gate=device Token # 587: 4.588ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=pair draft=2793 prop=2793 pred gate=device Token # 588: 116.804ms; value: next_token_ids=tensor([2793], device='cuda:0') mtp accept=1 prop=2793 top1=2793 accp=1.000 next=draft=4055 prop=4055 olap pair=111.4ms serial=197.3ms gain=85.8ms ratio=0.44 s0=6.9ms s1=190.4ms wait=0.2/44.6ms pred gate=device Token # 589: 3.759ms; value: next_token_ids=tensor([47948], device='cuda:0') mtp accept=0 prop=4055 top1=47948 accp=0.007 next=pair draft=223 prop=223 pred gate=device Token # 590: 116.919ms; 
value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=22827 prop=22827 olap pair=111.4ms serial=196.8ms gain=85.4ms ratio=0.43 s0=8.7ms s1=188.1ms wait=0.2/42.6ms pred gate=device Token # 591: 3.735ms; value: next_token_ids=tensor([45045], device='cuda:0') mtp accept=0 prop=22827 top1=45045 accp=0.274 next=pair draft=666 prop=666 pred gate=device Token # 592: 117.321ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=369 prop=369 olap pair=111.3ms serial=196.6ms gain=85.3ms ratio=0.43 s0=8.4ms s1=188.2ms wait=0.2/43.0ms pred gate=device Token # 593: 4.652ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 594: 116.132ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.996 next=draft=18 prop=18 olap pair=110.8ms serial=196.7ms gain=85.9ms ratio=0.44 s0=4.0ms s1=192.6ms wait=0.1/48.1ms pred gate=device Token # 595: 3.752ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=pair draft=16 prop=16 pred gate=device Token # 596: 116.122ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=27521 prop=27521 olap pair=110.9ms serial=197.1ms gain=86.2ms ratio=0.44 s0=4.2ms s1=192.8ms wait=0.1/47.8ms pred gate=device Token # 597: 3.755ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 598: 117.328ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=11274 prop=11274 olap pair=111.9ms serial=197.2ms gain=85.3ms ratio=0.43 s0=4.4ms s1=192.8ms wait=0.1/48.0ms pred gate=device Token # 599: 3.734ms; value: next_token_ids=tensor([11274], device='cuda:0') mtp accept=1 
prop=11274 top1=11274 accp=1.000 next=pair draft=7640 prop=7640 pred gate=device Token # 600: 117.263ms; value: next_token_ids=tensor([7640], device='cuda:0') mtp accept=1 prop=7640 top1=7640 accp=1.000 next=draft=94 prop=94 olap pair=111.9ms serial=198.1ms gain=86.2ms ratio=0.44 s0=4.1ms s1=194.0ms wait=0.1/48.3ms pred gate=device Token # 601: 3.731ms; value: next_token_ids=tensor([94], device='cuda:0') mtp accept=1 prop=94 top1=94 accp=1.000 next=pair draft=2619 prop=2619 pred gate=device Token # 602: 116.018ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=draft=7849 prop=7849 olap pair=110.8ms serial=195.6ms gain=84.8ms ratio=0.43 s0=4.3ms s1=191.3ms wait=0.1/47.7ms pred gate=device Token # 603: 3.699ms; value: next_token_ids=tensor([1644], device='cuda:0') mtp accept=0 prop=7849 top1=1833 accp=0.452 next=pair draft=47948 prop=47948 pred gate=device Token # 604: 116.439ms; value: next_token_ids=tensor([47948], device='cuda:0') mtp accept=1 prop=47948 top1=47948 accp=0.999 next=draft=223 prop=223 olap pair=111.1ms serial=196.3ms gain=85.2ms ratio=0.43 s0=4.7ms s1=191.7ms wait=0.1/47.2ms pred gate=device Token # 605: 3.714ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=93130 prop=93130 pred gate=device Token # 606: 116.313ms; value: next_token_ids=tensor([4398], device='cuda:0') mtp accept=0 prop=93130 top1=4398 accp=0.118 next=draft=2130 prop=2130 olap pair=111.0ms serial=197.2ms gain=86.2ms ratio=0.44 s0=5.2ms s1=192.0ms wait=0.1/46.8ms pred gate=device Token # 607: 116.129ms; value: next_token_ids=tensor([2130], device='cuda:0') mtp accept=1 prop=2130 top1=2130 accp=1.000 next=draft=666 prop=666 olap pair=110.8ms serial=196.1ms gain=85.3ms ratio=0.43 s0=5.1ms s1=190.9ms wait=0.1/46.6ms pred gate=device Token # 608: 3.706ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=pair draft=369 
prop=369 pred gate=device Token # 609: 117.475ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=1.000 next=draft=223 prop=223 olap pair=112.2ms serial=198.3ms gain=86.1ms ratio=0.43 s0=5.7ms s1=192.7ms wait=0.1/46.3ms pred gate=device Token # 610: 3.789ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=2738 prop=2738 pred gate=device Token # 611: 117.533ms; value: next_token_ids=tensor([2738], device='cuda:0') mtp accept=1 prop=2738 top1=2738 accp=1.000 next=draft=223 prop=223 olap pair=112.0ms serial=197.4ms gain=85.4ms ratio=0.43 s0=5.5ms s1=191.9ms wait=0.1/46.4ms pred gate=device Token # 612: 3.750ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.984 next=pair draft=7016 prop=7016 pred gate=device Token # 613: 116.663ms; value: next_token_ids=tensor([7016], device='cuda:0') mtp accept=1 prop=7016 top1=7016 accp=1.000 next=draft=11274 prop=11274 olap pair=111.3ms serial=196.9ms gain=85.6ms ratio=0.43 s0=5.5ms s1=191.4ms wait=0.1/46.3ms pred gate=device Token # 614: 3.709ms; value: next_token_ids=tensor([11274], device='cuda:0') mtp accept=1 prop=11274 top1=11274 accp=1.000 next=pair draft=7640 prop=7640 pred gate=device Token # 615: 116.386ms; value: next_token_ids=tensor([7640], device='cuda:0') mtp accept=1 prop=7640 top1=7640 accp=1.000 next=draft=94 prop=94 olap pair=111.2ms serial=197.3ms gain=86.1ms ratio=0.44 s0=6.1ms s1=191.1ms wait=0.2/45.4ms pred gate=device Token # 616: 3.780ms; value: next_token_ids=tensor([94], device='cuda:0') mtp accept=1 prop=94 top1=94 accp=1.000 next=pair draft=2619 prop=2619 pred gate=device Token # 617: 116.506ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.955 next=draft=47 prop=47 olap pair=111.3ms serial=197.9ms gain=86.6ms ratio=0.44 s0=4.0ms s1=193.9ms wait=0.1/48.3ms pred gate=device Token # 618: 3.756ms; value: 
next_token_ids=tensor([47], device='cuda:0') mtp accept=1 prop=47 top1=47 accp=1.000 next=pair draft=8835 prop=8835 pred gate=device Token # 619: 117.441ms; value: next_token_ids=tensor([8835], device='cuda:0') mtp accept=1 prop=8835 top1=8835 accp=1.000 next=draft=19 prop=19 olap pair=112.2ms serial=198.7ms gain=86.5ms ratio=0.44 s0=3.9ms s1=194.8ms wait=0.1/48.3ms pred gate=device Token # 620: 3.700ms; value: next_token_ids=tensor([4904], device='cuda:0') mtp accept=0 prop=19 top1=19 accp=0.908 next=pair draft=31 prop=31 pred gate=device Token # 621: 115.444ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=19 prop=19 olap pair=110.1ms serial=195.0ms gain=84.9ms ratio=0.44 s0=6.8ms s1=188.2ms wait=0.2/44.9ms pred gate=device Token # 622: 3.707ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 623: 116.363ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=52471 prop=52471 olap pair=111.0ms serial=197.5ms gain=86.5ms ratio=0.44 s0=3.8ms s1=193.6ms wait=0.1/48.4ms pred gate=device Token # 624: 3.865ms; value: next_token_ids=tensor([10768], device='cuda:0') mtp accept=0 prop=52471 top1=10768 accp=0.455 next=pair draft=666 prop=666 pred gate=device Token # 625: 116.904ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=369 prop=369 olap pair=111.6ms serial=198.7ms gain=87.1ms ratio=0.44 s0=3.9ms s1=194.8ms wait=0.1/48.4ms pred gate=device Token # 626: 3.718ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 627: 116.616ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=31444 prop=31444 olap pair=111.4ms serial=198.2ms gain=86.8ms ratio=0.44 
s0=3.8ms s1=194.4ms wait=0.1/48.6ms pred gate=device Token # 628: 3.738ms; value: next_token_ids=tensor([31444], device='cuda:0') mtp accept=1 prop=31444 top1=31444 accp=1.000 next=pair draft=1492 prop=1492 pred gate=device Token # 629: 117.474ms; value: next_token_ids=tensor([1492], device='cuda:0') mtp accept=1 prop=1492 top1=1492 accp=1.000 next=draft=223 prop=223 olap pair=112.1ms serial=199.1ms gain=86.9ms ratio=0.44 s0=4.2ms s1=194.9ms wait=0.1/48.2ms pred gate=device Token # 630: 3.788ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=5769 prop=5769 pred gate=device Token # 631: 116.745ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=draft=21 prop=21 olap pair=111.5ms serial=198.3ms gain=86.8ms ratio=0.44 s0=3.9ms s1=194.3ms wait=0.1/48.3ms pred gate=device Token # 632: 3.728ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=pair draft=35015 prop=35015 pred gate=device Token # 633: 116.462ms; value: next_token_ids=tensor([35015], device='cuda:0') mtp accept=1 prop=35015 top1=35015 accp=1.000 next=draft=223 prop=223 olap pair=111.1ms serial=196.6ms gain=85.4ms ratio=0.43 s0=6.9ms s1=189.7ms wait=0.2/44.6ms pred gate=device Token # 634: 3.779ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=5198 prop=5198 pred gate=device Token # 635: 116.866ms; value: next_token_ids=tensor([5198], device='cuda:0') mtp accept=1 prop=5198 top1=5198 accp=1.000 next=draft=16 prop=16 olap pair=111.6ms serial=198.2ms gain=86.6ms ratio=0.44 s0=6.6ms s1=191.6ms wait=0.2/45.0ms pred gate=device Token # 636: 3.826ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 637: 117.401ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 
accp=1.000 next=draft=7 prop=7 olap pair=111.6ms serial=197.2ms gain=85.6ms ratio=0.43 s0=7.8ms s1=189.3ms wait=0.2/43.5ms pred gate=device Token # 638: 3.819ms; value: next_token_ids=tensor([7], device='cuda:0') mtp accept=1 prop=7 top1=7 accp=1.000 next=pair draft=25830 prop=25830 pred gate=device Token # 639: 117.837ms; value: next_token_ids=tensor([25830], device='cuda:0') mtp accept=1 prop=25830 top1=25830 accp=1.000 next=draft=12 prop=12 olap pair=111.8ms serial=198.2ms gain=86.4ms ratio=0.44 s0=6.2ms s1=192.1ms wait=0.2/45.9ms pred gate=device Token # 640: 4.579ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=666 accp=0.258 next=pair draft=8563 prop=8563 pred gate=device Token # 641: 116.154ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=0 prop=8563 top1=1237 accp=0.042 next=draft=33605 prop=33605 olap pair=110.8ms serial=196.1ms gain=85.3ms ratio=0.43 s0=8.2ms s1=187.9ms wait=0.2/43.1ms pred gate=device Token # 642: 116.572ms; value: next_token_ids=tensor([33605], device='cuda:0') mtp accept=1 prop=33605 top1=33605 accp=0.924 next=draft=76207 prop=76207 olap pair=111.3ms serial=198.0ms gain=86.7ms ratio=0.44 s0=3.8ms s1=194.2ms wait=0.1/48.6ms pred gate=device Token # 643: 3.756ms; value: next_token_ids=tensor([8738], device='cuda:0') mtp accept=0 prop=76207 top1=8738 accp=0.000 next=pair draft=76207 prop=76207 pred gate=device Token # 644: 116.901ms; value: next_token_ids=tensor([4498], device='cuda:0') mtp accept=0 prop=76207 top1=4498 accp=0.014 next=draft=76207 prop=76207 olap pair=111.6ms serial=198.1ms gain=86.5ms ratio=0.44 s0=4.4ms s1=193.7ms wait=0.1/47.3ms pred gate=device Token # 645: 116.777ms; value: next_token_ids=tensor([76207], device='cuda:0') mtp accept=1 prop=76207 top1=76207 accp=1.000 next=draft=303 prop=303 olap pair=111.4ms serial=196.8ms gain=85.3ms ratio=0.43 s0=4.3ms s1=192.5ms wait=0.1/48.0ms pred gate=device Token # 646: 3.728ms; value: next_token_ids=tensor([303], 
device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.890 next=pair draft=1620 prop=7000 pred gate=device Token # 647: 117.243ms; value: next_token_ids=tensor([7000], device='cuda:0') mtp accept=1 prop=7000 top1=7000 accp=0.096 next=draft=14590 prop=223 olap pair=111.6ms serial=197.5ms gain=85.9ms ratio=0.44 s0=5.4ms s1=192.1ms wait=0.1/46.8ms pred gate=device Token # 648: 3.786ms; value: next_token_ids=tensor([49449], device='cuda:0') mtp accept=0 prop=223 top1=49449 accp=0.408 next=pair draft=223 prop=768 pred gate=device Token # 649: 116.793ms; value: next_token_ids=tensor([939], device='cuda:0') mtp accept=0 prop=768 top1=223 accp=0.964 next=draft=24 prop=24 olap pair=111.5ms serial=198.1ms gain=86.7ms ratio=0.44 s0=4.0ms s1=194.1ms wait=0.1/48.3ms pred gate=device Token # 650: 116.832ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=draft=15 prop=15 olap pair=111.6ms serial=198.4ms gain=86.8ms ratio=0.44 s0=4.0ms s1=194.4ms wait=0.1/48.4ms pred gate=device Token # 651: 3.807ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=0.918 next=pair draft=3600 prop=3600 pred gate=device Token # 652: 117.322ms; value: next_token_ids=tensor([3600], device='cuda:0') mtp accept=1 prop=3600 top1=3600 accp=1.000 next=draft=15 prop=15 olap pair=112.0ms serial=198.7ms gain=86.7ms ratio=0.44 s0=4.1ms s1=194.7ms wait=0.1/48.2ms pred gate=device Token # 653: 3.737ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=pair draft=1059 prop=1059 pred gate=device Token # 654: 116.678ms; value: next_token_ids=tensor([1059], device='cuda:0') mtp accept=1 prop=1059 top1=1059 accp=1.000 next=draft=303 prop=303 olap pair=111.5ms serial=197.7ms gain=86.3ms ratio=0.44 s0=4.4ms s1=193.3ms wait=0.1/47.7ms pred gate=device Token # 655: 3.711ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.997 next=pair draft=5769 
prop=5769 pred gate=device Token # 656: 117.093ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=0.723 next=draft=22 prop=22 olap pair=111.7ms serial=198.3ms gain=86.5ms ratio=0.44 s0=4.0ms s1=194.2ms wait=0.1/48.2ms pred gate=device Token # 657: 3.687ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=pair draft=17840 prop=17840 pred gate=device Token # 658: 116.333ms; value: next_token_ids=tensor([17840], device='cuda:0') mtp accept=1 prop=17840 top1=17840 accp=0.998 next=draft=223 prop=223 olap pair=111.1ms serial=197.3ms gain=86.3ms ratio=0.44 s0=4.1ms s1=193.2ms wait=0.1/47.4ms pred gate=device Token # 659: 3.786ms; value: next_token_ids=tensor([1122], device='cuda:0') mtp accept=0 prop=223 top1=1122 accp=0.112 next=pair draft=5659 prop=5659 pred gate=device Token # 660: 116.924ms; value: next_token_ids=tensor([6533], device='cuda:0') mtp accept=0 prop=5659 top1=6533 accp=0.013 next=draft=303 prop=303 olap pair=111.7ms serial=198.2ms gain=86.6ms ratio=0.44 s0=4.2ms s1=194.1ms wait=0.1/47.9ms pred gate=device Token # 661: 117.534ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.629 next=draft=93130 prop=93130 olap pair=112.1ms serial=198.2ms gain=86.1ms ratio=0.43 s0=4.4ms s1=193.8ms wait=0.1/47.9ms pred gate=device Token # 662: 3.755ms; value: next_token_ids=tensor([93130], device='cuda:0') mtp accept=1 prop=93130 top1=93130 accp=0.953 next=pair draft=90974 prop=90974 pred gate=device Token # 663: 116.426ms; value: next_token_ids=tensor([6742], device='cuda:0') mtp accept=0 prop=90974 top1=6742 accp=0.012 next=draft=856 prop=856 olap pair=111.0ms serial=197.4ms gain=86.4ms ratio=0.44 s0=3.9ms s1=193.6ms wait=0.1/48.6ms pred gate=device Token # 664: 116.773ms; value: next_token_ids=tensor([856], device='cuda:0') mtp accept=1 prop=856 top1=856 accp=1.000 next=draft=16 prop=16 olap pair=111.4ms serial=198.3ms gain=86.9ms 
ratio=0.44 s0=3.9ms s1=194.4ms wait=0.1/48.3ms pred gate=device Token # 665: 3.756ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=pair draft=14882 prop=14882 pred gate=device Token # 666: 116.261ms; value: next_token_ids=tensor([14882], device='cuda:0') mtp accept=1 prop=14882 top1=14882 accp=1.000 next=draft=28828 prop=28828 olap pair=111.0ms serial=197.5ms gain=86.5ms ratio=0.44 s0=3.9ms s1=193.6ms wait=0.1/48.6ms pred gate=device Token # 667: 3.817ms; value: next_token_ids=tensor([28828], device='cuda:0') mtp accept=1 prop=28828 top1=28828 accp=0.956 next=pair draft=2283 prop=2283 pred gate=device Token # 668: 116.304ms; value: next_token_ids=tensor([2283], device='cuda:0') mtp accept=1 prop=2283 top1=2283 accp=1.000 next=draft=303 prop=303 olap pair=111.1ms serial=196.8ms gain=85.7ms ratio=0.44 s0=4.0ms s1=192.8ms wait=0.1/48.3ms pred gate=device Token # 669: 3.731ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=36060 prop=36060 pred gate=device Token # 670: 116.037ms; value: next_token_ids=tensor([36060], device='cuda:0') mtp accept=1 prop=36060 top1=36060 accp=1.000 next=draft=31 prop=31 olap pair=110.8ms serial=197.0ms gain=86.2ms ratio=0.44 s0=3.8ms s1=193.2ms wait=0.1/48.7ms pred gate=device Token # 671: 3.843ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 672: 116.076ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=4003 prop=4003 olap pair=110.8ms serial=197.1ms gain=86.3ms ratio=0.44 s0=4.0ms s1=193.2ms wait=0.1/48.4ms pred gate=device Token # 673: 3.747ms; value: next_token_ids=tensor([4003], device='cuda:0') mtp accept=1 prop=4003 top1=4003 accp=0.996 next=pair draft=31444 prop=31444 pred gate=device Token # 674: 116.618ms; value: next_token_ids=tensor([31444], device='cuda:0') 
mtp accept=1 prop=31444 top1=31444 accp=1.000 next=draft=1293 prop=1293 olap pair=111.3ms serial=198.1ms gain=86.8ms ratio=0.44 s0=3.9ms s1=194.1ms wait=0.1/48.4ms pred gate=device Token # 675: 3.860ms; value: next_token_ids=tensor([1293], device='cuda:0') mtp accept=1 prop=1293 top1=1293 accp=1.000 next=pair draft=83340 prop=83340 pred gate=device Token # 676: 117.162ms; value: next_token_ids=tensor([83340], device='cuda:0') mtp accept=1 prop=83340 top1=83340 accp=0.870 next=draft=23668 prop=23668 olap pair=111.9ms serial=199.2ms gain=87.3ms ratio=0.44 s0=3.9ms s1=195.3ms wait=0.1/48.4ms pred gate=device Token # 677: 3.723ms; value: next_token_ids=tensor([23668], device='cuda:0') mtp accept=1 prop=23668 top1=23668 accp=1.000 next=pair draft=31826 prop=31826 pred gate=device Token # 678: 117.649ms; value: next_token_ids=tensor([31826], device='cuda:0') mtp accept=1 prop=31826 top1=31826 accp=0.808 next=draft=5960 prop=5960 olap pair=111.6ms serial=197.5ms gain=85.9ms ratio=0.43 s0=7.2ms s1=190.3ms wait=0.2/44.2ms pred gate=device Token # 679: 4.714ms; value: next_token_ids=tensor([5960], device='cuda:0') mtp accept=1 prop=5960 top1=5960 accp=0.999 next=pair draft=66307 prop=66307 pred gate=device Token # 680: 116.547ms; value: next_token_ids=tensor([66307], device='cuda:0') mtp accept=1 prop=66307 top1=66307 accp=1.000 next=draft=303 prop=303 olap pair=111.1ms serial=197.1ms gain=86.0ms ratio=0.44 s0=5.2ms s1=191.9ms wait=0.1/46.5ms pred gate=device Token # 681: 3.792ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=17369 prop=17369 pred gate=device Token # 682: 117.358ms; value: next_token_ids=tensor([1530], device='cuda:0') mtp accept=0 prop=17369 top1=1530 accp=0.014 next=draft=1994 prop=1994 olap pair=111.4ms serial=197.0ms gain=85.7ms ratio=0.43 s0=7.2ms s1=189.9ms wait=0.2/44.3ms pred gate=device Token # 683: 117.099ms; value: next_token_ids=tensor([2541], device='cuda:0') mtp accept=0 prop=1994 
top1=1994 accp=0.609 next=draft=17899 prop=17899 olap pair=111.5ms serial=197.6ms gain=86.1ms ratio=0.44 s0=4.9ms s1=192.6ms wait=0.1/47.0ms pred gate=device Token # 684: 117.186ms; value: next_token_ids=tensor([5293], device='cuda:0') mtp accept=0 prop=17899 top1=5293 accp=0.058 next=draft=17899 prop=17899 olap pair=111.9ms serial=196.9ms gain=85.0ms ratio=0.43 s0=4.4ms s1=192.4ms wait=0.1/48.1ms pred gate=device Token # 685: 116.988ms; value: next_token_ids=tensor([17899], device='cuda:0') mtp accept=1 prop=17899 top1=17899 accp=1.000 next=draft=19476 prop=19476 olap pair=110.9ms serial=196.5ms gain=85.6ms ratio=0.44 s0=4.8ms s1=191.7ms wait=0.1/47.2ms pred gate=device Token # 686: 4.669ms; value: next_token_ids=tensor([19476], device='cuda:0') mtp accept=1 prop=19476 top1=19476 accp=0.996 next=pair draft=303 prop=303 pred gate=device Token # 687: 116.775ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=11253 prop=728 olap pair=111.3ms serial=197.4ms gain=86.1ms ratio=0.44 s0=6.9ms s1=190.6ms wait=0.2/44.8ms pred gate=device Token # 688: 3.723ms; value: next_token_ids=tensor([11253], device='cuda:0') mtp accept=0 prop=728 top1=11253 accp=0.697 next=pair draft=117813 prop=117813 pred gate=device Token # 689: 116.530ms; value: next_token_ids=tensor([117813], device='cuda:0') mtp accept=1 prop=117813 top1=117813 accp=1.000 next=draft=5293 prop=5293 olap pair=111.3ms serial=197.4ms gain=86.2ms ratio=0.44 s0=4.1ms s1=193.3ms wait=0.1/48.2ms pred gate=device Token # 690: 3.734ms; value: next_token_ids=tensor([5293], device='cuda:0') mtp accept=1 prop=5293 top1=5293 accp=0.974 next=pair draft=37653 prop=37653 pred gate=device Token # 691: 116.700ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=0 prop=37653 top1=4339 accp=0.051 next=draft=2827 prop=2827 olap pair=111.5ms serial=198.4ms gain=87.0ms ratio=0.44 s0=3.8ms s1=194.6ms wait=0.1/48.7ms pred gate=device Token # 692: 117.452ms; value: 
next_token_ids=tensor([2827], device='cuda:0') mtp accept=1 prop=2827 top1=2827 accp=1.000 next=draft=320 prop=320 olap pair=111.2ms serial=197.6ms gain=86.3ms ratio=0.44 s0=4.4ms s1=193.2ms wait=0.1/48.1ms pred gate=device Token # 693: 4.639ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=30869 prop=30869 pred gate=device Token # 694: 117.116ms; value: next_token_ids=tensor([30869], device='cuda:0') mtp accept=1 prop=30869 top1=30869 accp=1.000 next=draft=4618 prop=4618 olap pair=111.0ms serial=195.7ms gain=84.6ms ratio=0.43 s0=7.6ms s1=188.0ms wait=0.2/43.7ms pred gate=device Token # 695: 4.748ms; value: next_token_ids=tensor([93351], device='cuda:0') mtp accept=0 prop=4618 top1=93351 accp=0.134 next=pair draft=223 prop=223 pred gate=device Token # 696: 116.360ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=548 prop=548 olap pair=110.8ms serial=196.9ms gain=86.1ms ratio=0.44 s0=4.5ms s1=192.4ms wait=0.1/47.8ms pred gate=device Token # 697: 3.806ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=1 prop=548 top1=548 accp=1.000 next=pair draft=370 prop=370 pred gate=device Token # 698: 116.605ms; value: next_token_ids=tensor([370], device='cuda:0') mtp accept=1 prop=370 top1=370 accp=1.000 next=draft=69055 prop=69055 olap pair=111.1ms serial=197.8ms gain=86.6ms ratio=0.44 s0=3.9ms s1=193.9ms wait=0.1/48.5ms pred gate=device Token # 699: 3.745ms; value: next_token_ids=tensor([69055], device='cuda:0') mtp accept=1 prop=69055 top1=69055 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 700: 117.621ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=49764 prop=49764 olap pair=111.6ms serial=197.1ms gain=85.6ms ratio=0.43 s0=6.0ms s1=191.2ms wait=0.2/45.9ms pred gate=device Token # 701: 4.683ms; value: next_token_ids=tensor([49764], device='cuda:0') mtp 
accept=1 prop=49764 top1=49764 accp=0.932 next=pair draft=16432 prop=16432 pred gate=device Token # 702: 117.890ms; value: next_token_ids=tensor([16432], device='cuda:0') mtp accept=1 prop=16432 top1=16432 accp=0.915 next=draft=1481 prop=1481 olap pair=111.7ms serial=197.5ms gain=85.7ms ratio=0.43 s0=8.9ms s1=188.6ms wait=0.2/42.3ms pred gate=device Token # 703: 4.307ms; value: next_token_ids=tensor([111934], device='cuda:0') mtp accept=0 prop=1481 top1=111934 accp=0.323 next=pair draft=7018 prop=7018 pred gate=device Token # 704: 117.103ms; value: next_token_ids=tensor([3500], device='cuda:0') mtp accept=0 prop=7018 top1=3500 accp=0.000 next=draft=55074 prop=55074 olap pair=110.9ms serial=196.0ms gain=85.2ms ratio=0.43 s0=8.4ms s1=187.6ms wait=0.2/42.9ms pred gate=device Token # 705: 116.919ms; value: next_token_ids=tensor([55074], device='cuda:0') mtp accept=1 prop=55074 top1=55074 accp=0.688 next=draft=525 prop=525 olap pair=111.3ms serial=197.1ms gain=85.8ms ratio=0.44 s0=7.7ms s1=189.4ms wait=0.2/43.6ms pred gate=device Token # 706: 3.737ms; value: next_token_ids=tensor([525], device='cuda:0') mtp accept=1 prop=525 top1=525 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 707: 116.576ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=4569 prop=4569 olap pair=111.2ms serial=197.4ms gain=86.2ms ratio=0.44 s0=4.0ms s1=193.4ms wait=0.1/48.3ms pred gate=device Token # 708: 3.762ms; value: next_token_ids=tensor([4569], device='cuda:0') mtp accept=1 prop=4569 top1=4569 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 709: 118.200ms; value: next_token_ids=tensor([12519], device='cuda:0') mtp accept=0 prop=223 top1=12519 accp=0.100 next=draft=223 prop=223 olap pair=112.1ms serial=198.2ms gain=86.1ms ratio=0.43 s0=6.2ms s1=192.0ms wait=0.2/45.6ms pred gate=device Token # 710: 118.007ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.770 
next=draft=26139 prop=26139 olap pair=111.8ms serial=197.3ms gain=85.5ms ratio=0.43 s0=8.8ms s1=188.5ms wait=0.2/42.4ms pred gate=device Token # 711: 3.766ms; value: next_token_ids=tensor([26139], device='cuda:0') mtp accept=1 prop=26139 top1=26139 accp=1.000 next=pair draft=223 prop=223 pred gate=device