[2026-03-29 23:47:40.584052 INFO duck_llm] This is an info log entry
[2026-03-29 23:47:40.584085 WARN duck_llm] This is a warning log entry
[2026-03-29 23:47:40.584087 ERROR duck_llm] This is an error log entry
[2026-03-29 23:47:40.584289 INFO utils] Selected DPDK lcores: master=0, workers=[2, 4, 6, 8], all_performance_core_representatives=[0, 2, 4, 6, 8, 10, 12, 14]
EAL: Detected CPU lcores: 32
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
EAL: Using IOMMU type 1 (Type 1)
ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode)
ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode)
ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode)
ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode)
[2026-03-29 23:47:42.626198 INFO dpdk_workers] DPDK initialized successfully. Found 4 ports.
[2026-03-29 23:47:42.626215 INFO dpdk_workers] Port 0 device name: 0000:01:00.0
[2026-03-29 23:47:42.626217 INFO dpdk_workers] Port 0 IP address: 10.21.1.1
[2026-03-29 23:47:42.626219 INFO dpdk_workers] Port 0 Broadcast address: 10.21.1.255
[2026-03-29 23:47:42.626221 INFO dpdk_workers] Port 1 device name: 0000:01:00.1
[2026-03-29 23:47:42.626223 INFO dpdk_workers] Port 1 IP address: 10.21.2.1
[2026-03-29 23:47:42.626224 INFO dpdk_workers] Port 1 Broadcast address: 10.21.2.255
[2026-03-29 23:47:42.626226 INFO dpdk_workers] Port 2 device name: 0000:01:00.2
[2026-03-29 23:47:42.626227 INFO dpdk_workers] Port 2 IP address: 10.21.3.1
[2026-03-29 23:47:42.626229 INFO dpdk_workers] Port 2 Broadcast address: 10.21.3.255
[2026-03-29 23:47:42.626230 INFO dpdk_workers] Port 3 device name: 0000:01:00.3
[2026-03-29 23:47:42.626231 INFO dpdk_workers] Port 3 IP address: 10.21.4.1
[2026-03-29 23:47:42.626233 INFO dpdk_workers] Port 3 Broadcast address: 10.21.4.255
[2026-03-29 23:47:42.626235 INFO dpdk_workers] Available netifs list: [(10.21.1.255, 0, 10.21.1.1), (10.21.2.255, 1, 10.21.2.1), (10.21.3.255, 2, 10.21.3.1), (10.21.4.255, 3, 10.21.4.1)]
[2026-03-29 23:47:42.626240 INFO dpdk_workers] Starting worker #0: (bcast_ip: 10.21.1.255, port_id: 0, lcore_id: 2, host_ip: 10.21.1.1)
[2026-03-29 23:47:42.626279 INFO dpdk_workers] Initializing worker port 0 on lcore 2...
[2026-03-29 23:47:42.627780 INFO dpdk_workers] Starting worker #1: (bcast_ip: 10.21.2.255, port_id: 1, lcore_id: 4, host_ip: 10.21.2.1)
[2026-03-29 23:47:42.627803 INFO dpdk_workers] Starting worker #2: (bcast_ip: 10.21.3.255, port_id: 2, lcore_id: 6, host_ip: 10.21.3.1)
[2026-03-29 23:47:42.627816 INFO dpdk_workers] Starting worker #3: (bcast_ip: 10.21.4.255, port_id: 3, lcore_id: 8, host_ip: 10.21.4.1)
[2026-03-29 23:47:42.628038 INFO dpdk_workers] Initializing worker port 1 on lcore 4...
[2026-03-29 23:47:42.629803 INFO dpdk_workers] Initializing worker port 2 on lcore 6...
[2026-03-29 23:47:42.631792 INFO dpdk_workers] Initializing worker port 3 on lcore 8...
ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 1).
ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 2).
ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 0).
ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 3).
[2026-03-29 23:47:46.429702 INFO dpdk_workers] Worker port 1 initialized successfully.
[2026-03-29 23:47:47.300930 INFO dpdk_workers] Worker port 2 initialized successfully.
[2026-03-29 23:47:47.301830 INFO dpdk_workers] Worker port 3 initialized successfully.
[2026-03-29 23:47:47.304135 INFO dpdk_workers] Worker port 0 initialized successfully.
[2026-03-29 23:47:47.304158 INFO dpdk_workers] Workers initialized successfully. 4 workers running.
[2026-03-29 23:47:47.304412 INFO utils] Binding master thread to cores (excluding workers): [0, 1, 3, 5, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
[2026-03-29 23:47:47.304423 INFO utils] set_thread_affinity(tid 924920, cores [0, 1, 3, 5, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]): 0
[2026-03-29 23:47:47.305214 INFO dpdk_workers] Run command Ping all time: send 1.1 us, recv 784.9 us
[2026-03-29 23:47:47.355272 INFO dpdk_workers] Run command Ping all time: send 0.3 us, recv 0.4 us
[2026-03-29 23:47:47.405328 INFO dpdk_workers] Run command Ping all time: send 0.2 us, recv 0.4 us
[2026-03-29 23:47:47.455387 INFO dpdk_workers] Run command Ping all time: send 0.8 us, recv 0.7 us
[2026-03-29 23:47:47.505474 INFO dpdk_workers] Run command Ping all time: send 4.2 us, recv 2.9 us
[2026-03-29 23:47:47.555587 INFO dpdk_workers] Run command Ping all time: send 1.3 us, recv 1.5 us
[2026-03-29 23:47:47.605678 INFO dpdk_workers] Run command Ping all time: send 0.9 us, recv 1.2 us
[2026-03-29 23:47:47.655756 INFO dpdk_workers] Run command Ping all time: send 1.4 us, recv 1.0 us
[2026-03-29 23:47:47.705860 INFO dpdk_workers] Run command Ping all time: send 1.4 us, recv 1.2 us
[2026-03-29 23:47:47.755952 INFO dpdk_workers] Run command Ping all time: send 1.3 us, recv 1.1 us
[2026-03-29 23:47:47.806078 INFO dpdk_workers] Found 32 ducks in duck-ips-multi-netifs.txt
[2026-03-29 23:47:47.806081 INFO dpdk_workers] Duck #0: 10.21.1.101 (bcast_ip: 10.21.1.255)
[2026-03-29 23:47:47.806083 INFO dpdk_workers] Duck #1: 10.21.1.102 (bcast_ip: 10.21.1.255)
[2026-03-29 23:47:47.806085 INFO dpdk_workers] Duck #2: 10.21.1.103 (bcast_ip: 10.21.1.255)
[2026-03-29 23:47:47.806086 INFO dpdk_workers] Duck #3: 10.21.1.104 (bcast_ip: 10.21.1.255)
[2026-03-29 23:47:47.806088 INFO dpdk_workers] Duck #4: 10.21.1.105 (bcast_ip: 10.21.1.255)
[2026-03-29 23:47:47.806089 INFO dpdk_workers] Duck #5: 10.21.1.106 (bcast_ip: 10.21.1.255)
[2026-03-29 23:47:47.806092 INFO dpdk_workers] Duck #6: 10.21.1.107 (bcast_ip: 10.21.1.255)
[2026-03-29 23:47:47.806094 INFO dpdk_workers] Duck #7: 10.21.1.108 (bcast_ip: 10.21.1.255)
[2026-03-29 23:47:47.806096 INFO dpdk_workers] Duck #8: 10.21.2.101 (bcast_ip: 10.21.2.255)
[2026-03-29 23:47:47.806097 INFO dpdk_workers] Duck #9: 10.21.2.102 (bcast_ip: 10.21.2.255)
[2026-03-29 23:47:47.806098 INFO dpdk_workers] Duck #10: 10.21.2.103 (bcast_ip: 10.21.2.255)
[2026-03-29 23:47:47.806100 INFO dpdk_workers] Duck #11: 10.21.2.104 (bcast_ip: 10.21.2.255)
[2026-03-29 23:47:47.806101 INFO dpdk_workers] Duck #12: 10.21.2.105 (bcast_ip: 10.21.2.255)
[2026-03-29 23:47:47.806103 INFO dpdk_workers] Duck #13: 10.21.2.106 (bcast_ip: 10.21.2.255)
[2026-03-29 23:47:47.806104 INFO dpdk_workers] Duck #14: 10.21.2.107 (bcast_ip: 10.21.2.255)
[2026-03-29 23:47:47.806106 INFO dpdk_workers] Duck #15: 10.21.2.108 (bcast_ip: 10.21.2.255)
[2026-03-29 23:47:47.806107 INFO dpdk_workers] Duck #16: 10.21.3.101 (bcast_ip: 10.21.3.255)
[2026-03-29 23:47:47.806109 INFO dpdk_workers] Duck #17: 10.21.3.102 (bcast_ip: 10.21.3.255)
[2026-03-29 23:47:47.806110 INFO dpdk_workers] Duck #18: 10.21.3.103 (bcast_ip: 10.21.3.255)
[2026-03-29 23:47:47.806112 INFO dpdk_workers] Duck #19: 10.21.3.104 (bcast_ip: 10.21.3.255)
[2026-03-29 23:47:47.806113 INFO dpdk_workers] Duck #20: 10.21.3.105 (bcast_ip: 10.21.3.255)
[2026-03-29 23:47:47.806115 INFO dpdk_workers] Duck #21: 10.21.3.106 (bcast_ip: 10.21.3.255)
[2026-03-29 23:47:47.806116 INFO dpdk_workers] Duck #22: 10.21.3.107 (bcast_ip: 10.21.3.255)
[2026-03-29 23:47:47.806118 INFO dpdk_workers] Duck #23: 10.21.3.108 (bcast_ip: 10.21.3.255)
[2026-03-29 23:47:47.806119 INFO dpdk_workers] Duck #24: 10.21.4.101 (bcast_ip: 10.21.4.255)
[2026-03-29 23:47:47.806120 INFO dpdk_workers] Duck #25: 10.21.4.102 (bcast_ip: 10.21.4.255)
[2026-03-29 23:47:47.806122 INFO dpdk_workers] Duck #26: 10.21.4.103 (bcast_ip: 10.21.4.255)
[2026-03-29 23:47:47.806123 INFO dpdk_workers] Duck #27: 10.21.4.104 (bcast_ip: 10.21.4.255)
[2026-03-29 23:47:47.806125 INFO dpdk_workers] Duck #28: 10.21.4.105 (bcast_ip: 10.21.4.255)
[2026-03-29 23:47:47.806126 INFO dpdk_workers] Duck #29: 10.21.4.106 (bcast_ip: 10.21.4.255)
[2026-03-29 23:47:47.806128 INFO dpdk_workers] Duck #30: 10.21.4.107 (bcast_ip: 10.21.4.255)
[2026-03-29 23:47:47.806132 INFO dpdk_workers] Duck #31: 10.21.4.108 (bcast_ip: 10.21.4.255)
[2026-03-29 23:47:48.008206 INFO dpdk_workers] [Worker 0]: 10.21.1.101
[2026-03-29 23:47:48.008240 INFO dpdk_workers] [Worker 0]: 10.21.1.102
[2026-03-29 23:47:48.008250 INFO dpdk_workers] [Worker 0]: 10.21.1.103
[2026-03-29 23:47:48.008252 INFO dpdk_workers] [Worker 0]: 10.21.1.104
[2026-03-29 23:47:48.008254 INFO dpdk_workers] [Worker 0]: 10.21.1.105
[2026-03-29 23:47:48.008256 INFO dpdk_workers] [Worker 0]: 10.21.1.106
[2026-03-29 23:47:48.008258 INFO dpdk_workers] [Worker 0]: 10.21.1.107
[2026-03-29 23:47:48.008260 INFO dpdk_workers] [Worker 0]: 10.21.1.108
[2026-03-29 23:47:48.008263 INFO dpdk_workers] [Worker 1]: 10.21.2.101
[2026-03-29 23:47:48.008264 INFO dpdk_workers] [Worker 1]: 10.21.2.102
[2026-03-29 23:47:48.008266 INFO dpdk_workers] [Worker 1]: 10.21.2.103
[2026-03-29 23:47:48.008267 INFO dpdk_workers] [Worker 1]: 10.21.2.104
[2026-03-29 23:47:48.008269 INFO dpdk_workers] [Worker 1]: 10.21.2.105
[2026-03-29 23:47:48.008270 INFO dpdk_workers] [Worker 1]: 10.21.2.106
[2026-03-29 23:47:48.008271 INFO dpdk_workers] [Worker 1]: 10.21.2.107
[2026-03-29 23:47:48.008273 INFO dpdk_workers] [Worker 1]: 10.21.2.108
[2026-03-29 23:47:48.009160 INFO dpdk_workers] [Worker 2]: 10.21.3.101
[2026-03-29 23:47:48.009174 INFO dpdk_workers] [Worker 2]: 10.21.3.102
[2026-03-29 23:47:48.009186 INFO dpdk_workers] [Worker 2]: 10.21.3.103
[2026-03-29 23:47:48.009195 INFO dpdk_workers] [Worker 2]: 10.21.3.104
[2026-03-29 23:47:48.009197 INFO dpdk_workers] [Worker 2]: 10.21.3.105
[2026-03-29 23:47:48.009199 INFO dpdk_workers] [Worker 2]: 10.21.3.106
[2026-03-29 23:47:48.009200 INFO dpdk_workers] [Worker 2]: 10.21.3.107
[2026-03-29 23:47:48.009202 INFO dpdk_workers] [Worker 2]: 10.21.3.108
[2026-03-29 23:47:48.009204 INFO dpdk_workers] [Worker 3]: 10.21.4.101
[2026-03-29 23:47:48.009206 INFO dpdk_workers] [Worker 3]: 10.21.4.102
[2026-03-29 23:47:48.009207 INFO dpdk_workers] [Worker 3]: 10.21.4.103
[2026-03-29 23:47:48.009209 INFO dpdk_workers] [Worker 3]: 10.21.4.104
[2026-03-29 23:47:48.009210 INFO dpdk_workers] [Worker 3]: 10.21.4.105
[2026-03-29 23:47:48.009211 INFO dpdk_workers] [Worker 3]: 10.21.4.106
[2026-03-29 23:47:48.009213 INFO dpdk_workers] [Worker 3]: 10.21.4.107
[2026-03-29 23:47:48.009215 INFO dpdk_workers] [Worker 3]: 10.21.4.108
[2026-03-29 23:47:48.009217 INFO dpdk_workers] init_ducks done
[2026-03-29 23:47:48.010674 INFO dpdk_ducks] Initialized 4 DPDK duck workers
[2026-03-29 23:47:48.010677 INFO dpdk_ducks] DPDK duck worker 0: DpdkDuckWorker { worker_idx: 0, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (0, 8) }
[2026-03-29 23:47:48.010682 INFO dpdk_ducks] DPDK duck worker 1: DpdkDuckWorker { worker_idx: 1, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (8, 16) }
[2026-03-29 23:47:48.010685 INFO dpdk_ducks] DPDK duck worker 2: DpdkDuckWorker { worker_idx: 2, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (16, 24) }
[2026-03-29 23:47:48.010688 INFO dpdk_ducks] DPDK duck worker 3: DpdkDuckWorker { worker_idx: 3, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (24, 32) }
[2026-03-29 23:47:48.010694 INFO buffer_manager] Initializing buffer manager
[2026-03-29 23:47:48.010696 INFO buffer_manager] Buffer manager initialized: ELF BufferAllocator { begin: 0, end: 10485760, current: 0 }, input BufferAllocator { begin: 10485760, end: 104857600, current: 10485760 }, weights BufferAllocator { begin: 104923136, end: 32212254720, current: 104923136 }
[2026-03-29 23:47:48.010700 INFO fp8_dpdk_common] fp9 persistent judge enabled by default; set DUCK_FP9_PERSISTENT_JUDGE=0 to disable
[2026-03-29 23:47:48.011124 INFO buffer_manager] Added kernel fp9_kernels at (0, 91664)
[2026-03-29 23:47:48.011158 INFO fp8_dpdk_common] fp9 persistent judge: opened 32 sessions
[2026-03-29 23:47:48.011161 INFO fp8_dpdk_common] fp9 persistent judge: force-opened 32 fresh sessions for new init
[2026-03-29 23:47:48.011163 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init(tp_size=32)
[2026-03-29 23:47:48.011165 INFO fp8_moe_dpdk] fp8_moe_dpdk: init(tp_size=32)
[2026-03-29 23:47:48.377158 INFO weight_cache] weight_cache: header hit tp_size=32 num_slots=62 finished_slots=62
[2026-03-29 23:47:48.704360 INFO buffer_manager] Allocated weights buffer at (104923136, 0)
[2026-03-29 23:47:48.704380 INFO buffer_manager] Allocated weights buffer at (104923136, 4128768)
[2026-03-29 23:47:48.704382 INFO buffer_manager] Allocated weights buffer at (109051904, 516096)
[2026-03-29 23:47:48.704384 INFO buffer_manager] Allocated weights buffer at (109568000, 2016)
[2026-03-29 23:47:48.704385 INFO buffer_manager] Allocated weights buffer at (109572096, 4128768)
[2026-03-29 23:47:48.704386 INFO buffer_manager] Allocated weights buffer at (113700864, 516096)
[2026-03-29 23:47:48.704388 INFO buffer_manager] Allocated weights buffer at (114216960, 2016)
[2026-03-29 23:47:48.704389 INFO buffer_manager] Allocated weights buffer at (114221056, 4128768)
[2026-03-29 23:47:48.704391 INFO buffer_manager] Allocated weights buffer at (118349824, 516096)
[2026-03-29 23:47:48.704392 INFO buffer_manager] Allocated weights buffer at (118865920, 2016)
[2026-03-29 23:47:48.704394 INFO buffer_manager] Allocated weights buffer at (118870016, 0)
[2026-03-29 23:47:48.704395 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init_layer_cached(layer_idx=0, cache_slot=0) planned desc only
[2026-03-29 23:47:48.797083 INFO buffer_manager] Allocated weights buffer at (118870016, 0)
[2026-03-29 23:47:48.797101 INFO buffer_manager] Allocated weights buffer at (118870016, 4128768)
[2026-03-29 23:47:48.797103 INFO buffer_manager] Allocated weights buffer at (122998784, 516096)
[2026-03-29 23:47:48.797105 INFO buffer_manager] Allocated weights buffer at (123514880, 2016)
[2026-03-29 23:47:48.797106 INFO buffer_manager] Allocated weights buffer at (123518976, 4128768)
[2026-03-29 23:47:48.797107 INFO buffer_manager] Allocated weights buffer at (127647744, 516096)
[2026-03-29 23:47:48.797109 INFO buffer_manager] Allocated weights buffer at (128163840, 2016)
[2026-03-29 23:47:48.797110 INFO buffer_manager] Allocated weights buffer at (128167936, 4128768)
[2026-03-29 23:47:48.797112 INFO buffer_manager] Allocated weights buffer at (132296704, 516096)
[2026-03-29 23:47:48.797113 INFO buffer_manager] Allocated weights buffer at (132812800, 2016)
[2026-03-29 23:47:48.797115 INFO buffer_manager] Allocated weights buffer at (132816896, 0)
[2026-03-29 23:47:48.797117 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init_layer_cached(layer_idx=1, cache_slot=1) planned desc only
[2026-03-29 23:47:48.883732 INFO buffer_manager] Allocated weights buffer at (132816896, 0)
[2026-03-29 23:47:48.883749 INFO buffer_manager] Allocated weights buffer at (132816896, 4128768)
[2026-03-29 23:47:48.883751 INFO buffer_manager] Allocated weights buffer at (136945664, 516096)
[2026-03-29 23:47:48.883753 INFO buffer_manager] Allocated weights buffer at (137461760, 2016)
[2026-03-29 23:47:48.883758 INFO buffer_manager] Allocated weights buffer at (137465856, 4128768)
[2026-03-29 23:47:48.883759 INFO buffer_manager] Allocated weights buffer at (141594624, 516096)
[2026-03-29 23:47:48.883761 INFO buffer_manager] Allocated weights buffer at (142110720, 2016)
[2026-03-29 23:47:48.883762 INFO buffer_manager] Allocated weights buffer at (142114816, 4128768)
[2026-03-29 23:47:48.883764 INFO buffer_manager] Allocated weights buffer at (146243584, 516096)
[2026-03-29 23:47:48.883766 INFO buffer_manager] Allocated weights buffer at (146759680, 2016)
[2026-03-29 23:47:48.883768 INFO buffer_manager] Allocated weights buffer at (146763776, 0)
[2026-03-29 23:47:48.883770 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init_layer_cached(layer_idx=2, cache_slot=2) planned desc only
[2026-03-29 23:47:48.912320 INFO buffer_manager] Allocated weights buffer at (146763776, 0)
[2026-03-29 23:47:48.912334 INFO buffer_manager] Allocated weights buffer at (146763776, 132120576)
[2026-03-29 23:47:48.912336 INFO buffer_manager] Allocated weights buffer at (278884352, 57344)
[2026-03-29 23:47:48.912338 INFO buffer_manager] Allocated weights buffer at (278941696, 132120576)
[2026-03-29 23:47:48.912339 INFO buffer_manager] Allocated weights buffer at (411062272, 57344)
[2026-03-29 23:47:48.912341 INFO buffer_manager] Allocated weights buffer at (411119616, 132120576)
[2026-03-29 23:47:48.912342 INFO buffer_manager] Allocated weights buffer at (543240192, 57344)
[2026-03-29 23:47:48.912344 INFO buffer_manager] Allocated weights buffer at (543297536, 0)
[2026-03-29 23:47:48.912345 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=3, cache_slot=3) planned desc only
[2026-03-29 23:47:48.948828 INFO buffer_manager] Allocated weights buffer at (543297536, 0)
[2026-03-29 23:47:48.948844 INFO buffer_manager] Allocated weights buffer at (543297536, 132120576)
[2026-03-29 23:47:48.948846 INFO buffer_manager] Allocated weights buffer at (675418112, 57344)
[2026-03-29 23:47:48.948848 INFO buffer_manager] Allocated weights buffer at (675475456, 132120576)
[2026-03-29 23:47:48.948849 INFO buffer_manager] Allocated weights buffer at (807596032, 57344)
[2026-03-29 23:47:48.948851 INFO buffer_manager] Allocated weights buffer at (807653376, 132120576)
[2026-03-29 23:47:48.948852 INFO buffer_manager] Allocated weights buffer at (939773952, 57344)
[2026-03-29 23:47:48.948854 INFO buffer_manager] Allocated weights buffer at (939831296, 0)
[2026-03-29 23:47:48.948855 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=4, cache_slot=4) planned desc only
[2026-03-29 23:47:48.985084 INFO buffer_manager] Allocated weights buffer at (939831296, 0)
[2026-03-29 23:47:48.985098 INFO buffer_manager] Allocated weights buffer at (939831296, 132120576)
[2026-03-29 23:47:48.985100 INFO buffer_manager] Allocated weights buffer at (1071951872, 57344)
[2026-03-29 23:47:48.985102 INFO buffer_manager] Allocated weights buffer at (1072009216, 132120576)
[2026-03-29 23:47:48.985103 INFO buffer_manager] Allocated weights buffer at (1204129792, 57344)
[2026-03-29 23:47:48.985105 INFO buffer_manager] Allocated weights buffer at (1204187136, 132120576)
[2026-03-29 23:47:48.985106 INFO buffer_manager] Allocated weights buffer at (1336307712, 57344)
[2026-03-29 23:47:48.985108 INFO buffer_manager] Allocated weights buffer at (1336365056, 0)
[2026-03-29 23:47:48.985109 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=5, cache_slot=5) planned desc only
[2026-03-29 23:47:49.021428 INFO buffer_manager] Allocated weights buffer at (1336365056, 0)
[2026-03-29 23:47:49.021441 INFO buffer_manager] Allocated weights buffer at (1336365056, 132120576)
[2026-03-29 23:47:49.021443 INFO buffer_manager] Allocated weights buffer at (1468485632, 57344)
[2026-03-29 23:47:49.021445 INFO buffer_manager] Allocated weights buffer at (1468542976, 132120576)
[2026-03-29 23:47:49.021446 INFO buffer_manager] Allocated weights buffer at (1600663552, 57344)
[2026-03-29 23:47:49.021448 INFO buffer_manager] Allocated weights buffer at (1600720896, 132120576)
[2026-03-29 23:47:49.021452 INFO buffer_manager] Allocated weights buffer at (1732841472, 57344)
[2026-03-29 23:47:49.021453 INFO buffer_manager] Allocated weights buffer at (1732898816, 0)
[2026-03-29 23:47:49.021455 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=6, cache_slot=6) planned desc only
[2026-03-29 23:47:49.058022 INFO buffer_manager] Allocated weights buffer at (1732898816, 0)
[2026-03-29 23:47:49.058035 INFO buffer_manager] Allocated weights buffer at (1732898816, 132120576)
[2026-03-29 23:47:49.058037 INFO buffer_manager] Allocated weights buffer at (1865019392, 57344)
[2026-03-29 23:47:49.058039 INFO buffer_manager] Allocated weights buffer at (1865076736, 132120576)
[2026-03-29 23:47:49.058040 INFO buffer_manager] Allocated weights buffer at (1997197312, 57344)
[2026-03-29 23:47:49.058042 INFO buffer_manager] Allocated weights buffer at (1997254656, 132120576)
[2026-03-29 23:47:49.058043 INFO buffer_manager] Allocated weights buffer at (2129375232, 57344)
[2026-03-29 23:47:49.058045 INFO buffer_manager] Allocated weights buffer at (2129432576, 0)
[2026-03-29 23:47:49.058046 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=7, cache_slot=7) planned desc only
[2026-03-29 23:47:49.094502 INFO buffer_manager] Allocated weights buffer at (2129432576, 0)
[2026-03-29 23:47:49.094516 INFO buffer_manager] Allocated weights buffer at (2129432576, 132120576)
[2026-03-29 23:47:49.094518 INFO buffer_manager] Allocated weights buffer at (2261553152, 57344)
[2026-03-29 23:47:49.094519 INFO buffer_manager] Allocated weights buffer at (2261610496, 132120576)
[2026-03-29 23:47:49.094521 INFO buffer_manager] Allocated weights buffer at (2393731072, 57344)
[2026-03-29 23:47:49.094522 INFO buffer_manager] Allocated weights buffer at (2393788416, 132120576)
[2026-03-29 23:47:49.094524 INFO buffer_manager] Allocated weights buffer at (2525908992, 57344)
[2026-03-29 23:47:49.094525 INFO buffer_manager] Allocated weights buffer at (2525966336, 0)
[2026-03-29 23:47:49.094527 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=8, cache_slot=8) planned desc only
[2026-03-29 23:47:49.130767 INFO buffer_manager] Allocated weights buffer at (2525966336, 0)
[2026-03-29 23:47:49.130785 INFO buffer_manager] Allocated weights buffer at (2525966336, 132120576)
[2026-03-29 23:47:49.130787 INFO buffer_manager] Allocated weights buffer at (2658086912, 57344)
[2026-03-29 23:47:49.130788 INFO buffer_manager] Allocated weights buffer at (2658144256, 132120576)
[2026-03-29 23:47:49.130790 INFO buffer_manager] Allocated weights buffer at (2790264832, 57344)
[2026-03-29 23:47:49.130791 INFO buffer_manager] Allocated weights buffer at (2790322176, 132120576)
[2026-03-29 23:47:49.130793 INFO buffer_manager] Allocated weights buffer at (2922442752, 57344)
[2026-03-29 23:47:49.130794 INFO buffer_manager] Allocated weights buffer at (2922500096, 0)
[2026-03-29 23:47:49.130797 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=9, cache_slot=9) planned desc only
[2026-03-29 23:47:49.167107 INFO buffer_manager] Allocated weights buffer at (2922500096, 0)
[2026-03-29 23:47:49.167120 INFO buffer_manager] Allocated weights buffer at (2922500096, 132120576)
[2026-03-29 23:47:49.167122 INFO buffer_manager] Allocated weights buffer at (3054620672, 57344)
[2026-03-29 23:47:49.167124 INFO buffer_manager] Allocated weights buffer at (3054678016, 132120576)
[2026-03-29 23:47:49.167125 INFO buffer_manager] Allocated weights buffer at (3186798592, 57344)
[2026-03-29 23:47:49.167127 INFO buffer_manager] Allocated weights buffer at (3186855936, 132120576)
[2026-03-29 23:47:49.167128 INFO buffer_manager] Allocated weights buffer at (3318976512, 57344)
[2026-03-29 23:47:49.167130 INFO buffer_manager] Allocated weights buffer at (3319033856, 0)
[2026-03-29 23:47:49.167132 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=10, cache_slot=10) planned desc only
[2026-03-29 23:47:49.203388 INFO buffer_manager] Allocated weights buffer at (3319033856, 0)
[2026-03-29 23:47:49.203403 INFO buffer_manager] Allocated weights buffer at (3319033856, 132120576)
[2026-03-29 23:47:49.203408 INFO buffer_manager] Allocated weights buffer at (3451154432, 57344)
[2026-03-29 23:47:49.203410 INFO buffer_manager] Allocated weights buffer at (3451211776, 132120576)
[2026-03-29 23:47:49.203411 INFO buffer_manager] Allocated weights buffer at (3583332352, 57344)
[2026-03-29 23:47:49.203413 INFO buffer_manager] Allocated weights buffer at (3583389696, 132120576)
[2026-03-29 23:47:49.203414 INFO buffer_manager] Allocated weights buffer at (3715510272, 57344)
[2026-03-29 23:47:49.203416 INFO buffer_manager] Allocated weights buffer at (3715567616, 0)
[2026-03-29 23:47:49.203417 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=11, cache_slot=11) planned desc only
[2026-03-29 23:47:49.239684 INFO buffer_manager] Allocated weights buffer at (3715567616, 0)
[2026-03-29 23:47:49.239702 INFO buffer_manager] Allocated weights buffer at (3715567616, 132120576)
[2026-03-29 23:47:49.239704 INFO buffer_manager] Allocated weights buffer at (3847688192, 57344)
[2026-03-29 23:47:49.239706 INFO buffer_manager] Allocated weights buffer at (3847745536, 132120576)
[2026-03-29 23:47:49.239707 INFO buffer_manager] Allocated weights buffer at (3979866112, 57344)
[2026-03-29 23:47:49.239709 INFO buffer_manager] Allocated weights buffer at (3979923456, 132120576)
[2026-03-29 23:47:49.239710 INFO buffer_manager] Allocated weights buffer at (4112044032, 57344)
[2026-03-29 23:47:49.239712 INFO buffer_manager] Allocated weights buffer at (4112101376, 0)
[2026-03-29 23:47:49.239713 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=12, cache_slot=12) planned desc only
[2026-03-29 23:47:49.275956 INFO buffer_manager] Allocated weights buffer at (4112101376, 0)
[2026-03-29 23:47:49.275971 INFO buffer_manager] Allocated weights buffer at (4112101376, 132120576)
[2026-03-29 23:47:49.275973 INFO buffer_manager] Allocated weights buffer at (4244221952, 57344)
[2026-03-29 23:47:49.275974 INFO buffer_manager] Allocated weights buffer at (4244279296, 132120576)
[2026-03-29 23:47:49.275976 INFO buffer_manager] Allocated weights buffer at (4376399872, 57344)
[2026-03-29 23:47:49.275977 INFO buffer_manager] Allocated weights buffer at (4376457216, 132120576)
[2026-03-29 23:47:49.275979 INFO buffer_manager] Allocated weights buffer at (4508577792, 57344)
[2026-03-29 23:47:49.275980 INFO buffer_manager] Allocated weights buffer at (4508635136, 0)
[2026-03-29 23:47:49.275982 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=13, cache_slot=13) planned desc only
[2026-03-29 23:47:49.312143 INFO buffer_manager] Allocated weights buffer at (4508635136, 0)
[2026-03-29 23:47:49.312158 INFO buffer_manager] Allocated weights buffer at (4508635136, 132120576)
[2026-03-29 23:47:49.312159 INFO buffer_manager] Allocated weights buffer at (4640755712, 57344)
[2026-03-29 23:47:49.312161 INFO buffer_manager] Allocated weights buffer at (4640813056, 132120576)
[2026-03-29 23:47:49.312163 INFO buffer_manager] Allocated weights buffer at (4772933632, 57344)
[2026-03-29 23:47:49.312164 INFO buffer_manager] Allocated weights buffer at (4772990976, 132120576)
[2026-03-29 23:47:49.312166 INFO buffer_manager] Allocated weights buffer at (4905111552, 57344)
[2026-03-29 23:47:49.312168 INFO buffer_manager] Allocated weights buffer at (4905168896, 0)
[2026-03-29 23:47:49.312170 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=14, cache_slot=14) planned desc only
[2026-03-29 23:47:49.348517 INFO buffer_manager] Allocated weights buffer at (4905168896, 0)
[2026-03-29 23:47:49.348532 INFO buffer_manager] Allocated weights buffer at (4905168896, 132120576)
[2026-03-29 23:47:49.348534 INFO buffer_manager] Allocated weights buffer at (5037289472, 57344)
[2026-03-29 23:47:49.348535 INFO buffer_manager] Allocated weights buffer at (5037346816, 132120576)
[2026-03-29 23:47:49.348537 INFO buffer_manager] Allocated weights buffer at (5169467392, 57344)
[2026-03-29 23:47:49.348538 INFO buffer_manager] Allocated weights buffer at (5169524736, 132120576)
[2026-03-29 23:47:49.348540 INFO buffer_manager] Allocated weights buffer at (5301645312, 57344)
[2026-03-29 23:47:49.348549 INFO buffer_manager] Allocated weights buffer at (5301702656, 0)
[2026-03-29 23:47:49.348550 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=15, cache_slot=15) planned desc only
[2026-03-29 23:47:49.384964 INFO buffer_manager] Allocated weights buffer at (5301702656, 0)
[2026-03-29 23:47:49.384984 INFO buffer_manager] Allocated weights buffer at (5301702656, 132120576)
[2026-03-29 23:47:49.384986 INFO buffer_manager] Allocated weights buffer at (5433823232, 57344)
[2026-03-29 23:47:49.384987 INFO buffer_manager] Allocated weights buffer at (5433880576, 132120576)
[2026-03-29 23:47:49.384989 INFO buffer_manager] Allocated weights buffer at (5566001152, 57344)
[2026-03-29 23:47:49.384991 INFO buffer_manager] Allocated weights buffer at (5566058496, 132120576)
[2026-03-29 23:47:49.384992 INFO buffer_manager] Allocated weights buffer at (5698179072, 57344)
[2026-03-29 23:47:49.384993 INFO buffer_manager] Allocated weights buffer at (5698236416, 0)
[2026-03-29 23:47:49.384995 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=16, cache_slot=16) planned desc only
[2026-03-29 23:47:49.421227 INFO buffer_manager] Allocated weights buffer at (5698236416, 0)
[2026-03-29 23:47:49.421241 INFO buffer_manager] Allocated weights buffer at (5698236416, 132120576)
[2026-03-29 23:47:49.421243 INFO buffer_manager] Allocated weights buffer at (5830356992, 57344)
[2026-03-29 23:47:49.421245 INFO buffer_manager] Allocated weights buffer at (5830414336, 132120576)
[2026-03-29 23:47:49.421246 INFO buffer_manager] Allocated weights buffer at (5962534912, 57344)
[2026-03-29 23:47:49.421248 INFO buffer_manager] Allocated weights buffer at (5962592256, 132120576)
[2026-03-29 23:47:49.421249 INFO buffer_manager] Allocated weights buffer at (6094712832, 57344)
[2026-03-29 23:47:49.421251 INFO buffer_manager] Allocated weights buffer at (6094770176, 0)
[2026-03-29 23:47:49.421252 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=17, cache_slot=17) planned desc only
[2026-03-29 23:47:49.457427 INFO buffer_manager] Allocated weights buffer at (6094770176, 0)
[2026-03-29 23:47:49.457440 INFO buffer_manager] Allocated weights buffer at (6094770176, 132120576)
[2026-03-29 23:47:49.457443 INFO buffer_manager] Allocated weights buffer at (6226890752, 57344)
[2026-03-29 23:47:49.457444 INFO buffer_manager] Allocated weights buffer at (6226948096, 132120576)
[2026-03-29 23:47:49.457446 INFO buffer_manager] Allocated weights buffer at (6359068672, 57344)
[2026-03-29 23:47:49.457447 INFO buffer_manager] Allocated weights buffer at (6359126016, 132120576)
[2026-03-29 23:47:49.457448 INFO buffer_manager] Allocated weights buffer at (6491246592, 57344)
[2026-03-29 23:47:49.457450 INFO buffer_manager] Allocated weights buffer at (6491303936, 0)
[2026-03-29 23:47:49.457451 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=18, cache_slot=18) planned desc only
[2026-03-29 23:47:49.493735 INFO buffer_manager] Allocated weights buffer at (6491303936, 0)
[2026-03-29 23:47:49.493752 INFO buffer_manager] Allocated weights buffer at (6491303936, 132120576)
[2026-03-29 23:47:49.493754 INFO buffer_manager] Allocated weights buffer at (6623424512, 57344)
[2026-03-29 23:47:49.493756 INFO buffer_manager] Allocated weights buffer at (6623481856, 132120576)
[2026-03-29 23:47:49.493757 INFO buffer_manager] Allocated weights buffer at (6755602432, 57344)
[2026-03-29 23:47:49.493759 INFO buffer_manager] Allocated weights buffer at (6755659776, 132120576)
[2026-03-29 23:47:49.493760 INFO buffer_manager] Allocated weights buffer at (6887780352, 57344)
[2026-03-29 23:47:49.493762 INFO buffer_manager] Allocated weights buffer at (6887837696, 0)
[2026-03-29 23:47:49.493763 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=19, cache_slot=19) planned desc only
[2026-03-29 23:47:49.530186 INFO buffer_manager] Allocated weights buffer at (6887837696, 0)
[2026-03-29 23:47:49.530201 INFO buffer_manager] Allocated weights buffer at (6887837696, 132120576)
[2026-03-29 23:47:49.530205 INFO buffer_manager] Allocated weights buffer at (7019958272, 57344)
[2026-03-29 23:47:49.530207 INFO buffer_manager] Allocated weights buffer at (7020015616, 132120576)
[2026-03-29 23:47:49.530208 INFO buffer_manager] Allocated weights buffer at (7152136192, 57344) [2026-03-29 23:47:49.530210 INFO buffer_manager] Allocated weights buffer at (7152193536, 132120576) [2026-03-29 23:47:49.530212 INFO buffer_manager] Allocated weights buffer at (7284314112, 57344) [2026-03-29 23:47:49.530215 INFO buffer_manager] Allocated weights buffer at (7284371456, 0) [2026-03-29 23:47:49.530216 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=20, cache_slot=20) planned desc only [2026-03-29 23:47:49.566532 INFO buffer_manager] Allocated weights buffer at (7284371456, 0) [2026-03-29 23:47:49.566553 INFO buffer_manager] Allocated weights buffer at (7284371456, 132120576) [2026-03-29 23:47:49.566555 INFO buffer_manager] Allocated weights buffer at (7416492032, 57344) [2026-03-29 23:47:49.566557 INFO buffer_manager] Allocated weights buffer at (7416549376, 132120576) [2026-03-29 23:47:49.566558 INFO buffer_manager] Allocated weights buffer at (7548669952, 57344) [2026-03-29 23:47:49.566560 INFO buffer_manager] Allocated weights buffer at (7548727296, 132120576) [2026-03-29 23:47:49.566561 INFO buffer_manager] Allocated weights buffer at (7680847872, 57344) [2026-03-29 23:47:49.566564 INFO buffer_manager] Allocated weights buffer at (7680905216, 0) [2026-03-29 23:47:49.566566 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=21, cache_slot=21) planned desc only [2026-03-29 23:47:49.602898 INFO buffer_manager] Allocated weights buffer at (7680905216, 0) [2026-03-29 23:47:49.602915 INFO buffer_manager] Allocated weights buffer at (7680905216, 132120576) [2026-03-29 23:47:49.602917 INFO buffer_manager] Allocated weights buffer at (7813025792, 57344) [2026-03-29 23:47:49.602919 INFO buffer_manager] Allocated weights buffer at (7813083136, 132120576) [2026-03-29 23:47:49.602921 INFO buffer_manager] Allocated weights buffer at (7945203712, 57344) [2026-03-29 23:47:49.602922 INFO buffer_manager] Allocated weights buffer at (7945261056, 
132120576) [2026-03-29 23:47:49.602924 INFO buffer_manager] Allocated weights buffer at (8077381632, 57344) [2026-03-29 23:47:49.602925 INFO buffer_manager] Allocated weights buffer at (8077438976, 0) [2026-03-29 23:47:49.602927 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=22, cache_slot=22) planned desc only [2026-03-29 23:47:49.639398 INFO buffer_manager] Allocated weights buffer at (8077438976, 0) [2026-03-29 23:47:49.639419 INFO buffer_manager] Allocated weights buffer at (8077438976, 132120576) [2026-03-29 23:47:49.639422 INFO buffer_manager] Allocated weights buffer at (8209559552, 57344) [2026-03-29 23:47:49.639423 INFO buffer_manager] Allocated weights buffer at (8209616896, 132120576) [2026-03-29 23:47:49.639425 INFO buffer_manager] Allocated weights buffer at (8341737472, 57344) [2026-03-29 23:47:49.639427 INFO buffer_manager] Allocated weights buffer at (8341794816, 132120576) [2026-03-29 23:47:49.639429 INFO buffer_manager] Allocated weights buffer at (8473915392, 57344) [2026-03-29 23:47:49.639430 INFO buffer_manager] Allocated weights buffer at (8473972736, 0) [2026-03-29 23:47:49.639432 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=23, cache_slot=23) planned desc only [2026-03-29 23:47:49.675843 INFO buffer_manager] Allocated weights buffer at (8473972736, 0) [2026-03-29 23:47:49.675857 INFO buffer_manager] Allocated weights buffer at (8473972736, 132120576) [2026-03-29 23:47:49.675859 INFO buffer_manager] Allocated weights buffer at (8606093312, 57344) [2026-03-29 23:47:49.675861 INFO buffer_manager] Allocated weights buffer at (8606150656, 132120576) [2026-03-29 23:47:49.675862 INFO buffer_manager] Allocated weights buffer at (8738271232, 57344) [2026-03-29 23:47:49.675864 INFO buffer_manager] Allocated weights buffer at (8738328576, 132120576) [2026-03-29 23:47:49.675865 INFO buffer_manager] Allocated weights buffer at (8870449152, 57344) [2026-03-29 23:47:49.675870 INFO buffer_manager] Allocated weights buffer at 
(8870506496, 0) [2026-03-29 23:47:49.675871 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=24, cache_slot=24) planned desc only [2026-03-29 23:47:49.712265 INFO buffer_manager] Allocated weights buffer at (8870506496, 0) [2026-03-29 23:47:49.712280 INFO buffer_manager] Allocated weights buffer at (8870506496, 132120576) [2026-03-29 23:47:49.712282 INFO buffer_manager] Allocated weights buffer at (9002627072, 57344) [2026-03-29 23:47:49.712283 INFO buffer_manager] Allocated weights buffer at (9002684416, 132120576) [2026-03-29 23:47:49.712285 INFO buffer_manager] Allocated weights buffer at (9134804992, 57344) [2026-03-29 23:47:49.712286 INFO buffer_manager] Allocated weights buffer at (9134862336, 132120576) [2026-03-29 23:47:49.712288 INFO buffer_manager] Allocated weights buffer at (9266982912, 57344) [2026-03-29 23:47:49.712289 INFO buffer_manager] Allocated weights buffer at (9267040256, 0) [2026-03-29 23:47:49.712291 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=25, cache_slot=25) planned desc only [2026-03-29 23:47:49.748523 INFO buffer_manager] Allocated weights buffer at (9267040256, 0) [2026-03-29 23:47:49.748542 INFO buffer_manager] Allocated weights buffer at (9267040256, 132120576) [2026-03-29 23:47:49.748544 INFO buffer_manager] Allocated weights buffer at (9399160832, 57344) [2026-03-29 23:47:49.748546 INFO buffer_manager] Allocated weights buffer at (9399218176, 132120576) [2026-03-29 23:47:49.748547 INFO buffer_manager] Allocated weights buffer at (9531338752, 57344) [2026-03-29 23:47:49.748549 INFO buffer_manager] Allocated weights buffer at (9531396096, 132120576) [2026-03-29 23:47:49.748551 INFO buffer_manager] Allocated weights buffer at (9663516672, 57344) [2026-03-29 23:47:49.748553 INFO buffer_manager] Allocated weights buffer at (9663574016, 0) [2026-03-29 23:47:49.748554 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=26, cache_slot=26) planned desc only [2026-03-29 23:47:49.784899 INFO 
buffer_manager] Allocated weights buffer at (9663574016, 0) [2026-03-29 23:47:49.784913 INFO buffer_manager] Allocated weights buffer at (9663574016, 132120576) [2026-03-29 23:47:49.784915 INFO buffer_manager] Allocated weights buffer at (9795694592, 57344) [2026-03-29 23:47:49.784916 INFO buffer_manager] Allocated weights buffer at (9795751936, 132120576) [2026-03-29 23:47:49.784917 INFO buffer_manager] Allocated weights buffer at (9927872512, 57344) [2026-03-29 23:47:49.784919 INFO buffer_manager] Allocated weights buffer at (9927929856, 132120576) [2026-03-29 23:47:49.784921 INFO buffer_manager] Allocated weights buffer at (10060050432, 57344) [2026-03-29 23:47:49.784922 INFO buffer_manager] Allocated weights buffer at (10060107776, 0) [2026-03-29 23:47:49.784924 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=27, cache_slot=27) planned desc only [2026-03-29 23:47:49.821231 INFO buffer_manager] Allocated weights buffer at (10060107776, 0) [2026-03-29 23:47:49.821246 INFO buffer_manager] Allocated weights buffer at (10060107776, 132120576) [2026-03-29 23:47:49.821247 INFO buffer_manager] Allocated weights buffer at (10192228352, 57344) [2026-03-29 23:47:49.821249 INFO buffer_manager] Allocated weights buffer at (10192285696, 132120576) [2026-03-29 23:47:49.821251 INFO buffer_manager] Allocated weights buffer at (10324406272, 57344) [2026-03-29 23:47:49.821252 INFO buffer_manager] Allocated weights buffer at (10324463616, 132120576) [2026-03-29 23:47:49.821254 INFO buffer_manager] Allocated weights buffer at (10456584192, 57344) [2026-03-29 23:47:49.821255 INFO buffer_manager] Allocated weights buffer at (10456641536, 0) [2026-03-29 23:47:49.821257 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=28, cache_slot=28) planned desc only [2026-03-29 23:47:49.857569 INFO buffer_manager] Allocated weights buffer at (10456641536, 0) [2026-03-29 23:47:49.857584 INFO buffer_manager] Allocated weights buffer at (10456641536, 132120576) [2026-03-29 
23:47:49.857593 INFO buffer_manager] Allocated weights buffer at (10588762112, 57344) [2026-03-29 23:47:49.857595 INFO buffer_manager] Allocated weights buffer at (10588819456, 132120576) [2026-03-29 23:47:49.857597 INFO buffer_manager] Allocated weights buffer at (10720940032, 57344) [2026-03-29 23:47:49.857599 INFO buffer_manager] Allocated weights buffer at (10720997376, 132120576) [2026-03-29 23:47:49.857601 INFO buffer_manager] Allocated weights buffer at (10853117952, 57344) [2026-03-29 23:47:49.857603 INFO buffer_manager] Allocated weights buffer at (10853175296, 0) [2026-03-29 23:47:49.857604 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=29, cache_slot=29) planned desc only [2026-03-29 23:47:49.893866 INFO buffer_manager] Allocated weights buffer at (10853175296, 0) [2026-03-29 23:47:49.893880 INFO buffer_manager] Allocated weights buffer at (10853175296, 132120576) [2026-03-29 23:47:49.893882 INFO buffer_manager] Allocated weights buffer at (10985295872, 57344) [2026-03-29 23:47:49.893883 INFO buffer_manager] Allocated weights buffer at (10985353216, 132120576) [2026-03-29 23:47:49.893885 INFO buffer_manager] Allocated weights buffer at (11117473792, 57344) [2026-03-29 23:47:49.893886 INFO buffer_manager] Allocated weights buffer at (11117531136, 132120576) [2026-03-29 23:47:49.893888 INFO buffer_manager] Allocated weights buffer at (11249651712, 57344) [2026-03-29 23:47:49.893889 INFO buffer_manager] Allocated weights buffer at (11249709056, 0) [2026-03-29 23:47:49.893891 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=30, cache_slot=30) planned desc only [2026-03-29 23:47:49.930188 INFO buffer_manager] Allocated weights buffer at (11249709056, 0) [2026-03-29 23:47:49.930202 INFO buffer_manager] Allocated weights buffer at (11249709056, 132120576) [2026-03-29 23:47:49.930204 INFO buffer_manager] Allocated weights buffer at (11381829632, 57344) [2026-03-29 23:47:49.930206 INFO buffer_manager] Allocated weights buffer at 
(11381886976, 132120576) [2026-03-29 23:47:49.930207 INFO buffer_manager] Allocated weights buffer at (11514007552, 57344) [2026-03-29 23:47:49.930209 INFO buffer_manager] Allocated weights buffer at (11514064896, 132120576) [2026-03-29 23:47:49.930210 INFO buffer_manager] Allocated weights buffer at (11646185472, 57344) [2026-03-29 23:47:49.930212 INFO buffer_manager] Allocated weights buffer at (11646242816, 0) [2026-03-29 23:47:49.930213 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=31, cache_slot=31) planned desc only [2026-03-29 23:47:49.966520 INFO buffer_manager] Allocated weights buffer at (11646242816, 0) [2026-03-29 23:47:49.966534 INFO buffer_manager] Allocated weights buffer at (11646242816, 132120576) [2026-03-29 23:47:49.966535 INFO buffer_manager] Allocated weights buffer at (11778363392, 57344) [2026-03-29 23:47:49.966537 INFO buffer_manager] Allocated weights buffer at (11778420736, 132120576) [2026-03-29 23:47:49.966538 INFO buffer_manager] Allocated weights buffer at (11910541312, 57344) [2026-03-29 23:47:49.966540 INFO buffer_manager] Allocated weights buffer at (11910598656, 132120576) [2026-03-29 23:47:49.966542 INFO buffer_manager] Allocated weights buffer at (12042719232, 57344) [2026-03-29 23:47:49.966543 INFO buffer_manager] Allocated weights buffer at (12042776576, 0) [2026-03-29 23:47:49.966545 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=32, cache_slot=32) planned desc only [2026-03-29 23:47:50.002765 INFO buffer_manager] Allocated weights buffer at (12042776576, 0) [2026-03-29 23:47:50.002783 INFO buffer_manager] Allocated weights buffer at (12042776576, 132120576) [2026-03-29 23:47:50.002785 INFO buffer_manager] Allocated weights buffer at (12174897152, 57344) [2026-03-29 23:47:50.002787 INFO buffer_manager] Allocated weights buffer at (12174954496, 132120576) [2026-03-29 23:47:50.002788 INFO buffer_manager] Allocated weights buffer at (12307075072, 57344) [2026-03-29 23:47:50.002790 INFO buffer_manager] 
Allocated weights buffer at (12307132416, 132120576) [2026-03-29 23:47:50.002791 INFO buffer_manager] Allocated weights buffer at (12439252992, 57344) [2026-03-29 23:47:50.002796 INFO buffer_manager] Allocated weights buffer at (12439310336, 0) [2026-03-29 23:47:50.002799 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=33, cache_slot=33) planned desc only [2026-03-29 23:47:50.039083 INFO buffer_manager] Allocated weights buffer at (12439310336, 0) [2026-03-29 23:47:50.039099 INFO buffer_manager] Allocated weights buffer at (12439310336, 132120576) [2026-03-29 23:47:50.039101 INFO buffer_manager] Allocated weights buffer at (12571430912, 57344) [2026-03-29 23:47:50.039102 INFO buffer_manager] Allocated weights buffer at (12571488256, 132120576) [2026-03-29 23:47:50.039104 INFO buffer_manager] Allocated weights buffer at (12703608832, 57344) [2026-03-29 23:47:50.039105 INFO buffer_manager] Allocated weights buffer at (12703666176, 132120576) [2026-03-29 23:47:50.039107 INFO buffer_manager] Allocated weights buffer at (12835786752, 57344) [2026-03-29 23:47:50.039108 INFO buffer_manager] Allocated weights buffer at (12835844096, 0) [2026-03-29 23:47:50.039110 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=34, cache_slot=34) planned desc only [2026-03-29 23:47:50.075614 INFO buffer_manager] Allocated weights buffer at (12835844096, 0) [2026-03-29 23:47:50.075634 INFO buffer_manager] Allocated weights buffer at (12835844096, 132120576) [2026-03-29 23:47:50.075636 INFO buffer_manager] Allocated weights buffer at (12967964672, 57344) [2026-03-29 23:47:50.075637 INFO buffer_manager] Allocated weights buffer at (12968022016, 132120576) [2026-03-29 23:47:50.075638 INFO buffer_manager] Allocated weights buffer at (13100142592, 57344) [2026-03-29 23:47:50.075640 INFO buffer_manager] Allocated weights buffer at (13100199936, 132120576) [2026-03-29 23:47:50.075642 INFO buffer_manager] Allocated weights buffer at (13232320512, 57344) [2026-03-29 
23:47:50.075643 INFO buffer_manager] Allocated weights buffer at (13232377856, 0) [2026-03-29 23:47:50.075645 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=35, cache_slot=35) planned desc only [2026-03-29 23:47:50.112228 INFO buffer_manager] Allocated weights buffer at (13232377856, 0) [2026-03-29 23:47:50.112244 INFO buffer_manager] Allocated weights buffer at (13232377856, 132120576) [2026-03-29 23:47:50.112247 INFO buffer_manager] Allocated weights buffer at (13364498432, 57344) [2026-03-29 23:47:50.112248 INFO buffer_manager] Allocated weights buffer at (13364555776, 132120576) [2026-03-29 23:47:50.112250 INFO buffer_manager] Allocated weights buffer at (13496676352, 57344) [2026-03-29 23:47:50.112251 INFO buffer_manager] Allocated weights buffer at (13496733696, 132120576) [2026-03-29 23:47:50.112253 INFO buffer_manager] Allocated weights buffer at (13628854272, 57344) [2026-03-29 23:47:50.112254 INFO buffer_manager] Allocated weights buffer at (13628911616, 0) [2026-03-29 23:47:50.112256 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=36, cache_slot=36) planned desc only [2026-03-29 23:47:50.148718 INFO buffer_manager] Allocated weights buffer at (13628911616, 0) [2026-03-29 23:47:50.148734 INFO buffer_manager] Allocated weights buffer at (13628911616, 132120576) [2026-03-29 23:47:50.148736 INFO buffer_manager] Allocated weights buffer at (13761032192, 57344) [2026-03-29 23:47:50.148738 INFO buffer_manager] Allocated weights buffer at (13761089536, 132120576) [2026-03-29 23:47:50.148739 INFO buffer_manager] Allocated weights buffer at (13893210112, 57344) [2026-03-29 23:47:50.148741 INFO buffer_manager] Allocated weights buffer at (13893267456, 132120576) [2026-03-29 23:47:50.148742 INFO buffer_manager] Allocated weights buffer at (14025388032, 57344) [2026-03-29 23:47:50.148744 INFO buffer_manager] Allocated weights buffer at (14025445376, 0) [2026-03-29 23:47:50.148745 INFO fp8_moe_dpdk] fp8_moe_dpdk: 
init_layer_cached(layer_idx=37, cache_slot=37) planned desc only [2026-03-29 23:47:50.184974 INFO buffer_manager] Allocated weights buffer at (14025445376, 0) [2026-03-29 23:47:50.184988 INFO buffer_manager] Allocated weights buffer at (14025445376, 132120576) [2026-03-29 23:47:50.184994 INFO buffer_manager] Allocated weights buffer at (14157565952, 57344) [2026-03-29 23:47:50.184996 INFO buffer_manager] Allocated weights buffer at (14157623296, 132120576) [2026-03-29 23:47:50.184997 INFO buffer_manager] Allocated weights buffer at (14289743872, 57344) [2026-03-29 23:47:50.184999 INFO buffer_manager] Allocated weights buffer at (14289801216, 132120576) [2026-03-29 23:47:50.185000 INFO buffer_manager] Allocated weights buffer at (14421921792, 57344) [2026-03-29 23:47:50.185002 INFO buffer_manager] Allocated weights buffer at (14421979136, 0) [2026-03-29 23:47:50.185004 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=38, cache_slot=38) planned desc only [2026-03-29 23:47:50.221375 INFO buffer_manager] Allocated weights buffer at (14421979136, 0) [2026-03-29 23:47:50.221390 INFO buffer_manager] Allocated weights buffer at (14421979136, 132120576) [2026-03-29 23:47:50.221392 INFO buffer_manager] Allocated weights buffer at (14554099712, 57344) [2026-03-29 23:47:50.221393 INFO buffer_manager] Allocated weights buffer at (14554157056, 132120576) [2026-03-29 23:47:50.221395 INFO buffer_manager] Allocated weights buffer at (14686277632, 57344) [2026-03-29 23:47:50.221396 INFO buffer_manager] Allocated weights buffer at (14686334976, 132120576) [2026-03-29 23:47:50.221398 INFO buffer_manager] Allocated weights buffer at (14818455552, 57344) [2026-03-29 23:47:50.221399 INFO buffer_manager] Allocated weights buffer at (14818512896, 0) [2026-03-29 23:47:50.221401 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=39, cache_slot=39) planned desc only [2026-03-29 23:47:50.257611 INFO buffer_manager] Allocated weights buffer at (14818512896, 0) [2026-03-29 
23:47:50.257626 INFO buffer_manager] Allocated weights buffer at (14818512896, 132120576) [2026-03-29 23:47:50.257628 INFO buffer_manager] Allocated weights buffer at (14950633472, 57344) [2026-03-29 23:47:50.257629 INFO buffer_manager] Allocated weights buffer at (14950690816, 132120576) [2026-03-29 23:47:50.257631 INFO buffer_manager] Allocated weights buffer at (15082811392, 57344) [2026-03-29 23:47:50.257632 INFO buffer_manager] Allocated weights buffer at (15082868736, 132120576) [2026-03-29 23:47:50.257634 INFO buffer_manager] Allocated weights buffer at (15214989312, 57344) [2026-03-29 23:47:50.257635 INFO buffer_manager] Allocated weights buffer at (15215046656, 0) [2026-03-29 23:47:50.257641 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=40, cache_slot=40) planned desc only [2026-03-29 23:47:50.293789 INFO buffer_manager] Allocated weights buffer at (15215046656, 0) [2026-03-29 23:47:50.293803 INFO buffer_manager] Allocated weights buffer at (15215046656, 132120576) [2026-03-29 23:47:50.293805 INFO buffer_manager] Allocated weights buffer at (15347167232, 57344) [2026-03-29 23:47:50.293807 INFO buffer_manager] Allocated weights buffer at (15347224576, 132120576) [2026-03-29 23:47:50.293808 INFO buffer_manager] Allocated weights buffer at (15479345152, 57344) [2026-03-29 23:47:50.293810 INFO buffer_manager] Allocated weights buffer at (15479402496, 132120576) [2026-03-29 23:47:50.293811 INFO buffer_manager] Allocated weights buffer at (15611523072, 57344) [2026-03-29 23:47:50.293813 INFO buffer_manager] Allocated weights buffer at (15611580416, 0) [2026-03-29 23:47:50.293815 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=41, cache_slot=41) planned desc only [2026-03-29 23:47:50.329975 INFO buffer_manager] Allocated weights buffer at (15611580416, 0) [2026-03-29 23:47:50.329989 INFO buffer_manager] Allocated weights buffer at (15611580416, 132120576) [2026-03-29 23:47:50.329991 INFO buffer_manager] Allocated weights buffer at 
(15743700992, 57344) [2026-03-29 23:47:50.329992 INFO buffer_manager] Allocated weights buffer at (15743758336, 132120576) [2026-03-29 23:47:50.329994 INFO buffer_manager] Allocated weights buffer at (15875878912, 57344) [2026-03-29 23:47:50.329996 INFO buffer_manager] Allocated weights buffer at (15875936256, 132120576) [2026-03-29 23:47:50.329997 INFO buffer_manager] Allocated weights buffer at (16008056832, 57344) [2026-03-29 23:47:50.330001 INFO buffer_manager] Allocated weights buffer at (16008114176, 0) [2026-03-29 23:47:50.330003 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=42, cache_slot=42) planned desc only [2026-03-29 23:47:50.366283 INFO buffer_manager] Allocated weights buffer at (16008114176, 0) [2026-03-29 23:47:50.366298 INFO buffer_manager] Allocated weights buffer at (16008114176, 132120576) [2026-03-29 23:47:50.366301 INFO buffer_manager] Allocated weights buffer at (16140234752, 57344) [2026-03-29 23:47:50.366302 INFO buffer_manager] Allocated weights buffer at (16140292096, 132120576) [2026-03-29 23:47:50.366304 INFO buffer_manager] Allocated weights buffer at (16272412672, 57344) [2026-03-29 23:47:50.366305 INFO buffer_manager] Allocated weights buffer at (16272470016, 132120576) [2026-03-29 23:47:50.366307 INFO buffer_manager] Allocated weights buffer at (16404590592, 57344) [2026-03-29 23:47:50.366308 INFO buffer_manager] Allocated weights buffer at (16404647936, 0) [2026-03-29 23:47:50.366310 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=43, cache_slot=43) planned desc only [2026-03-29 23:47:50.402756 INFO buffer_manager] Allocated weights buffer at (16404647936, 0) [2026-03-29 23:47:50.402779 INFO buffer_manager] Allocated weights buffer at (16404647936, 132120576) [2026-03-29 23:47:50.402781 INFO buffer_manager] Allocated weights buffer at (16536768512, 57344) [2026-03-29 23:47:50.402783 INFO buffer_manager] Allocated weights buffer at (16536825856, 132120576) [2026-03-29 23:47:50.402784 INFO buffer_manager] 
Allocated weights buffer at (16668946432, 57344) [2026-03-29 23:47:50.402786 INFO buffer_manager] Allocated weights buffer at (16669003776, 132120576) [2026-03-29 23:47:50.402787 INFO buffer_manager] Allocated weights buffer at (16801124352, 57344) [2026-03-29 23:47:50.402793 INFO buffer_manager] Allocated weights buffer at (16801181696, 0) [2026-03-29 23:47:50.402795 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=44, cache_slot=44) planned desc only [2026-03-29 23:47:50.438952 INFO buffer_manager] Allocated weights buffer at (16801181696, 0) [2026-03-29 23:47:50.438968 INFO buffer_manager] Allocated weights buffer at (16801181696, 132120576) [2026-03-29 23:47:50.438970 INFO buffer_manager] Allocated weights buffer at (16933302272, 57344) [2026-03-29 23:47:50.438971 INFO buffer_manager] Allocated weights buffer at (16933359616, 132120576) [2026-03-29 23:47:50.438973 INFO buffer_manager] Allocated weights buffer at (17065480192, 57344) [2026-03-29 23:47:50.438974 INFO buffer_manager] Allocated weights buffer at (17065537536, 132120576) [2026-03-29 23:47:50.438976 INFO buffer_manager] Allocated weights buffer at (17197658112, 57344) [2026-03-29 23:47:50.438977 INFO buffer_manager] Allocated weights buffer at (17197715456, 0) [2026-03-29 23:47:50.438979 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=45, cache_slot=45) planned desc only [2026-03-29 23:47:50.475109 INFO buffer_manager] Allocated weights buffer at (17197715456, 0) [2026-03-29 23:47:50.475123 INFO buffer_manager] Allocated weights buffer at (17197715456, 132120576) [2026-03-29 23:47:50.475125 INFO buffer_manager] Allocated weights buffer at (17329836032, 57344) [2026-03-29 23:47:50.475127 INFO buffer_manager] Allocated weights buffer at (17329893376, 132120576) [2026-03-29 23:47:50.475128 INFO buffer_manager] Allocated weights buffer at (17462013952, 57344) [2026-03-29 23:47:50.475130 INFO buffer_manager] Allocated weights buffer at (17462071296, 132120576) [2026-03-29 
23:47:50.475131 INFO buffer_manager] Allocated weights buffer at (17594191872, 57344) [2026-03-29 23:47:50.475132 INFO buffer_manager] Allocated weights buffer at (17594249216, 0) [2026-03-29 23:47:50.475134 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=46, cache_slot=46) planned desc only [2026-03-29 23:47:50.511317 INFO buffer_manager] Allocated weights buffer at (17594249216, 0) [2026-03-29 23:47:50.511330 INFO buffer_manager] Allocated weights buffer at (17594249216, 132120576) [2026-03-29 23:47:50.511335 INFO buffer_manager] Allocated weights buffer at (17726369792, 57344) [2026-03-29 23:47:50.511336 INFO buffer_manager] Allocated weights buffer at (17726427136, 132120576) [2026-03-29 23:47:50.511338 INFO buffer_manager] Allocated weights buffer at (17858547712, 57344) [2026-03-29 23:47:50.511339 INFO buffer_manager] Allocated weights buffer at (17858605056, 132120576) [2026-03-29 23:47:50.511341 INFO buffer_manager] Allocated weights buffer at (17990725632, 57344) [2026-03-29 23:47:50.511342 INFO buffer_manager] Allocated weights buffer at (17990782976, 0) [2026-03-29 23:47:50.511344 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=47, cache_slot=47) planned desc only [2026-03-29 23:47:50.547602 INFO buffer_manager] Allocated weights buffer at (17990782976, 0) [2026-03-29 23:47:50.547618 INFO buffer_manager] Allocated weights buffer at (17990782976, 132120576) [2026-03-29 23:47:50.547620 INFO buffer_manager] Allocated weights buffer at (18122903552, 57344) [2026-03-29 23:47:50.547621 INFO buffer_manager] Allocated weights buffer at (18122960896, 132120576) [2026-03-29 23:47:50.547623 INFO buffer_manager] Allocated weights buffer at (18255081472, 57344) [2026-03-29 23:47:50.547624 INFO buffer_manager] Allocated weights buffer at (18255138816, 132120576) [2026-03-29 23:47:50.547626 INFO buffer_manager] Allocated weights buffer at (18387259392, 57344) [2026-03-29 23:47:50.547628 INFO buffer_manager] Allocated weights buffer at 
(18387316736, 0) [2026-03-29 23:47:50.547629 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=48, cache_slot=48) planned desc only [2026-03-29 23:47:50.583898 INFO buffer_manager] Allocated weights buffer at (18387316736, 0) [2026-03-29 23:47:50.583913 INFO buffer_manager] Allocated weights buffer at (18387316736, 132120576) [2026-03-29 23:47:50.583915 INFO buffer_manager] Allocated weights buffer at (18519437312, 57344) [2026-03-29 23:47:50.583917 INFO buffer_manager] Allocated weights buffer at (18519494656, 132120576) [2026-03-29 23:47:50.583918 INFO buffer_manager] Allocated weights buffer at (18651615232, 57344) [2026-03-29 23:47:50.583920 INFO buffer_manager] Allocated weights buffer at (18651672576, 132120576) [2026-03-29 23:47:50.583921 INFO buffer_manager] Allocated weights buffer at (18783793152, 57344) [2026-03-29 23:47:50.583923 INFO buffer_manager] Allocated weights buffer at (18783850496, 0) [2026-03-29 23:47:50.583924 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=49, cache_slot=49) planned desc only [2026-03-29 23:47:50.620158 INFO buffer_manager] Allocated weights buffer at (18783850496, 0) [2026-03-29 23:47:50.620172 INFO buffer_manager] Allocated weights buffer at (18783850496, 132120576) [2026-03-29 23:47:50.620174 INFO buffer_manager] Allocated weights buffer at (18915971072, 57344) [2026-03-29 23:47:50.620176 INFO buffer_manager] Allocated weights buffer at (18916028416, 132120576) [2026-03-29 23:47:50.620178 INFO buffer_manager] Allocated weights buffer at (19048148992, 57344) [2026-03-29 23:47:50.620179 INFO buffer_manager] Allocated weights buffer at (19048206336, 132120576) [2026-03-29 23:47:50.620181 INFO buffer_manager] Allocated weights buffer at (19180326912, 57344) [2026-03-29 23:47:50.620182 INFO buffer_manager] Allocated weights buffer at (19180384256, 0) [2026-03-29 23:47:50.620184 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=50, cache_slot=50) planned desc only [2026-03-29 23:47:50.656396 
INFO buffer_manager] Allocated weights buffer at (19180384256, 0) [2026-03-29 23:47:50.656410 INFO buffer_manager] Allocated weights buffer at (19180384256, 132120576) [2026-03-29 23:47:50.656411 INFO buffer_manager] Allocated weights buffer at (19312504832, 57344) [2026-03-29 23:47:50.656413 INFO buffer_manager] Allocated weights buffer at (19312562176, 132120576) [2026-03-29 23:47:50.656415 INFO buffer_manager] Allocated weights buffer at (19444682752, 57344) [2026-03-29 23:47:50.656416 INFO buffer_manager] Allocated weights buffer at (19444740096, 132120576) [2026-03-29 23:47:50.656420 INFO buffer_manager] Allocated weights buffer at (19576860672, 57344) [2026-03-29 23:47:50.656421 INFO buffer_manager] Allocated weights buffer at (19576918016, 0) [2026-03-29 23:47:50.656423 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=51, cache_slot=51) planned desc only [2026-03-29 23:47:50.692729 INFO buffer_manager] Allocated weights buffer at (19576918016, 0) [2026-03-29 23:47:50.692743 INFO buffer_manager] Allocated weights buffer at (19576918016, 132120576) [2026-03-29 23:47:50.692745 INFO buffer_manager] Allocated weights buffer at (19709038592, 57344) [2026-03-29 23:47:50.692747 INFO buffer_manager] Allocated weights buffer at (19709095936, 132120576) [2026-03-29 23:47:50.692748 INFO buffer_manager] Allocated weights buffer at (19841216512, 57344) [2026-03-29 23:47:50.692750 INFO buffer_manager] Allocated weights buffer at (19841273856, 132120576) [2026-03-29 23:47:50.692751 INFO buffer_manager] Allocated weights buffer at (19973394432, 57344) [2026-03-29 23:47:50.692753 INFO buffer_manager] Allocated weights buffer at (19973451776, 0) [2026-03-29 23:47:50.692754 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=52, cache_slot=52) planned desc only [2026-03-29 23:47:50.728989 INFO buffer_manager] Allocated weights buffer at (19973451776, 0) [2026-03-29 23:47:50.729005 INFO buffer_manager] Allocated weights buffer at (19973451776, 132120576) 
[2026-03-29 23:47:50.729007 INFO buffer_manager] Allocated weights buffer at (20105572352, 57344)
[2026-03-29 23:47:50.729009 INFO buffer_manager] Allocated weights buffer at (20105629696, 132120576)
[2026-03-29 23:47:50.729010 INFO buffer_manager] Allocated weights buffer at (20237750272, 57344)
[2026-03-29 23:47:50.729012 INFO buffer_manager] Allocated weights buffer at (20237807616, 132120576)
[2026-03-29 23:47:50.729013 INFO buffer_manager] Allocated weights buffer at (20369928192, 57344)
[2026-03-29 23:47:50.729015 INFO buffer_manager] Allocated weights buffer at (20369985536, 0)
[2026-03-29 23:47:50.729016 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=53, cache_slot=53) planned desc only
[2026-03-29 23:47:50.765241 INFO buffer_manager] Allocated weights buffer at (20369985536, 0)
[2026-03-29 23:47:50.765256 INFO buffer_manager] Allocated weights buffer at (20369985536, 132120576)
[2026-03-29 23:47:50.765258 INFO buffer_manager] Allocated weights buffer at (20502106112, 57344)
[2026-03-29 23:47:50.765260 INFO buffer_manager] Allocated weights buffer at (20502163456, 132120576)
[2026-03-29 23:47:50.765261 INFO buffer_manager] Allocated weights buffer at (20634284032, 57344)
[2026-03-29 23:47:50.765263 INFO buffer_manager] Allocated weights buffer at (20634341376, 132120576)
[2026-03-29 23:47:50.765264 INFO buffer_manager] Allocated weights buffer at (20766461952, 57344)
[2026-03-29 23:47:50.765265 INFO buffer_manager] Allocated weights buffer at (20766519296, 0)
[2026-03-29 23:47:50.765267 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=54, cache_slot=54) planned desc only
[2026-03-29 23:47:50.801666 INFO buffer_manager] Allocated weights buffer at (20766519296, 0)
[2026-03-29 23:47:50.801681 INFO buffer_manager] Allocated weights buffer at (20766519296, 132120576)
[2026-03-29 23:47:50.801683 INFO buffer_manager] Allocated weights buffer at (20898639872, 57344)
[2026-03-29 23:47:50.801684 INFO buffer_manager] Allocated weights buffer at (20898697216, 132120576)
[2026-03-29 23:47:50.801686 INFO buffer_manager] Allocated weights buffer at (21030817792, 57344)
[2026-03-29 23:47:50.801687 INFO buffer_manager] Allocated weights buffer at (21030875136, 132120576)
[2026-03-29 23:47:50.801689 INFO buffer_manager] Allocated weights buffer at (21162995712, 57344)
[2026-03-29 23:47:50.801690 INFO buffer_manager] Allocated weights buffer at (21163053056, 0)
[2026-03-29 23:47:50.801692 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=55, cache_slot=55) planned desc only
[2026-03-29 23:47:50.838036 INFO buffer_manager] Allocated weights buffer at (21163053056, 0)
[2026-03-29 23:47:50.838052 INFO buffer_manager] Allocated weights buffer at (21163053056, 132120576)
[2026-03-29 23:47:50.838054 INFO buffer_manager] Allocated weights buffer at (21295173632, 57344)
[2026-03-29 23:47:50.838056 INFO buffer_manager] Allocated weights buffer at (21295230976, 132120576)
[2026-03-29 23:47:50.838057 INFO buffer_manager] Allocated weights buffer at (21427351552, 57344)
[2026-03-29 23:47:50.838060 INFO buffer_manager] Allocated weights buffer at (21427408896, 132120576)
[2026-03-29 23:47:50.838062 INFO buffer_manager] Allocated weights buffer at (21559529472, 57344)
[2026-03-29 23:47:50.838063 INFO buffer_manager] Allocated weights buffer at (21559586816, 0)
[2026-03-29 23:47:50.838064 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=56, cache_slot=56) planned desc only
[2026-03-29 23:47:50.874445 INFO buffer_manager] Allocated weights buffer at (21559586816, 0)
[2026-03-29 23:47:50.874460 INFO buffer_manager] Allocated weights buffer at (21559586816, 132120576)
[2026-03-29 23:47:50.874462 INFO buffer_manager] Allocated weights buffer at (21691707392, 57344)
[2026-03-29 23:47:50.874463 INFO buffer_manager] Allocated weights buffer at (21691764736, 132120576)
[2026-03-29 23:47:50.874465 INFO buffer_manager] Allocated weights buffer at (21823885312, 57344)
[2026-03-29 23:47:50.874466 INFO buffer_manager] Allocated weights buffer at (21823942656, 132120576)
[2026-03-29 23:47:50.874468 INFO buffer_manager] Allocated weights buffer at (21956063232, 57344)
[2026-03-29 23:47:50.874469 INFO buffer_manager] Allocated weights buffer at (21956120576, 0)
[2026-03-29 23:47:50.874471 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=57, cache_slot=57) planned desc only
[2026-03-29 23:47:50.910793 INFO buffer_manager] Allocated weights buffer at (21956120576, 0)
[2026-03-29 23:47:50.910806 INFO buffer_manager] Allocated weights buffer at (21956120576, 132120576)
[2026-03-29 23:47:50.910808 INFO buffer_manager] Allocated weights buffer at (22088241152, 57344)
[2026-03-29 23:47:50.910809 INFO buffer_manager] Allocated weights buffer at (22088298496, 132120576)
[2026-03-29 23:47:50.910811 INFO buffer_manager] Allocated weights buffer at (22220419072, 57344)
[2026-03-29 23:47:50.910812 INFO buffer_manager] Allocated weights buffer at (22220476416, 132120576)
[2026-03-29 23:47:50.910814 INFO buffer_manager] Allocated weights buffer at (22352596992, 57344)
[2026-03-29 23:47:50.910815 INFO buffer_manager] Allocated weights buffer at (22352654336, 0)
[2026-03-29 23:47:50.910817 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=58, cache_slot=58) planned desc only
[2026-03-29 23:47:50.947185 INFO buffer_manager] Allocated weights buffer at (22352654336, 0)
[2026-03-29 23:47:50.947199 INFO buffer_manager] Allocated weights buffer at (22352654336, 132120576)
[2026-03-29 23:47:50.947201 INFO buffer_manager] Allocated weights buffer at (22484774912, 57344)
[2026-03-29 23:47:50.947203 INFO buffer_manager] Allocated weights buffer at (22484832256, 132120576)
[2026-03-29 23:47:50.947204 INFO buffer_manager] Allocated weights buffer at (22616952832, 57344)
[2026-03-29 23:47:50.947206 INFO buffer_manager] Allocated weights buffer at (22617010176, 132120576)
[2026-03-29 23:47:50.947207 INFO buffer_manager] Allocated weights buffer at (22749130752, 57344)
[2026-03-29 23:47:50.947209 INFO buffer_manager] Allocated weights buffer at (22749188096, 0)
[2026-03-29 23:47:50.947210 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=59, cache_slot=59) planned desc only
[2026-03-29 23:47:50.983544 INFO buffer_manager] Allocated weights buffer at (22749188096, 0)
[2026-03-29 23:47:50.983558 INFO buffer_manager] Allocated weights buffer at (22749188096, 132120576)
[2026-03-29 23:47:50.983560 INFO buffer_manager] Allocated weights buffer at (22881308672, 57344)
[2026-03-29 23:47:50.983562 INFO buffer_manager] Allocated weights buffer at (22881366016, 132120576)
[2026-03-29 23:47:50.983563 INFO buffer_manager] Allocated weights buffer at (23013486592, 57344)
[2026-03-29 23:47:50.983565 INFO buffer_manager] Allocated weights buffer at (23013543936, 132120576)
[2026-03-29 23:47:50.983570 INFO buffer_manager] Allocated weights buffer at (23145664512, 57344)
[2026-03-29 23:47:50.983572 INFO buffer_manager] Allocated weights buffer at (23145721856, 0)
[2026-03-29 23:47:50.983573 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=60, cache_slot=60) planned desc only
[2026-03-29 23:47:51.346525 INFO buffer_manager] Allocated weights buffer at (23145721856, 0)
[2026-03-29 23:47:51.346548 INFO buffer_manager] Allocated weights buffer at (23145721856, 132120576)
[2026-03-29 23:47:51.346550 INFO buffer_manager] Allocated weights buffer at (23277842432, 57344)
[2026-03-29 23:47:51.346551 INFO buffer_manager] Allocated weights buffer at (23277899776, 132120576)
[2026-03-29 23:47:51.346553 INFO buffer_manager] Allocated weights buffer at (23410020352, 57344)
[2026-03-29 23:47:51.346554 INFO buffer_manager] Allocated weights buffer at (23410077696, 132120576)
[2026-03-29 23:47:51.346556 INFO buffer_manager] Allocated weights buffer at (23542198272, 57344)
[2026-03-29 23:47:51.346557 INFO buffer_manager] Allocated weights buffer at (23542255616, 0)
[2026-03-29 23:47:51.346559 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=61, cache_slot=61) planned desc only
[2026-03-29 23:47:58.458501 INFO fp8_dpdk_common] fp9 fast path forced on by default in the current kernel build
[2026-03-29 23:47:58.473826 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=70, expert_tiles=70, avg_tile_batch=1.26, prepare=179.781µs, send=2.302493ms, judge_wait=11.443643ms, fetch=927.651µs, reduce=134ns; duck time-ns stats: p50=11.028039ms, p90=11.055707ms, max=11.065361ms; kernel_model: matmul=0.242221 GFLOP (21.890 GFLOP/s @ duck_max), param_stream=0.096338G (8.706 Gparam/s @ duck_max), weight_stream=103.404 MiB (9.799 GB/s @ duck_max)
[2026-03-29 23:47:58.487764 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=66, expert_tiles=66, avg_tile_batch=1.33, prepare=103.658µs, send=611.615µs, judge_wait=10.648717ms, fetch=675.642µs, reduce=20ns; duck time-ns stats: p50=10.456965ms, p90=10.512827ms, max=10.522822ms; kernel_model: matmul=0.242221 GFLOP (23.019 GFLOP/s @ duck_max), param_stream=0.090833G (8.632 Gparam/s @ duck_max), weight_stream=97.495 MiB (9.715 GB/s @ duck_max)
[2026-03-29 23:47:58.502885 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=70, expert_tiles=70, avg_tile_batch=1.26, prepare=26.83µs, send=612.748µs, judge_wait=11.91852ms, fetch=640.735µs, reduce=26ns; duck time-ns stats: p50=11.742426ms, p90=11.778499ms, max=11.807577ms; kernel_model: matmul=0.242221 GFLOP (20.514 GFLOP/s @ duck_max), param_stream=0.096338G (8.159 Gparam/s @ duck_max), weight_stream=103.404 MiB (9.183 GB/s @ duck_max)
[2026-03-29 23:47:58.517556 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=69, expert_tiles=69, avg_tile_batch=1.28, prepare=19.528µs, send=613.453µs, judge_wait=11.142594ms, fetch=640.436µs, reduce=26ns; duck time-ns stats: p50=10.912595ms, p90=10.971457ms, max=11.033448ms; kernel_model: matmul=0.242221 GFLOP (21.953 GFLOP/s @ duck_max), param_stream=0.094962G (8.607 Gparam/s @ duck_max), weight_stream=101.927 MiB (9.687 GB/s @ duck_max)
[2026-03-29 23:47:58.531600 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=68, expert_tiles=68, avg_tile_batch=1.29, prepare=18.702µs, send=613.675µs, judge_wait=10.581497ms, fetch=646.846µs, reduce=19ns; duck time-ns stats: p50=10.359385ms, p90=10.41578ms, max=10.470837ms; kernel_model: matmul=0.242221 GFLOP (23.133 GFLOP/s @ duck_max), param_stream=0.093585G (8.938 Gparam/s @ duck_max), weight_stream=100.450 MiB (10.059 GB/s @ duck_max)
[2026-03-29 23:47:58.546327 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=66, expert_tiles=66, avg_tile_batch=1.33, prepare=68.476µs, send=613.587µs, judge_wait=11.241743ms, fetch=640.982µs, reduce=18ns; duck time-ns stats: p50=11.081577ms, p90=11.114891ms, max=11.130509ms; kernel_model: matmul=0.242221 GFLOP (21.762 GFLOP/s @ duck_max), param_stream=0.090833G (8.161 Gparam/s @ duck_max), weight_stream=97.495 MiB (9.185 GB/s @ duck_max)
[2026-03-29 23:47:58.560465 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=64, expert_tiles=64, avg_tile_batch=1.38, prepare=17.727µs, send=615.092µs, judge_wait=10.582555ms, fetch=641.204µs, reduce=18ns; duck time-ns stats: p50=10.41487ms, p90=10.446865ms, max=10.471979ms; kernel_model: matmul=0.242221 GFLOP (23.130 GFLOP/s @ duck_max), param_stream=0.088080G (8.411 Gparam/s @ duck_max), weight_stream=94.541 MiB (9.467 GB/s @ duck_max)
[2026-03-29 23:47:58.574405 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=60, expert_tiles=62, avg_tile_batch=1.42, prepare=17.92µs, send=612.414µs, judge_wait=10.414845ms, fetch=636.767µs, reduce=18ns; duck time-ns stats: p50=10.268836ms, p90=10.301492ms, max=10.30508ms; kernel_model: matmul=0.242221 GFLOP (23.505 GFLOP/s @ duck_max), param_stream=0.085328G (8.280 Gparam/s @ duck_max), weight_stream=91.587 MiB (9.319 GB/s @ duck_max)
[2026-03-29 23:47:58.588417 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=65, expert_tiles=65, avg_tile_batch=1.35, prepare=17.602µs, send=612.666µs, judge_wait=10.557798ms, fetch=640.697µs, reduce=24ns; duck time-ns stats: p50=10.393938ms, p90=10.434906ms, max=10.449772ms; kernel_model: matmul=0.242221 GFLOP (23.180 GFLOP/s @ duck_max), param_stream=0.089457G (8.561 Gparam/s @ duck_max), weight_stream=96.018 MiB (9.635 GB/s @ duck_max)
[2026-03-29 23:47:58.602812 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=62, expert_tiles=63, avg_tile_batch=1.40, prepare=18.518µs, send=618.915µs, judge_wait=10.883317ms, fetch=645.233µs, reduce=20ns; duck time-ns stats: p50=10.732444ms, p90=10.74931ms, max=10.769471ms; kernel_model: matmul=0.242221 GFLOP (22.491 GFLOP/s @ duck_max), param_stream=0.086704G (8.051 Gparam/s @ duck_max), weight_stream=93.064 MiB (9.061 GB/s @ duck_max)
[2026-03-29 23:47:58.616228 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=49, expert_tiles=52, avg_tile_batch=1.69, prepare=18.773µs, send=612.574µs, judge_wait=9.894656ms, fetch=643.295µs, reduce=18ns; duck time-ns stats: p50=9.713859ms, p90=9.773831ms, max=9.789244ms; kernel_model: matmul=0.242221 GFLOP (24.744 GFLOP/s @ duck_max), param_stream=0.071565G (7.311 Gparam/s @ duck_max), weight_stream=76.815 MiB (8.228 GB/s @ duck_max)
[2026-03-29 23:47:58.629947 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=50, expert_tiles=52, avg_tile_batch=1.69, prepare=18.403µs, send=614.098µs, judge_wait=10.202785ms, fetch=642.807µs, reduce=20ns; duck time-ns stats: p50=10.05581ms, p90=10.077286ms, max=10.092804ms; kernel_model: matmul=0.242221 GFLOP (23.999 GFLOP/s @ duck_max), param_stream=0.071565G (7.091 Gparam/s @ duck_max), weight_stream=76.815 MiB (7.981 GB/s @ duck_max)
[2026-03-29 23:47:58.642638 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=48, expert_tiles=51, avg_tile_batch=1.73, prepare=17.875µs, send=614.418µs, judge_wait=9.303697ms, fetch=640.903µs, reduce=24ns; duck time-ns stats: p50=9.138962ms, p90=9.176207ms, max=9.198648ms; kernel_model: matmul=0.242221 GFLOP (26.332 GFLOP/s @ duck_max), param_stream=0.070189G (7.630 Gparam/s @ duck_max), weight_stream=75.337 MiB (8.588 GB/s @ duck_max)
[2026-03-29 23:47:58.655466 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=49, expert_tiles=50, avg_tile_batch=1.76, prepare=19.557µs, send=611.9µs, judge_wait=9.657917ms, fetch=641.77µs, reduce=24ns; duck time-ns stats: p50=9.489627ms, p90=9.536723ms, max=9.5461ms; kernel_model: matmul=0.242221 GFLOP (25.374 GFLOP/s @ duck_max), param_stream=0.068813G (7.208 Gparam/s @ duck_max), weight_stream=73.860 MiB (8.113 GB/s @ duck_max)
[2026-03-29 23:47:58.668067 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=50, expert_tiles=51, avg_tile_batch=1.73, prepare=20.97µs, send=612.616µs, judge_wait=9.459259ms, fetch=648.044µs, reduce=26ns; duck time-ns stats: p50=9.276392ms, p90=9.322345ms, max=9.355567ms; kernel_model: matmul=0.242221 GFLOP (25.891 GFLOP/s @ duck_max), param_stream=0.070189G (7.502 Gparam/s @ duck_max), weight_stream=75.337 MiB (8.444 GB/s @ duck_max)
[2026-03-29 23:47:58.680945 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=54, expert_tiles=55, avg_tile_batch=1.60, prepare=16.766µs, send=613.331µs, judge_wait=9.856837ms, fetch=635.629µs, reduce=23ns; duck time-ns stats: p50=9.711953ms, p90=9.739938ms, max=9.751031ms; kernel_model: matmul=0.242221 GFLOP (24.841 GFLOP/s @ duck_max), param_stream=0.075694G (7.763 Gparam/s @ duck_max), weight_stream=81.246 MiB (8.737 GB/s @ duck_max)
[2026-03-29 23:47:58.694067 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=56, expert_tiles=57, avg_tile_batch=1.54, prepare=16.466µs, send=612.118µs, judge_wait=10.135808ms, fetch=637.807µs, reduce=20ns; duck time-ns stats: p50=9.980288ms, p90=10.010602ms, max=10.016364ms; kernel_model: matmul=0.242221 GFLOP (24.183 GFLOP/s @ duck_max), param_stream=0.078447G (7.832 Gparam/s @ duck_max), weight_stream=84.201 MiB (8.815 GB/s @ duck_max)
[2026-03-29 23:47:58.706997 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=55, expert_tiles=56, avg_tile_batch=1.57, prepare=16.751µs, send=611.837µs, judge_wait=9.912291ms, fetch=635.843µs, reduce=17ns; duck time-ns stats: p50=9.75615ms, p90=9.789699ms, max=9.805365ms; kernel_model: matmul=0.242221 GFLOP (24.703 GFLOP/s @ duck_max), param_stream=0.077070G (7.860 Gparam/s @ duck_max), weight_stream=82.723 MiB (8.846 GB/s @ duck_max)
[2026-03-29 23:47:58.720318 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=47, expert_tiles=50, avg_tile_batch=1.76, prepare=16.664µs, send=611.516µs, judge_wait=10.345158ms, fetch=638.667µs, reduce=14ns; duck time-ns stats: p50=10.201498ms, p90=10.21994ms, max=10.229335ms; kernel_model: matmul=0.242221 GFLOP (23.679 GFLOP/s @ duck_max), param_stream=0.068813G (6.727 Gparam/s @ duck_max), weight_stream=73.860 MiB (7.571 GB/s @ duck_max)
[2026-03-29 23:47:58.734013 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=50, expert_tiles=53, avg_tile_batch=1.66, prepare=16.488µs, send=611.89µs, judge_wait=10.710728ms, fetch=638.251µs, reduce=14ns; duck time-ns stats: p50=10.535236ms, p90=10.573526ms, max=10.60117ms; kernel_model: matmul=0.242221 GFLOP (22.849 GFLOP/s @ duck_max), param_stream=0.072942G (6.881 Gparam/s @ duck_max), weight_stream=78.292 MiB (7.744 GB/s @ duck_max)
[2026-03-29 23:47:58.746325 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=49, expert_tiles=51, avg_tile_batch=1.73, prepare=16.816µs, send=611.373µs, judge_wait=9.301603ms, fetch=635.039µs, reduce=19ns; duck time-ns stats: p50=9.155936ms, p90=9.181335ms, max=9.198222ms; kernel_model: matmul=0.242221 GFLOP (26.333 GFLOP/s @ duck_max), param_stream=0.070189G (7.631 Gparam/s @ duck_max), weight_stream=75.337 MiB (8.588 GB/s @ duck_max)
[2026-03-29 23:47:58.760411 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=41, expert_tiles=46, avg_tile_batch=1.91, prepare=16.588µs, send=612.29µs, judge_wait=11.112016ms, fetch=640.207µs, reduce=14ns; duck time-ns stats: p50=10.951245ms, p90=10.982727ms, max=11.000485ms; kernel_model: matmul=0.242221 GFLOP (22.019 GFLOP/s @ duck_max), param_stream=0.063308G (5.755 Gparam/s @ duck_max), weight_stream=67.951 MiB (6.477 GB/s @ duck_max)
[2026-03-29 23:47:58.772462 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=44, expert_tiles=47, avg_tile_batch=1.87, prepare=16.714µs, send=612.77µs, judge_wait=9.078755ms, fetch=636.282µs, reduce=12ns; duck time-ns stats: p50=8.934431ms, p90=8.959175ms, max=8.975129ms; kernel_model: matmul=0.242221 GFLOP (26.988 GFLOP/s @ duck_max), param_stream=0.064684G (7.207 Gparam/s @ duck_max), weight_stream=69.429 MiB (8.111 GB/s @ duck_max)
[2026-03-29 23:47:58.785701 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=58, expert_tiles=60, avg_tile_batch=1.47, prepare=19.791µs, send=611.254µs, judge_wait=10.241081ms, fetch=641.869µs, reduce=17ns; duck time-ns stats: p50=10.07142ms, p90=10.102409ms, max=10.12081ms; kernel_model: matmul=0.242221 GFLOP (23.933 GFLOP/s @ duck_max), param_stream=0.082575G (8.159 Gparam/s @ duck_max), weight_stream=88.632 MiB (9.183 GB/s @ duck_max)
[2026-03-29 23:47:58.797905 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=43, expert_tiles=45, avg_tile_batch=1.96, prepare=16.038µs, send=610.828µs, judge_wait=9.215037ms, fetch=636.001µs, reduce=16ns; duck time-ns stats: p50=9.070766ms, p90=9.100006ms, max=9.110605ms; kernel_model: matmul=0.242221 GFLOP (26.587 GFLOP/s @ duck_max), param_stream=0.061932G (6.798 Gparam/s @ duck_max), weight_stream=66.474 MiB (7.651 GB/s @ duck_max)
[2026-03-29 23:47:58.809460 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=43, expert_tiles=45, avg_tile_batch=1.96, prepare=16.683µs, send=612.272µs, judge_wait=8.600676ms, fetch=636.179µs, reduce=15ns; duck time-ns stats: p50=8.437093ms, p90=8.476582ms, max=8.495386ms; kernel_model: matmul=0.242221 GFLOP (28.512 GFLOP/s @ duck_max), param_stream=0.061932G (7.290 Gparam/s @ duck_max), weight_stream=66.474 MiB (8.205 GB/s @ duck_max)
[2026-03-29 23:47:58.822117 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=52, expert_tiles=56, avg_tile_batch=1.57, prepare=16.647µs, send=611.525µs, judge_wait=9.666228ms, fetch=634.148µs, reduce=16ns; duck time-ns stats: p50=9.509559ms, p90=9.540452ms, max=9.559857ms; kernel_model: matmul=0.242221 GFLOP (25.337 GFLOP/s @ duck_max), param_stream=0.077070G (8.062 Gparam/s @ duck_max), weight_stream=82.723 MiB (9.074 GB/s @ duck_max)
[2026-03-29 23:47:58.835664 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=51, expert_tiles=53, avg_tile_batch=1.66, prepare=15.943µs, send=612.252µs, judge_wait=10.577332ms, fetch=637.179µs, reduce=18ns; duck time-ns stats: p50=10.380022ms, p90=10.410525ms, max=10.45358ms; kernel_model: matmul=0.242221 GFLOP (23.171 GFLOP/s @ duck_max), param_stream=0.072942G (6.978 Gparam/s @ duck_max), weight_stream=78.292 MiB (7.853 GB/s @ duck_max)
[2026-03-29 23:47:58.847752 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=49, expert_tiles=49, avg_tile_batch=1.80, prepare=17.042µs, send=611.36µs, judge_wait=9.132496ms, fetch=638.189µs, reduce=12ns; duck time-ns stats: p50=8.962028ms, p90=8.998158ms, max=9.028654ms; kernel_model: matmul=0.242221 GFLOP (26.828 GFLOP/s @ duck_max), param_stream=0.067437G (7.469 Gparam/s @ duck_max), weight_stream=72.383 MiB (8.406 GB/s @ duck_max)
[2026-03-29 23:47:58.860081 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=47, expert_tiles=51, avg_tile_batch=1.73, prepare=16.343µs, send=611.961µs, judge_wait=9.34485ms, fetch=635.678µs, reduce=18ns; duck time-ns stats: p50=9.184832ms, p90=9.214551ms, max=9.243022ms; kernel_model: matmul=0.242221 GFLOP (26.206 GFLOP/s @ duck_max), param_stream=0.070189G (7.594 Gparam/s @ duck_max), weight_stream=75.337 MiB (8.547 GB/s @ duck_max)
[2026-03-29 23:47:58.872550 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=50, expert_tiles=52, avg_tile_batch=1.69, prepare=16.367µs, send=616.46µs, judge_wait=9.475387ms, fetch=635.765µs, reduce=16ns; duck time-ns stats: p50=9.301955ms, p90=9.334051ms, max=9.371447ms; kernel_model: matmul=0.242221 GFLOP (25.847 GFLOP/s @ duck_max), param_stream=0.071565G (7.637 Gparam/s @ duck_max), weight_stream=76.815 MiB (8.595 GB/s @ duck_max)
[2026-03-29 23:47:58.885078 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=46, expert_tiles=48, avg_tile_batch=1.83, prepare=16.207µs, send=610.891µs, judge_wait=9.51291ms, fetch=634.406µs, reduce=16ns; duck time-ns stats: p50=9.352225ms, p90=9.387897ms, max=9.408639ms; kernel_model: matmul=0.242221 GFLOP (25.745 GFLOP/s @ duck_max), param_stream=0.066060G (7.021 Gparam/s @ duck_max), weight_stream=70.906 MiB (7.902 GB/s @ duck_max)
[2026-03-29 23:47:58.898033 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=56, expert_tiles=58, avg_tile_batch=1.52, prepare=16.499µs, send=611.343µs, judge_wait=9.951382ms, fetch=637.176µs, reduce=17ns; duck time-ns stats: p50=9.773474ms, p90=9.802977ms, max=9.846113ms; kernel_model: matmul=0.242221 GFLOP (24.601 GFLOP/s @ duck_max), param_stream=0.079823G (8.107 Gparam/s @ duck_max), weight_stream=85.678 MiB (9.124 GB/s @ duck_max)
[2026-03-29 23:47:58.911416 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=56, expert_tiles=59, avg_tile_batch=1.49, prepare=16.385µs, send=611.129µs, judge_wait=10.372518ms, fetch=654.933µs, reduce=13ns; duck time-ns stats: p50=10.199731ms, p90=10.232672ms, max=10.257298ms; kernel_model: matmul=0.242221 GFLOP (23.615 GFLOP/s @ duck_max), param_stream=0.081199G (7.916 Gparam/s @ duck_max), weight_stream=87.155 MiB (8.910 GB/s @ duck_max)
[2026-03-29 23:47:58.924598 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=50, expert_tiles=54, avg_tile_batch=1.63, prepare=17.936µs, send=612.073µs, judge_wait=10.166883ms, fetch=635.973µs, reduce=19ns; duck time-ns stats: p50=10.00217ms, p90=10.03966ms, max=10.061777ms; kernel_model: matmul=0.242221 GFLOP (24.073 GFLOP/s @ duck_max), param_stream=0.074318G (7.386 Gparam/s @ duck_max), weight_stream=79.769 MiB (8.313 GB/s @ duck_max)
[2026-03-29 23:47:58.938078 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=46, expert_tiles=50, avg_tile_batch=1.76, prepare=70.815µs, send=610.425µs, judge_wait=10.385758ms, fetch=638.059µs, reduce=19ns; duck time-ns stats: p50=10.212015ms, p90=10.24588ms, max=10.280221ms; kernel_model: matmul=0.242221 GFLOP (23.562 GFLOP/s @ duck_max), param_stream=0.068813G (6.694 Gparam/s @ duck_max), weight_stream=73.860 MiB (7.534 GB/s @ duck_max)
[2026-03-29 23:47:58.952022 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=57, expert_tiles=58, avg_tile_batch=1.52, prepare=21.559µs, send=610.388µs, judge_wait=10.856596ms, fetch=641.614µs, reduce=20ns; duck time-ns stats: p50=10.690804ms, p90=10.719696ms, max=10.748596ms; kernel_model: matmul=0.242221 GFLOP (22.535 GFLOP/s @ duck_max), param_stream=0.079823G (7.426 Gparam/s @ duck_max), weight_stream=85.678 MiB (8.358 GB/s @ duck_max)
[2026-03-29 23:47:58.965075 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=52, expert_tiles=55, avg_tile_batch=1.60, prepare=16.904µs, send=612.004µs, judge_wait=10.006268ms, fetch=636.92µs, reduce=14ns; duck time-ns stats: p50=9.863703ms, p90=9.889371ms, max=9.903015ms; kernel_model: matmul=0.242221 GFLOP (24.459 GFLOP/s @ duck_max), param_stream=0.075694G (7.644 Gparam/s @ duck_max), weight_stream=81.246 MiB (8.603 GB/s @ duck_max)
[2026-03-29 23:47:58.977738 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=49, expert_tiles=49, avg_tile_batch=1.80, prepare=15.7µs, send=612.495µs, judge_wait=9.615039ms, fetch=634.97µs, reduce=14ns; duck time-ns stats: p50=9.481325ms, p90=9.507832ms, max=9.513919ms; kernel_model: matmul=0.242221 GFLOP (25.460 GFLOP/s @ duck_max), param_stream=0.067437G (7.088 Gparam/s @ duck_max), weight_stream=72.383 MiB (7.978 GB/s @ duck_max)
[2026-03-29 23:47:58.990554 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=52, expert_tiles=56, avg_tile_batch=1.57, prepare=15.411µs, send=611.113µs, judge_wait=9.835173ms, fetch=636.571µs, reduce=14ns; duck time-ns stats: p50=9.680888ms, p90=9.725453ms, max=9.732177ms; kernel_model: matmul=0.242221 GFLOP (24.889 GFLOP/s @ duck_max), param_stream=0.077070G (7.919 Gparam/s @ duck_max), weight_stream=82.723 MiB (8.913 GB/s @ duck_max)
[2026-03-29 23:47:59.003624 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=55, expert_tiles=58, avg_tile_batch=1.52, prepare=16µs, send=611.071µs, judge_wait=10.084886ms, fetch=638.544µs, reduce=14ns; duck time-ns stats: p50=9.926539ms, p90=9.963773ms, max=9.97291ms; kernel_model: matmul=0.242221 GFLOP (24.288 GFLOP/s @ duck_max), param_stream=0.079823G (8.004 Gparam/s @ duck_max), weight_stream=85.678 MiB (9.008 GB/s @ duck_max)
[2026-03-29 23:47:59.017050 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=55, expert_tiles=57, avg_tile_batch=1.54, prepare=16.378µs, send=610.703µs, judge_wait=10.45341ms, fetch=638.934µs, reduce=14ns; duck time-ns stats: p50=10.255646ms, p90=10.314636ms, max=10.33064ms; kernel_model: matmul=0.242221 GFLOP (23.447 GFLOP/s @ duck_max), param_stream=0.078447G (7.594 Gparam/s @ duck_max), weight_stream=84.201 MiB (8.546 GB/s @ duck_max)
[2026-03-29 23:47:59.030087 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=52, expert_tiles=52, avg_tile_batch=1.69, prepare=15.917µs, send=612.305µs, judge_wait=10.020856ms, fetch=637.69µs, reduce=14ns; duck time-ns stats: p50=9.868403ms, p90=9.901077ms, max=9.914504ms; kernel_model: matmul=0.242221 GFLOP (24.431 GFLOP/s @ duck_max), param_stream=0.071565G (7.218 Gparam/s @ duck_max), weight_stream=76.815 MiB (8.124 GB/s @ duck_max)
[2026-03-29 23:47:59.043550 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=56, expert_tiles=58, avg_tile_batch=1.52, prepare=16.348µs, send=610.504µs, judge_wait=10.451781ms, fetch=637.142µs, reduce=14ns; duck time-ns stats: p50=10.270321ms, p90=10.317609ms, max=10.344846ms; kernel_model: matmul=0.242221 GFLOP (23.415 GFLOP/s @ duck_max), param_stream=0.079823G (7.716 Gparam/s @ duck_max), weight_stream=85.678 MiB (8.684 GB/s @ duck_max)
[2026-03-29 23:47:59.057514 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=56, expert_tiles=58, avg_tile_batch=1.52, prepare=16.136µs, send=611.616µs, judge_wait=10.956388ms, fetch=638.75µs, reduce=13ns; duck time-ns stats: p50=10.780873ms, p90=10.817292ms, max=10.848477ms; kernel_model: matmul=0.242221 GFLOP (22.328 GFLOP/s @ duck_max), param_stream=0.079823G (7.358 Gparam/s @ duck_max), weight_stream=85.678 MiB (8.281 GB/s @ duck_max)
[2026-03-29 23:47:59.070751 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=54, expert_tiles=57, avg_tile_batch=1.54, prepare=18.966µs, send=610.885µs, judge_wait=10.195674ms, fetch=646.328µs, reduce=13ns; duck time-ns stats: p50=10.01241ms, p90=10.053396ms, max=10.07375ms; kernel_model: matmul=0.242221 GFLOP (24.045 GFLOP/s @ duck_max), param_stream=0.078447G (7.787 Gparam/s @ duck_max), weight_stream=84.201 MiB (8.764 GB/s @ duck_max)
[2026-03-29 23:47:59.084369 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=63, expert_tiles=63, avg_tile_batch=1.40, prepare=16.084µs, send=613.926µs, judge_wait=10.576251ms, fetch=642.708µs, reduce=24ns; duck time-ns stats: p50=10.407474ms, p90=10.44343ms, max=10.452532ms; kernel_model: matmul=0.242221 GFLOP (23.173 GFLOP/s @ duck_max), param_stream=0.086704G (8.295 Gparam/s @ duck_max), weight_stream=93.064 MiB (9.336 GB/s @ duck_max)
[2026-03-29 23:47:59.097977 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=64, expert_tiles=66, avg_tile_batch=1.33, prepare=16.615µs, send=611.935µs, judge_wait=10.476495ms, fetch=644.308µs, reduce=21ns; duck time-ns stats: p50=10.317587ms, p90=10.344993ms, max=10.352196ms; kernel_model: matmul=0.242221 GFLOP (23.398 GFLOP/s @ duck_max), param_stream=0.090833G (8.774 Gparam/s @ duck_max), weight_stream=97.495 MiB (9.875 GB/s @ duck_max)
[2026-03-29 23:47:59.112106 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=65, expert_tiles=67, avg_tile_batch=1.31, prepare=15.995µs, send=612.283µs, judge_wait=11.004429ms, fetch=643.99µs, reduce=19ns; duck time-ns stats: p50=10.838777ms, p90=10.859854ms, max=10.885167ms; kernel_model: matmul=0.242221 GFLOP (22.252 GFLOP/s @ duck_max), param_stream=0.092209G (8.471 Gparam/s @ duck_max), weight_stream=98.973 MiB (9.534 GB/s @ duck_max)
[2026-03-29 23:47:59.126293 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=58, expert_tiles=59, avg_tile_batch=1.49, prepare=17.36µs, send=612.905µs, judge_wait=11.141228ms, fetch=646.361µs, reduce=19ns; duck time-ns stats: p50=10.961871ms, p90=11.006271ms, max=11.017746ms; kernel_model: matmul=0.242221 GFLOP (21.985 GFLOP/s @ duck_max), param_stream=0.081199G (7.370 Gparam/s @ duck_max), weight_stream=87.155 MiB (8.295 GB/s @ duck_max)
[2026-03-29 23:47:59.139381 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=51, expert_tiles=53, avg_tile_batch=1.66, prepare=20.298µs, send=612.07µs, judge_wait=9.972558ms, fetch=639.73µs, reduce=20ns; duck time-ns stats: p50=9.802525ms, p90=9.848997ms, max=9.864847ms; kernel_model: matmul=0.242221 GFLOP (24.554 GFLOP/s @ duck_max), param_stream=0.072942G (7.394 Gparam/s @ duck_max), weight_stream=78.292 MiB (8.322 GB/s @ duck_max)
[2026-03-29 23:47:59.153487 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=55, expert_tiles=58, avg_tile_batch=1.52, prepare=17.891µs, send=612.627µs, judge_wait=11.024348ms, fetch=640.743µs, reduce=20ns; duck time-ns stats: p50=10.836827ms, p90=10.885856ms, max=10.913046ms; kernel_model: matmul=0.242221 GFLOP (22.196 GFLOP/s @ duck_max), param_stream=0.079823G (7.314 Gparam/s @ duck_max), weight_stream=85.678 MiB (8.232 GB/s @ duck_max)
[2026-03-29 23:47:59.167707 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=54, expert_tiles=55, avg_tile_batch=1.60, prepare=16.471µs, send=610.756µs, judge_wait=11.167893ms, fetch=641.673µs, reduce=18ns; duck time-ns stats: p50=10.916461ms, p90=10.957193ms, max=11.044686ms; kernel_model: matmul=0.242221 GFLOP (21.931 GFLOP/s @ duck_max), param_stream=0.075694G (6.853 Gparam/s @ duck_max), weight_stream=81.246 MiB (7.713 GB/s @ duck_max)
[2026-03-29 23:47:59.180593 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=53, expert_tiles=54, avg_tile_batch=1.63, prepare=16.948µs, send=612.184µs, judge_wait=9.821692ms, fetch=640.689µs, reduce=26ns; duck time-ns stats: p50=9.671924ms, p90=9.696716ms, max=9.716127ms; kernel_model: matmul=0.242221 GFLOP (24.930 GFLOP/s @ duck_max), param_stream=0.074318G (7.649 Gparam/s @ duck_max), weight_stream=79.769 MiB (8.609 GB/s @ duck_max)
[2026-03-29 23:47:59.194274 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=54, expert_tiles=55, avg_tile_batch=1.60, prepare=16.275µs, send=611.593µs, judge_wait=10.63218ms, fetch=642.021µs, reduce=16ns; duck time-ns stats: p50=10.467876ms, p90=10.499694ms, max=10.52369ms; kernel_model: matmul=0.242221 GFLOP (23.017 GFLOP/s @ duck_max), param_stream=0.075694G (7.193 Gparam/s @ duck_max), weight_stream=81.246 MiB (8.095 GB/s @ duck_max)
[2026-03-29 23:47:59.207981 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=56, expert_tiles=57, avg_tile_batch=1.54, prepare=24.979µs, send=612.104µs, judge_wait=10.556032ms, fetch=640.135µs, reduce=14ns; duck time-ns stats: p50=10.391098ms, p90=10.423137ms, max=10.446721ms; kernel_model: matmul=0.242221 GFLOP (23.186 GFLOP/s @ duck_max), param_stream=0.078447G (7.509 Gparam/s @ duck_max), weight_stream=84.201 MiB (8.452 GB/s @ duck_max)
[2026-03-29 23:47:59.221234 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=56, expert_tiles=58, avg_tile_batch=1.52, prepare=17.289µs, send=614.121µs, judge_wait=10.1735ms, fetch=640.705µs, reduce=14ns; duck time-ns stats: p50=9.989071ms, p90=10.042348ms, max=10.058426ms; kernel_model: matmul=0.242221 GFLOP (24.081 GFLOP/s @ duck_max), param_stream=0.079823G (7.936 Gparam/s @ duck_max), weight_stream=85.678 MiB (8.932 GB/s @ duck_max)
[2026-03-29 23:47:59.235029 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=57, expert_tiles=58, avg_tile_batch=1.52, prepare=17.293µs, send=612.479µs, judge_wait=10.735853ms, fetch=642.4µs, reduce=13ns; duck time-ns stats: p50=10.551064ms, p90=10.598467ms, max=10.611105ms; kernel_model: matmul=0.242221 GFLOP (22.827 GFLOP/s @ duck_max), param_stream=0.079823G (7.523 Gparam/s @ duck_max), weight_stream=85.678 MiB (8.467 GB/s @ duck_max)
[2026-03-29 23:47:59.275916 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=10, top_k=8, tasks=80, unique_experts=58, expert_tiles=58, avg_tile_batch=1.38, prepare=133.364µs, send=1.2284ms, judge_wait=9.701264ms, fetch=628.973µs, reduce=20ns; duck time-ns stats: p50=9.530071ms, p90=9.556622ms, max=9.576012ms; kernel_model: matmul=0.220201 GFLOP (22.995 GFLOP/s @ duck_max), param_stream=0.079823G (8.336 Gparam/s @ duck_max), weight_stream=85.678 MiB (9.382 GB/s @ duck_max)
[2026-03-29 23:47:59.284246 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.001778ms; phases: prepare=4.852µs, send=61.325µs, judge_wait=807.3µs, fetch=90.977µs, reduce=20ns, writeback=507ns; duck time-ns stats: p50=727.395µs, p90=732.148µs, max=733.071µs; effective_read: activated_experts=8, params=0.011010G (15.019 Gparam/s @ duck_max), memory=11.818 MiB (16.904 GB/s @ duck_max), judge_gap=74.229µs, judge_ratio=1.101x
[2026-03-29 23:48:00.038782 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.281983ms; phases: prepare=5.262µs, send=345.89µs, judge_wait=779.109µs, fetch=96.046µs, reduce=19ns, writeback=488ns; duck time-ns stats: p50=694.865µs, p90=700.262µs, max=701.512µs; effective_read: activated_experts=8, params=0.011010G (15.695 Gparam/s @ duck_max), memory=11.818 MiB (17.664 GB/s @ duck_max), judge_gap=77.597µs, judge_ratio=1.111x
Token # 1: 779.144ms; value: next_token_ids=tensor([1415], device='cuda:0') mtp accept=1 prop=1415 top1=1415 accp=1.000 next=draft=112036 prop=112036 olap pair=724.3ms serial=1336.1ms gain=611.8ms ratio=0.46 s0=635.4ms s1=700.7ms wait=0.2/42.5ms pred gate=device
[2026-03-29 23:48:00.042883 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.040426ms; phases: prepare=3.212µs, send=62.652µs, judge_wait=845.771µs, fetch=91.76µs, reduce=19ns, writeback=519ns; duck time-ns stats: p50=757.086µs, p90=765.164µs, max=768.557µs; effective_read: activated_experts=8, params=0.011010G (14.326 Gparam/s @ duck_max), memory=11.818 MiB (16.123 GB/s @ duck_max), judge_gap=77.214µs, judge_ratio=1.100x
Token # 2: 3.877ms; value: next_token_ids=tensor([112036], device='cuda:0') mtp accept=1 prop=112036 top1=112036 accp=1.000 next=pair draft=49672 prop=49672 pred gate=device
[2026-03-29 23:48:00.156910 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.092013ms; phases: prepare=3.913µs, send=61.369µs, judge_wait=830.708µs, fetch=104.333µs, reduce=135ns, writeback=596ns; duck time-ns stats: p50=750.853µs, p90=753.424µs, max=755.702µs; effective_read: activated_experts=8, params=0.011010G (14.569 Gparam/s @ duck_max), memory=11.818 MiB (16.398 GB/s @ duck_max), judge_gap=75.006µs, judge_ratio=1.099x
Token # 3: 114.252ms; value: next_token_ids=tensor([582], device='cuda:0') mtp accept=0 prop=49672 top1=49672 accp=0.797 next=draft=7163 prop=7163 olap pair=108.7ms serial=192.2ms gain=83.5ms ratio=0.43 s0=4.4ms s1=187.8ms wait=0.1/50.7ms pred gate=device
[2026-03-29 23:48:00.271529 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 978.419µs; phases: prepare=3.296µs, send=61.76µs, judge_wait=784.003µs, fetch=91.949µs, reduce=20ns, writeback=532ns; duck time-ns stats: p50=701.15µs, p90=709.766µs, max=710.478µs; effective_read: activated_experts=8, params=0.011010G (15.497 Gparam/s @ duck_max), memory=11.818 MiB (17.441 GB/s @ duck_max), judge_gap=73.525µs, judge_ratio=1.103x
Token # 4: 114.648ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=108.9ms serial=192.5ms gain=83.6ms ratio=0.43 s0=4.3ms s1=188.2ms wait=0.1/51.0ms pred gate=device
[2026-03-29 23:48:00.275562 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 970.665µs; phases: prepare=3.165µs, send=61.244µs, judge_wait=777.973µs, fetch=91.68µs, reduce=20ns, writeback=626ns; duck time-ns stats: p50=692.043µs, p90=701.932µs, max=704.148µs; effective_read: activated_experts=8, params=0.011010G (15.636 Gparam/s @ duck_max), memory=11.818 MiB (17.598 GB/s @ duck_max), judge_gap=73.825µs, judge_ratio=1.105x
Token # 5: 3.839ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device
[2026-03-29 23:48:00.389890 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.010743ms; phases: prepare=5.884µs, send=62.857µs, judge_wait=784.632µs, fetch=95.817µs, reduce=20ns, writeback=1.27µs; duck time-ns stats: p50=696.445µs, p90=700.829µs, max=702.142µs; effective_read: activated_experts=8, params=0.011010G (15.681 Gparam/s @ duck_max), memory=11.818 MiB (17.648 GB/s @ duck_max), judge_gap=82.49µs, judge_ratio=1.117x
Token # 6: 114.911ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=draft=223 prop=389 olap pair=108.8ms serial=191.7ms gain=82.9ms ratio=0.43 s0=8.8ms s1=182.8ms wait=0.2/45.7ms pred gate=device
[2026-03-29 23:48:00.394759 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.009486ms; phases: prepare=4.864µs, send=63.798µs, judge_wait=785.252µs, fetch=93.499µs, reduce=20ns, writeback=1.358µs; duck time-ns stats: p50=698.255µs, p90=702.345µs, max=706.642µs; effective_read: activated_experts=8, params=0.011010G (15.581 Gparam/s @ duck_max), memory=11.818 MiB (17.536 GB/s @ duck_max), judge_gap=78.61µs, judge_ratio=1.111x
Token # 7: 4.657ms; value: next_token_ids=tensor([389],
device='cuda:0') mtp accept=1 prop=389 top1=389 accp=0.187 next=pair draft=1703 prop=1703 pred gate=device [2026-03-29 23:48:00.510278 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.053776ms; phases: prepare=4.616µs, send=62.142µs, judge_wait=857.323µs, fetch=91.182µs, reduce=21ns, writeback=672ns; duck time-ns stats: p50=753.822µs, p90=760.741µs, max=762.786µs; effective_read: activated_experts=8, params=0.011010G (14.434 Gparam/s @ duck_max), memory=11.818 MiB (16.245 GB/s @ duck_max), judge_gap=94.537µs, judge_ratio=1.124x Token # 8: 115.316ms; value: next_token_ids=tensor([1703], device='cuda:0') mtp accept=1 prop=1703 top1=1703 accp=1.000 next=draft=996 prop=996 olap pair=109.7ms serial=192.4ms gain=82.7ms ratio=0.43 s0=6.8ms s1=185.6ms wait=0.2/48.3ms pred gate=device [2026-03-29 23:48:00.514338 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 972.854µs; phases: prepare=3.492µs, send=61.572µs, judge_wait=778.254µs, fetch=91.361µs, reduce=20ns, writeback=579ns; duck time-ns stats: p50=696.849µs, p90=700.92µs, max=705.043µs; effective_read: activated_experts=8, params=0.011010G (15.616 Gparam/s @ duck_max), memory=11.818 MiB (17.576 GB/s @ duck_max), judge_gap=73.211µs, judge_ratio=1.104x Token # 9: 3.887ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=pair draft=3467 prop=3467 pred gate=device Token # 10: 116.437ms; value: next_token_ids=tensor([3467], device='cuda:0') mtp accept=1 prop=3467 top1=3467 accp=1.000 next=draft=1148 prop=1148 olap pair=110.4ms serial=193.0ms gain=82.6ms ratio=0.43 s0=8.0ms s1=184.9ms wait=0.2/46.8ms pred gate=device Token # 11: 4.652ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=0.805 next=pair draft=4 prop=4 pred gate=device Token # 12: 114.262ms; value: next_token_ids=tensor([4], device='cuda:0') mtp accept=1 prop=4 top1=4 accp=0.859 next=draft=223 prop=223 olap pair=108.7ms serial=191.1ms gain=82.4ms ratio=0.43 s0=7.8ms 
s1=183.3ms wait=0.2/47.1ms pred gate=device Token # 13: 3.844ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=39932 prop=39932 pred gate=device Token # 14: 115.850ms; value: next_token_ids=tensor([39932], device='cuda:0') mtp accept=1 prop=39932 top1=39932 accp=0.996 next=draft=10094 prop=10094 olap pair=109.7ms serial=192.8ms gain=83.1ms ratio=0.43 s0=7.1ms s1=185.7ms wait=0.2/47.7ms pred gate=device Token # 15: 4.633ms; value: next_token_ids=tensor([5640], device='cuda:0') mtp accept=0 prop=10094 top1=5640 accp=0.450 next=pair draft=1959 prop=1959 pred gate=device Token # 16: 115.182ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=0 prop=1959 top1=1959 accp=0.755 next=draft=27521 prop=27521 olap pair=109.4ms serial=193.5ms gain=84.1ms ratio=0.43 s0=7.7ms s1=185.8ms wait=0.2/47.3ms pred gate=device Token # 17: 115.885ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=110.2ms serial=194.8ms gain=84.5ms ratio=0.43 s0=5.5ms s1=189.2ms wait=0.2/49.5ms pred gate=device Token # 18: 3.840ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=98938 prop=98938 pred gate=device Token # 19: 114.984ms; value: next_token_ids=tensor([98938], device='cuda:0') mtp accept=1 prop=98938 top1=98938 accp=0.843 next=draft=1703 prop=1703 olap pair=109.5ms serial=193.9ms gain=84.4ms ratio=0.44 s0=4.5ms s1=189.3ms wait=0.1/50.3ms pred gate=device Token # 20: 3.937ms; value: next_token_ids=tensor([1703], device='cuda:0') mtp accept=1 prop=1703 top1=1703 accp=0.733 next=pair draft=996 prop=996 pred gate=device Token # 21: 115.752ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=draft=320 prop=478 olap pair=109.7ms serial=193.2ms gain=83.5ms ratio=0.43 s0=5.0ms s1=188.2ms wait=0.1/50.2ms pred 
gate=device Token # 22: 4.348ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.703 next=pair draft=7346 prop=7346 pred gate=device Token # 23: 115.873ms; value: next_token_ids=tensor([7346], device='cuda:0') mtp accept=1 prop=7346 top1=7346 accp=0.979 next=draft=303 prop=303 olap pair=109.8ms serial=192.8ms gain=83.0ms ratio=0.43 s0=7.3ms s1=185.4ms wait=0.2/47.4ms pred gate=device Token # 24: 4.263ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.999 next=pair draft=4754 prop=32257 pred gate=device Token # 25: 115.607ms; value: next_token_ids=tensor([32257], device='cuda:0') mtp accept=1 prop=32257 top1=4754 accp=0.856 next=draft=28669 prop=28669 olap pair=110.3ms serial=195.0ms gain=84.6ms ratio=0.43 s0=4.4ms s1=190.5ms wait=0.1/50.9ms pred gate=device Token # 26: 3.936ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=0 prop=28669 top1=7163 accp=0.210 next=pair draft=27521 prop=27521 pred gate=device Token # 27: 116.155ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=110.9ms serial=195.7ms gain=84.8ms ratio=0.43 s0=4.4ms s1=191.3ms wait=0.1/49.9ms pred gate=device Token # 28: 3.762ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=0.559 next=pair draft=438 prop=438 pred gate=device Token # 29: 115.743ms; value: next_token_ids=tensor([1959], device='cuda:0') mtp accept=0 prop=438 top1=1959 accp=0.049 next=draft=8283 prop=8283 olap pair=110.4ms serial=195.1ms gain=84.6ms ratio=0.43 s0=5.5ms s1=189.6ms wait=0.1/43.9ms pred gate=device Token # 30: 116.353ms; value: next_token_ids=tensor([8283], device='cuda:0') mtp accept=1 prop=8283 top1=8283 accp=0.999 next=draft=17839 prop=17839 olap pair=110.2ms serial=193.5ms gain=83.3ms ratio=0.43 s0=6.2ms s1=187.3ms wait=0.2/43.3ms pred gate=device Token # 31: 4.633ms; value: 
next_token_ids=tensor([2431], device='cuda:0') mtp accept=0 prop=17839 top1=17839 accp=0.940 next=pair draft=17839 prop=17839 pred gate=device Token # 32: 115.246ms; value: next_token_ids=tensor([4153], device='cuda:0') mtp accept=0 prop=17839 top1=4153 accp=0.012 next=draft=17256 prop=17256 olap pair=109.8ms serial=193.6ms gain=83.8ms ratio=0.43 s0=5.9ms s1=187.7ms wait=0.2/43.6ms pred gate=device Token # 33: 115.941ms; value: next_token_ids=tensor([17256], device='cuda:0') mtp accept=1 prop=17256 top1=9184 accp=0.630 next=draft=9184 prop=9184 olap pair=110.3ms serial=195.0ms gain=84.7ms ratio=0.43 s0=6.9ms s1=188.1ms wait=0.2/42.4ms pred gate=device Token # 34: 3.762ms; value: next_token_ids=tensor([9184], device='cuda:0') mtp accept=1 prop=9184 top1=9184 accp=0.932 next=pair draft=5870 prop=5870 pred gate=device Token # 35: 116.059ms; value: next_token_ids=tensor([5870], device='cuda:0') mtp accept=1 prop=5870 top1=5870 accp=0.999 next=draft=320 prop=320 olap pair=109.8ms serial=193.2ms gain=83.3ms ratio=0.43 s0=7.6ms s1=185.5ms wait=0.2/41.7ms pred gate=device Token # 36: 4.703ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.999 next=pair draft=7163 prop=7163 pred gate=device Token # 37: 115.189ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.979 next=draft=27521 prop=27521 olap pair=109.7ms serial=193.5ms gain=83.9ms ratio=0.43 s0=8.1ms s1=185.5ms wait=0.2/41.2ms pred gate=device Token # 38: 3.803ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=24 prop=24 pred gate=device Token # 39: 114.833ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=0.993 next=draft=438 prop=438 olap pair=109.5ms serial=193.3ms gain=83.8ms ratio=0.43 s0=6.2ms s1=187.1ms wait=0.2/43.2ms pred gate=device Token # 40: 3.780ms; value: next_token_ids=tensor([389], device='cuda:0') mtp 
accept=0 prop=438 top1=389 accp=0.139 next=pair draft=20 prop=20 pred gate=device Token # 41: 115.818ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=64 prop=64 olap pair=110.4ms serial=195.1ms gain=84.8ms ratio=0.43 s0=8.3ms s1=186.9ms wait=0.2/40.9ms pred gate=device Token # 42: 3.881ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=0.995 next=pair draft=397 prop=397 pred gate=device Token # 43: 115.750ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=1 prop=397 top1=397 accp=1.000 next=draft=438 prop=1237 olap pair=110.5ms serial=194.7ms gain=84.2ms ratio=0.43 s0=4.5ms s1=190.1ms wait=0.1/45.0ms pred gate=device Token # 44: 3.766ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=303 accp=0.737 next=pair draft=2524 prop=2524 pred gate=device Token # 45: 115.535ms; value: next_token_ids=tensor([2524], device='cuda:0') mtp accept=1 prop=2524 top1=2524 accp=0.987 next=draft=20 prop=20 olap pair=110.1ms serial=193.9ms gain=83.8ms ratio=0.43 s0=7.6ms s1=186.2ms wait=0.2/41.6ms pred gate=device Token # 46: 3.888ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.999 next=pair draft=64 prop=64 pred gate=device Token # 47: 116.906ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=draft=553 prop=553 olap pair=110.8ms serial=196.0ms gain=85.2ms ratio=0.43 s0=7.5ms s1=188.5ms wait=0.2/41.5ms pred gate=device Token # 48: 4.587ms; value: next_token_ids=tensor([553], device='cuda:0') mtp accept=1 prop=553 top1=553 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 49: 116.664ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=0.963 next=draft=5769 prop=5769 olap pair=111.0ms serial=196.1ms gain=85.1ms ratio=0.43 s0=6.4ms s1=189.7ms wait=0.2/43.3ms pred gate=device Token # 50: 3.932ms; value: 
next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=pair draft=22 prop=22 pred gate=device Token # 51: 115.736ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=draft=303 prop=303 olap pair=110.5ms serial=194.2ms gain=83.8ms ratio=0.43 s0=7.0ms s1=187.2ms wait=0.2/42.5ms pred gate=device Token # 52: 3.891ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.999 next=pair draft=20 prop=20 pred gate=device Token # 53: 116.349ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=64 prop=64 olap pair=111.1ms serial=196.1ms gain=85.1ms ratio=0.43 s0=5.5ms s1=190.6ms wait=0.1/44.0ms pred gate=device Token # 54: 3.774ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=397 prop=397 pred gate=device Token # 55: 115.240ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=1 prop=397 top1=397 accp=1.000 next=draft=31 prop=31 olap pair=110.0ms serial=195.0ms gain=85.0ms ratio=0.44 s0=4.5ms s1=190.5ms wait=0.1/44.9ms pred gate=device Token # 56: 3.794ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=5769 prop=5769 pred gate=device Token # 57: 115.609ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=0.911 next=draft=22 prop=22 olap pair=110.4ms serial=196.3ms gain=86.0ms ratio=0.44 s0=4.0ms s1=192.4ms wait=0.1/45.8ms pred gate=device Token # 58: 3.691ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=pair draft=64 prop=12 pred gate=device Token # 59: 116.435ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=0 prop=12 top1=64 accp=0.705 next=draft=20 prop=20 olap pair=110.3ms serial=195.6ms gain=85.3ms ratio=0.44 s0=5.5ms s1=190.1ms wait=0.2/44.0ms 
pred gate=device Token # 60: 117.131ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=31 prop=31 olap pair=110.8ms serial=194.7ms gain=83.9ms ratio=0.43 s0=6.4ms s1=188.3ms wait=0.2/43.4ms pred gate=device Token # 61: 4.596ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 62: 116.571ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=111.1ms serial=194.7ms gain=83.5ms ratio=0.43 s0=7.5ms s1=187.2ms wait=0.2/41.7ms pred gate=device Token # 63: 3.792ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=24 prop=24 pred gate=device Token # 64: 115.038ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=draft=14164 prop=14164 olap pair=109.8ms serial=195.3ms gain=85.4ms ratio=0.44 s0=3.9ms s1=191.3ms wait=0.1/46.0ms pred gate=device Token # 65: 3.780ms; value: next_token_ids=tensor([14164], device='cuda:0') mtp accept=1 prop=14164 top1=14164 accp=0.989 next=pair draft=2636 prop=2636 pred gate=device Token # 66: 114.918ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.985 next=draft=7163 prop=7163 olap pair=109.6ms serial=195.0ms gain=85.4ms ratio=0.44 s0=4.2ms s1=190.8ms wait=0.1/45.3ms pred gate=device Token # 67: 3.833ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 68: 116.346ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=111.0ms serial=196.3ms gain=85.3ms ratio=0.43 s0=4.5ms s1=191.8ms wait=0.1/45.2ms pred gate=device Token # 69: 3.818ms; value: 
next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 70: 115.449ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.997 next=draft=223 prop=223 olap pair=110.1ms serial=195.3ms gain=85.1ms ratio=0.44 s0=4.1ms s1=191.2ms wait=0.1/46.0ms pred gate=device Token # 71: 3.762ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 72: 116.414ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.997 next=draft=27521 prop=27521 olap pair=111.1ms serial=196.5ms gain=85.4ms ratio=0.43 s0=4.7ms s1=191.8ms wait=0.1/45.0ms pred gate=device Token # 73: 3.757ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=24 prop=24 pred gate=device Token # 74: 115.425ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=draft=940 prop=940 olap pair=110.1ms serial=195.7ms gain=85.6ms ratio=0.44 s0=4.0ms s1=191.6ms wait=0.1/45.5ms pred gate=device Token # 75: 3.920ms; value: next_token_ids=tensor([982], device='cuda:0') mtp accept=0 prop=940 top1=982 accp=0.000 next=pair draft=223 prop=223 pred gate=device Token # 76: 116.261ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=1457 prop=1457 olap pair=111.0ms serial=197.2ms gain=86.3ms ratio=0.44 s0=6.8ms s1=190.4ms wait=0.2/42.7ms pred gate=device Token # 77: 3.726ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=pair draft=940 prop=940 pred gate=device Token # 78: 115.051ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=0.998 next=draft=223 prop=223 olap pair=109.6ms serial=194.8ms gain=85.2ms 
ratio=0.44 s0=4.1ms s1=190.7ms wait=0.1/45.5ms pred gate=device Token # 79: 3.848ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 80: 116.086ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=1148 prop=1148 olap pair=110.8ms serial=196.8ms gain=86.0ms ratio=0.44 s0=4.0ms s1=192.8ms wait=0.1/45.8ms pred gate=device Token # 81: 3.759ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=0.888 next=pair draft=14149 prop=14149 pred gate=device Token # 82: 115.585ms; value: next_token_ids=tensor([14149], device='cuda:0') mtp accept=1 prop=14149 top1=14149 accp=0.964 next=draft=303 prop=7163 olap pair=110.3ms serial=195.2ms gain=84.9ms ratio=0.43 s0=5.9ms s1=189.3ms wait=0.2/43.4ms pred gate=device Token # 83: 3.810ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=7163 top1=303 accp=0.804 next=pair draft=7163 prop=7163 pred gate=device Token # 84: 115.343ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=110.0ms serial=195.4ms gain=85.4ms ratio=0.44 s0=5.9ms s1=189.5ms wait=0.2/43.5ms pred gate=device Token # 85: 3.783ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 86: 115.837ms; value: next_token_ids=tensor([6391], device='cuda:0') mtp accept=0 prop=19698 top1=6391 accp=0.210 next=draft=438 prop=438 olap pair=110.5ms serial=196.7ms gain=86.2ms ratio=0.44 s0=4.7ms s1=192.0ms wait=0.1/44.8ms pred gate=device Token # 87: 115.871ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.958 next=draft=223 prop=223 olap pair=110.3ms serial=196.3ms gain=85.9ms ratio=0.44 s0=4.8ms s1=191.5ms wait=0.1/44.4ms pred 
gate=device Token # 88: 3.823ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 89: 116.874ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=111.5ms serial=198.6ms gain=87.0ms ratio=0.44 s0=4.1ms s1=194.5ms wait=0.1/45.6ms pred gate=device Token # 90: 3.865ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=24 prop=24 pred gate=device Token # 91: 116.024ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=draft=982 prop=982 olap pair=110.4ms serial=196.2ms gain=85.8ms ratio=0.44 s0=4.7ms s1=191.5ms wait=0.1/45.1ms pred gate=device Token # 92: 3.906ms; value: next_token_ids=tensor([982], device='cuda:0') mtp accept=1 prop=982 top1=982 accp=0.983 next=pair draft=223 prop=223 pred gate=device Token # 93: 116.910ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=1457 prop=1457 olap pair=111.5ms serial=195.9ms gain=84.4ms ratio=0.43 s0=4.8ms s1=191.1ms wait=0.1/45.0ms pred gate=device Token # 94: 3.798ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 95: 116.305ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.962 next=draft=2636 prop=867 olap pair=111.0ms serial=197.4ms gain=86.4ms ratio=0.44 s0=4.7ms s1=192.7ms wait=0.1/44.2ms pred gate=device Token # 96: 3.812ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=0 prop=867 top1=2636 accp=0.756 next=pair draft=7163 prop=7163 pred gate=device Token # 97: 115.865ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.999 next=draft=27521 prop=27521 olap 
pair=110.4ms serial=196.4ms gain=86.0ms ratio=0.44 s0=4.7ms s1=191.7ms wait=0.1/44.5ms pred gate=device Token # 98: 3.768ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 99: 115.338ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=draft=438 prop=438 olap pair=110.0ms serial=195.5ms gain=85.5ms ratio=0.44 s0=4.6ms s1=190.9ms wait=0.1/44.8ms pred gate=device Token # 100: 3.883ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 101: 116.060ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=7163 prop=7163 olap pair=110.8ms serial=196.8ms gain=86.1ms ratio=0.44 s0=4.6ms s1=192.2ms wait=0.1/44.9ms pred gate=device Token # 102: 3.773ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 103: 116.144ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=24 prop=24 olap pair=110.8ms serial=195.8ms gain=85.0ms ratio=0.43 s0=4.7ms s1=191.1ms wait=0.1/44.7ms pred gate=device Token # 104: 3.827ms; value: next_token_ids=tensor([6391], device='cuda:0') mtp accept=0 prop=24 top1=6391 accp=0.098 next=pair draft=940 prop=940 pred gate=device Token # 105: 116.920ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=1.000 next=draft=223 prop=223 olap pair=111.5ms serial=196.3ms gain=84.8ms ratio=0.43 s0=4.9ms s1=191.4ms wait=0.1/44.8ms pred gate=device Token # 106: 4.070ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 107: 117.006ms; value: 
next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=438 prop=438 olap pair=110.8ms serial=196.2ms gain=85.4ms ratio=0.44 s0=5.9ms s1=190.2ms wait=0.2/43.8ms pred gate=device Token # 108: 4.662ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.966 next=pair draft=223 prop=223 pred gate=device Token # 109: 115.994ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.999 next=draft=1457 prop=1457 olap pair=110.5ms serial=196.0ms gain=85.4ms ratio=0.44 s0=5.9ms s1=190.1ms wait=0.2/43.8ms pred gate=device Token # 110: 3.788ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=0.999 next=pair draft=982 prop=982 pred gate=device Token # 111: 117.183ms; value: next_token_ids=tensor([982], device='cuda:0') mtp accept=1 prop=982 top1=982 accp=0.999 next=draft=223 prop=223 olap pair=111.0ms serial=196.4ms gain=85.4ms ratio=0.43 s0=5.8ms s1=190.6ms wait=0.2/43.6ms pred gate=device Token # 112: 4.732ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.998 next=pair draft=20 prop=20 pred gate=device Token # 113: 116.257ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=64 prop=64 olap pair=110.6ms serial=193.4ms gain=82.7ms ratio=0.43 s0=8.6ms s1=184.8ms wait=0.2/40.7ms pred gate=device Token # 114: 3.810ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=397 prop=397 pred gate=device Token # 115: 117.035ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=1 prop=397 top1=397 accp=1.000 next=draft=940 prop=940 olap pair=111.6ms serial=196.8ms gain=85.3ms ratio=0.43 s0=6.4ms s1=190.4ms wait=0.2/43.4ms pred gate=device Token # 116: 3.794ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=1.000 next=pair 
draft=223 prop=223 pred gate=device Token # 117: 116.905ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=19 prop=19 olap pair=110.6ms serial=196.5ms gain=85.9ms ratio=0.44 s0=5.6ms s1=190.9ms wait=0.2/44.2ms pred gate=device Token # 118: 4.431ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=320 prop=1148 pred gate=device Token # 119: 116.906ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=0 prop=1148 top1=320 accp=0.812 next=draft=1395 prop=1207 olap pair=111.6ms serial=196.9ms gain=85.3ms ratio=0.43 s0=8.6ms s1=188.3ms wait=0.2/41.2ms pred gate=device Token # 120: 117.524ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=1 prop=1207 top1=1207 accp=0.352 next=draft=13103 prop=1395 olap pair=112.0ms serial=198.1ms gain=86.1ms ratio=0.43 s0=8.2ms s1=189.9ms wait=0.2/40.9ms pred gate=device Token # 121: 3.732ms; value: next_token_ids=tensor([1395], device='cuda:0') mtp accept=1 prop=1395 top1=13103 accp=0.940 next=pair draft=4036 prop=4036 pred gate=device Token # 122: 116.946ms; value: next_token_ids=tensor([4036], device='cuda:0') mtp accept=1 prop=4036 top1=4036 accp=0.918 next=draft=718 prop=718 olap pair=110.8ms serial=195.3ms gain=84.5ms ratio=0.43 s0=6.2ms s1=189.1ms wait=0.2/43.4ms pred gate=device Token # 123: 4.615ms; value: next_token_ids=tensor([718], device='cuda:0') mtp accept=1 prop=718 top1=718 accp=0.525 next=pair draft=768 prop=768 pred gate=device Token # 124: 115.320ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.889 next=draft=7163 prop=7163 olap pair=109.9ms serial=194.7ms gain=84.9ms ratio=0.44 s0=5.5ms s1=189.2ms wait=0.1/44.3ms pred gate=device Token # 125: 3.916ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.981 next=pair draft=27521 prop=27521 pred gate=device Token # 126: 115.776ms; 
value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=110.4ms serial=193.8ms gain=83.4ms ratio=0.43 s0=4.8ms s1=189.0ms wait=0.1/45.1ms pred gate=device Token # 127: 3.803ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=0.520 next=pair draft=438 prop=438 pred gate=device Token # 128: 118.295ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=112.9ms serial=198.7ms gain=85.8ms ratio=0.43 s0=5.4ms s1=193.3ms wait=0.2/44.1ms pred gate=device Token # 129: 3.775ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 130: 118.103ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=64 prop=64 olap pair=111.8ms serial=196.2ms gain=84.4ms ratio=0.43 s0=6.2ms s1=190.0ms wait=0.2/43.5ms pred gate=device Token # 131: 4.781ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=397 prop=397 pred gate=device Token # 132: 116.918ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=1 prop=397 top1=397 accp=1.000 next=draft=982 prop=982 olap pair=110.6ms serial=194.1ms gain=83.5ms ratio=0.43 s0=8.3ms s1=185.8ms wait=0.2/41.2ms pred gate=device Token # 133: 4.653ms; value: next_token_ids=tensor([982], device='cuda:0') mtp accept=1 prop=982 top1=982 accp=0.998 next=pair draft=223 prop=223 pred gate=device Token # 134: 116.768ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=1457 prop=1457 olap pair=110.8ms serial=196.2ms gain=85.4ms ratio=0.44 s0=7.9ms s1=188.3ms wait=0.2/41.9ms pred gate=device Token # 135: 4.280ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 
top1=1457 accp=1.000 next=pair draft=940 prop=940 pred gate=device Token # 136: 115.791ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=0.996 next=draft=223 prop=223 olap pair=110.3ms serial=195.1ms gain=84.8ms ratio=0.43 s0=6.1ms s1=189.0ms wait=0.2/43.4ms pred gate=device Token # 137: 3.911ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 138: 116.336ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=1148 prop=1148 olap pair=111.0ms serial=196.3ms gain=85.3ms ratio=0.43 s0=4.2ms s1=192.1ms wait=0.1/45.9ms pred gate=device Token # 139: 3.800ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=0.977 next=pair draft=422 prop=422 pred gate=device Token # 140: 116.724ms; value: next_token_ids=tensor([422], device='cuda:0') mtp accept=1 prop=422 top1=422 accp=0.624 next=draft=57573 prop=303 olap pair=110.6ms serial=194.4ms gain=83.8ms ratio=0.43 s0=7.5ms s1=186.9ms wait=0.2/41.8ms pred gate=device Token # 141: 4.277ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=57573 accp=0.787 next=pair draft=20 prop=20 pred gate=device Token # 142: 117.162ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=64 prop=64 olap pair=110.9ms serial=194.8ms gain=83.9ms ratio=0.43 s0=7.2ms s1=187.5ms wait=0.2/42.5ms pred gate=device Token # 143: 4.735ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=397 prop=397 pred gate=device Token # 144: 116.805ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=1 prop=397 top1=397 accp=1.000 next=draft=438 prop=438 olap pair=110.8ms serial=195.2ms gain=84.4ms ratio=0.43 s0=8.5ms s1=186.7ms wait=0.2/40.8ms pred gate=device Token # 145: 3.908ms; 
value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.898 next=pair draft=223 prop=223 pred gate=device Token # 146: 116.922ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=7163 prop=7163 olap pair=111.6ms serial=196.8ms gain=85.2ms ratio=0.43 s0=8.3ms s1=188.5ms wait=0.2/41.3ms pred gate=device Token # 147: 3.831ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 148: 117.343ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=24 prop=24 olap pair=111.9ms serial=196.0ms gain=84.1ms ratio=0.43 s0=7.9ms s1=188.1ms wait=0.2/41.6ms pred gate=device Token # 149: 3.820ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 150: 116.763ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=2636 prop=2636 olap pair=110.6ms serial=192.8ms gain=82.2ms ratio=0.43 s0=8.4ms s1=184.5ms wait=0.2/41.1ms pred gate=device Token # 151: 4.604ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.918 next=pair draft=7163 prop=223 pred gate=device Token # 152: 117.680ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=0 prop=223 top1=7163 accp=0.962 next=draft=27521 prop=27521 olap pair=111.3ms serial=195.7ms gain=84.4ms ratio=0.43 s0=8.6ms s1=187.1ms wait=0.2/40.9ms pred gate=device Token # 153: 117.789ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=111.2ms serial=196.1ms gain=84.9ms ratio=0.43 s0=8.8ms s1=187.3ms wait=0.2/40.6ms pred gate=device Token # 154: 3.868ms; value: next_token_ids=tensor([19698], device='cuda:0') 
mtp accept=1 prop=19698 top1=19698 accp=0.743 next=pair draft=438 prop=438 pred gate=device Token # 155: 117.374ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=111.2ms serial=195.7ms gain=84.5ms ratio=0.43 s0=8.5ms s1=187.2ms wait=0.2/40.8ms pred gate=device Token # 156: 4.692ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 157: 117.184ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=110.8ms serial=196.3ms gain=85.5ms ratio=0.44 s0=6.4ms s1=189.9ms wait=0.2/42.9ms pred gate=device Token # 158: 4.680ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=0.901 next=pair draft=24 prop=24 pred gate=device Token # 159: 116.051ms; value: next_token_ids=tensor([6391], device='cuda:0') mtp accept=0 prop=24 top1=6391 accp=0.396 next=draft=940 prop=940 olap pair=110.5ms serial=195.8ms gain=85.3ms ratio=0.44 s0=6.9ms s1=188.9ms wait=0.2/42.7ms pred gate=device Token # 160: 118.607ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=1.000 next=draft=223 prop=223 olap pair=112.2ms serial=197.4ms gain=85.1ms ratio=0.43 s0=6.8ms s1=190.5ms wait=0.2/42.9ms pred gate=device Token # 161: 4.223ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 162: 115.489ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=438 prop=438 olap pair=110.2ms serial=194.0ms gain=83.8ms ratio=0.43 s0=8.4ms s1=185.6ms wait=0.2/41.0ms pred gate=device Token # 163: 3.845ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.565 next=pair draft=343 prop=343 
pred gate=device Token # 164: 115.952ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=343 top1=223 accp=0.041 next=draft=1457 prop=1457 olap pair=110.6ms serial=195.8ms gain=85.2ms ratio=0.44 s0=5.7ms s1=190.1ms wait=0.2/43.9ms pred gate=device Token # 165: 116.898ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=draft=982 prop=982 olap pair=111.5ms serial=197.4ms gain=85.9ms ratio=0.44 s0=4.5ms s1=192.9ms wait=0.1/45.3ms pred gate=device Token # 166: 3.853ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=0 prop=982 top1=12 accp=0.325 next=pair draft=7163 prop=7163 pred gate=device Token # 167: 117.429ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.908 next=draft=27521 prop=27521 olap pair=111.5ms serial=197.7ms gain=86.2ms ratio=0.44 s0=6.7ms s1=190.9ms wait=0.2/42.8ms pred gate=device Token # 168: 3.830ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=24 prop=24 pred gate=device Token # 169: 116.796ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=draft=940 prop=940 olap pair=110.7ms serial=195.3ms gain=84.6ms ratio=0.43 s0=7.7ms s1=187.7ms wait=0.2/41.8ms pred gate=device Token # 170: 4.004ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 171: 116.382ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=19 prop=19 olap pair=111.0ms serial=197.4ms gain=86.4ms ratio=0.44 s0=4.1ms s1=193.3ms wait=0.1/45.8ms pred gate=device Token # 172: 3.786ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 173: 116.818ms; value: next_token_ids=tensor([438], 
device='cuda:0') mtp accept=0 prop=320 top1=438 accp=0.272 next=draft=223 prop=223 olap pair=111.3ms serial=198.1ms gain=86.8ms ratio=0.44 s0=4.8ms s1=193.3ms wait=0.1/44.6ms pred gate=device Token # 174: 116.798ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.991 next=draft=1457 prop=1457 olap pair=111.3ms serial=197.0ms gain=85.7ms ratio=0.44 s0=4.3ms s1=192.8ms wait=0.1/45.5ms pred gate=device Token # 175: 3.821ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=pair draft=12 prop=982 pred gate=device Token # 176: 118.109ms; value: next_token_ids=tensor([982], device='cuda:0') mtp accept=1 prop=982 top1=982 accp=0.490 next=draft=223 prop=223 olap pair=111.9ms serial=196.9ms gain=85.0ms ratio=0.43 s0=7.8ms s1=189.2ms wait=0.2/41.8ms pred gate=device Token # 177: 4.732ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 178: 116.475ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=64 prop=64 olap pair=111.1ms serial=196.8ms gain=85.7ms ratio=0.44 s0=6.6ms s1=190.1ms wait=0.2/42.8ms pred gate=device Token # 179: 3.812ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=397 prop=397 pred gate=device Token # 180: 116.407ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=1 prop=397 top1=397 accp=1.000 next=draft=940 prop=940 olap pair=111.1ms serial=196.6ms gain=85.5ms ratio=0.43 s0=7.1ms s1=189.5ms wait=0.2/42.2ms pred gate=device Token # 181: 3.827ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 182: 116.626ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=19 prop=19 olap 
pair=111.1ms serial=196.4ms gain=85.4ms ratio=0.43 s0=6.7ms s1=189.8ms wait=0.2/42.9ms pred gate=device Token # 183: 3.812ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 184: 117.209ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.993 next=draft=1207 prop=1207 olap pair=111.9ms serial=197.9ms gain=86.0ms ratio=0.43 s0=6.3ms s1=191.6ms wait=0.2/43.3ms pred gate=device Token # 185: 3.832ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=1 prop=1207 top1=1207 accp=0.911 next=pair draft=13103 prop=13103 pred gate=device Token # 186: 116.389ms; value: next_token_ids=tensor([13103], device='cuda:0') mtp accept=1 prop=13103 top1=13103 accp=0.931 next=draft=1877 prop=450 olap pair=111.0ms serial=197.6ms gain=86.6ms ratio=0.44 s0=4.1ms s1=193.5ms wait=0.1/45.8ms pred gate=device Token # 187: 3.788ms; value: next_token_ids=tensor([1877], device='cuda:0') mtp accept=0 prop=450 top1=1877 accp=0.559 next=pair draft=57482 prop=1395 pred gate=device Token # 188: 117.392ms; value: next_token_ids=tensor([9308], device='cuda:0') mtp accept=0 prop=1395 top1=9308 accp=0.136 next=draft=57482 prop=57482 olap pair=111.2ms serial=196.8ms gain=85.7ms ratio=0.44 s0=7.9ms s1=188.9ms wait=0.2/41.5ms pred gate=device Token # 189: 117.006ms; value: next_token_ids=tensor([57482], device='cuda:0') mtp accept=1 prop=57482 top1=57482 accp=0.991 next=draft=3442 prop=10 olap pair=111.3ms serial=196.8ms gain=85.5ms ratio=0.43 s0=8.9ms s1=188.0ms wait=0.2/40.5ms pred gate=device Token # 190: 3.897ms; value: next_token_ids=tensor([3442], device='cuda:0') mtp accept=0 prop=10 top1=3442 accp=0.621 next=pair draft=5870 prop=5870 pred gate=device Token # 191: 116.432ms; value: next_token_ids=tensor([5870], device='cuda:0') mtp accept=1 prop=5870 top1=5870 accp=1.000 next=draft=303 prop=303 olap pair=111.0ms serial=196.2ms gain=85.2ms ratio=0.43 
s0=6.8ms s1=189.5ms wait=0.2/42.9ms pred gate=device Token # 192: 3.785ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.538 next=pair draft=7882 prop=7882 pred gate=device Token # 193: 116.664ms; value: next_token_ids=tensor([7882], device='cuda:0') mtp accept=1 prop=7882 top1=7882 accp=0.990 next=draft=343 prop=343 olap pair=111.3ms serial=195.8ms gain=84.5ms ratio=0.43 s0=6.2ms s1=189.6ms wait=0.2/43.3ms pred gate=device Token # 194: 3.978ms; value: next_token_ids=tensor([40], device='cuda:0') mtp accept=0 prop=343 top1=15206 accp=0.220 next=pair draft=26806 prop=26806 pred gate=device Token # 195: 117.934ms; value: next_token_ids=tensor([26806], device='cuda:0') mtp accept=1 prop=26806 top1=26806 accp=1.000 next=draft=996 prop=996 olap pair=112.6ms serial=198.4ms gain=85.8ms ratio=0.43 s0=6.6ms s1=191.8ms wait=0.2/43.0ms pred gate=device Token # 196: 3.822ms; value: next_token_ids=tensor([12595], device='cuda:0') mtp accept=0 prop=996 top1=12595 accp=0.124 next=pair draft=5870 prop=5870 pred gate=device Token # 197: 117.116ms; value: next_token_ids=tensor([1542], device='cuda:0') mtp accept=0 prop=5870 top1=1542 accp=0.028 next=draft=933 prop=933 olap pair=111.7ms serial=197.4ms gain=85.7ms ratio=0.43 s0=5.2ms s1=192.2ms wait=0.1/44.4ms pred gate=device Token # 198: 117.337ms; value: next_token_ids=tensor([933], device='cuda:0') mtp accept=1 prop=933 top1=933 accp=0.997 next=draft=1148 prop=1148 olap pair=111.8ms serial=197.4ms gain=85.6ms ratio=0.43 s0=4.7ms s1=192.7ms wait=0.1/45.2ms pred gate=device Token # 199: 3.770ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=0.997 next=pair draft=4029 prop=5640 pred gate=device Token # 200: 117.288ms; value: next_token_ids=tensor([40], device='cuda:0') mtp accept=0 prop=5640 top1=40 accp=0.106 next=draft=26806 prop=26806 olap pair=111.2ms serial=196.9ms gain=85.7ms ratio=0.44 s0=7.8ms s1=189.1ms wait=0.2/41.6ms pred gate=device 
Token # 201: 118.000ms; value: next_token_ids=tensor([26806], device='cuda:0') mtp accept=1 prop=26806 top1=26806 accp=1.000 next=draft=996 prop=996 olap pair=111.4ms serial=197.3ms gain=85.8ms ratio=0.44 s0=8.7ms s1=188.5ms wait=0.2/40.8ms pred gate=device Token # 202: 4.802ms; value: next_token_ids=tensor([50746], device='cuda:0') mtp accept=0 prop=996 top1=50746 accp=0.028 next=pair draft=20 prop=20 pred gate=device Token # 203: 116.503ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.968 next=draft=40919 prop=40919 olap pair=110.8ms serial=196.0ms gain=85.2ms ratio=0.43 s0=7.8ms s1=188.2ms wait=0.2/41.6ms pred gate=device Token # 204: 3.978ms; value: next_token_ids=tensor([40919], device='cuda:0') mtp accept=1 prop=40919 top1=40919 accp=0.999 next=pair draft=20 prop=20 pred gate=device Token # 205: 117.516ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=51187 prop=51187 olap pair=112.2ms serial=198.8ms gain=86.6ms ratio=0.44 s0=8.1ms s1=190.7ms wait=0.2/41.4ms pred gate=device Token # 206: 3.843ms; value: next_token_ids=tensor([51187], device='cuda:0') mtp accept=1 prop=51187 top1=51187 accp=1.000 next=pair draft=12754 prop=12754 pred gate=device Token # 207: 117.364ms; value: next_token_ids=tensor([12754], device='cuda:0') mtp accept=1 prop=12754 top1=12754 accp=0.999 next=draft=19 prop=19 olap pair=111.1ms serial=196.4ms gain=85.4ms ratio=0.43 s0=6.1ms s1=190.3ms wait=0.2/43.6ms pred gate=device Token # 208: 4.673ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 209: 116.923ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.921 next=draft=4407 prop=4407 olap pair=111.4ms serial=196.6ms gain=85.2ms ratio=0.43 s0=7.7ms s1=188.8ms wait=0.2/41.4ms pred gate=device Token # 210: 3.842ms; value: next_token_ids=tensor([4407], 
device='cuda:0') mtp accept=1 prop=4407 top1=4407 accp=0.994 next=pair draft=20 prop=20 pred gate=device Token # 211: 116.515ms; value: next_token_ids=tensor([14784], device='cuda:0') mtp accept=0 prop=20 top1=67231 accp=0.044 next=draft=389 prop=389 olap pair=111.2ms serial=197.9ms gain=86.7ms ratio=0.44 s0=4.3ms s1=193.6ms wait=0.1/45.3ms pred gate=device Token # 212: 115.824ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 accp=1.000 next=draft=397 prop=397 olap pair=110.4ms serial=196.4ms gain=86.0ms ratio=0.44 s0=3.9ms s1=192.5ms wait=0.1/46.1ms pred gate=device Token # 213: 3.807ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=1 prop=397 top1=397 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 214: 117.061ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=2099 prop=2099 olap pair=111.7ms serial=198.3ms gain=86.6ms ratio=0.44 s0=4.3ms s1=194.0ms wait=0.1/45.4ms pred gate=device Token # 215: 3.766ms; value: next_token_ids=tensor([2099], device='cuda:0') mtp accept=1 prop=2099 top1=2099 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 216: 116.485ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.984 next=draft=301 prop=301 olap pair=111.0ms serial=196.6ms gain=85.7ms ratio=0.44 s0=4.7ms s1=192.0ms wait=0.1/45.1ms pred gate=device Token # 217: 3.863ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=1.000 next=pair draft=61947 prop=61947 pred gate=device Token # 218: 115.737ms; value: next_token_ids=tensor([61947], device='cuda:0') mtp accept=1 prop=61947 top1=61947 accp=1.000 next=draft=320 prop=478 olap pair=110.3ms serial=195.2ms gain=84.9ms ratio=0.43 s0=5.6ms s1=189.7ms wait=0.2/43.8ms pred gate=device Token # 219: 3.795ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=320 accp=0.930 next=pair 
draft=33518 prop=33518 pred gate=device Token # 220: 116.145ms; value: next_token_ids=tensor([18467], device='cuda:0') mtp accept=0 prop=33518 top1=13103 accp=0.151 next=draft=16992 prop=5640 olap pair=110.9ms serial=197.2ms gain=86.4ms ratio=0.44 s0=4.4ms s1=192.8ms wait=0.1/45.2ms pred gate=device Token # 221: 116.929ms; value: next_token_ids=tensor([16992], device='cuda:0') mtp accept=0 prop=5640 top1=16992 accp=0.714 next=draft=1703 prop=1703 olap pair=111.6ms serial=196.4ms gain=84.9ms ratio=0.43 s0=6.0ms s1=190.4ms wait=0.2/43.2ms pred gate=device Token # 222: 116.901ms; value: next_token_ids=tensor([5640], device='cuda:0') mtp accept=0 prop=1703 top1=5640 accp=0.075 next=draft=1877 prop=1877 olap pair=111.6ms serial=197.6ms gain=86.0ms ratio=0.44 s0=4.5ms s1=193.1ms wait=0.1/44.9ms pred gate=device Token # 223: 117.536ms; value: next_token_ids=tensor([844], device='cuda:0') mtp accept=0 prop=1877 top1=844 accp=0.122 next=draft=1703 prop=1703 olap pair=111.6ms serial=197.3ms gain=85.7ms ratio=0.43 s0=8.0ms s1=189.3ms wait=0.2/41.4ms pred gate=device Token # 224: 117.787ms; value: next_token_ids=tensor([1703], device='cuda:0') mtp accept=1 prop=1703 top1=1703 accp=0.773 next=draft=996 prop=996 olap pair=111.5ms serial=198.2ms gain=86.6ms ratio=0.44 s0=5.7ms s1=192.5ms wait=0.2/44.1ms pred gate=device Token # 225: 4.433ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=0.724 next=pair draft=28134 prop=28134 pred gate=device Token # 226: 116.579ms; value: next_token_ids=tensor([28134], device='cuda:0') mtp accept=1 prop=28134 top1=28134 accp=0.958 next=draft=320 prop=320 olap pair=111.3ms serial=197.7ms gain=86.4ms ratio=0.44 s0=5.6ms s1=192.1ms wait=0.2/44.0ms pred gate=device Token # 227: 3.734ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=7346 prop=7346 pred gate=device Token # 228: 116.387ms; value: next_token_ids=tensor([7346], device='cuda:0') mtp 
accept=1 prop=7346 top1=7346 accp=0.935 next=draft=303 prop=303 olap pair=111.1ms serial=196.5ms gain=85.4ms ratio=0.43 s0=5.0ms s1=191.5ms wait=0.1/45.0ms pred gate=device Token # 229: 3.741ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.999 next=pair draft=5640 prop=5640 pred gate=device Token # 230: 116.371ms; value: next_token_ids=tensor([18467], device='cuda:0') mtp accept=0 prop=5640 top1=18467 accp=0.542 next=draft=5640 prop=20495 olap pair=111.1ms serial=196.7ms gain=85.7ms ratio=0.44 s0=4.3ms s1=192.4ms wait=0.1/45.4ms pred gate=device Token # 231: 117.460ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=0 prop=20495 top1=4339 accp=0.185 next=draft=1959 prop=7163 olap pair=111.2ms serial=197.1ms gain=85.8ms ratio=0.44 s0=5.6ms s1=191.4ms wait=0.2/44.1ms pred gate=device Token # 232: 116.400ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.396 next=draft=27521 prop=27521 olap pair=110.6ms serial=196.2ms gain=85.7ms ratio=0.44 s0=5.1ms s1=191.1ms wait=0.1/44.5ms pred gate=device Token # 233: 4.658ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 234: 116.191ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=draft=2311 prop=2311 olap pair=110.8ms serial=196.4ms gain=85.5ms ratio=0.44 s0=7.3ms s1=189.0ms wait=0.2/42.1ms pred gate=device Token # 235: 3.737ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=0 prop=2311 top1=1267 accp=0.785 next=pair draft=223 prop=223 pred gate=device Token # 236: 116.146ms; value: next_token_ids=tensor([111348], device='cuda:0') mtp accept=0 prop=223 top1=111348 accp=0.062 next=draft=1703 prop=1703 olap pair=110.8ms serial=196.3ms gain=85.6ms ratio=0.44 s0=7.5ms s1=188.9ms wait=0.2/42.1ms pred gate=device Token # 237: 117.712ms; 
value: next_token_ids=tensor([1703], device='cuda:0') mtp accept=1 prop=1703 top1=1703 accp=1.000 next=draft=996 prop=996 olap pair=111.5ms serial=198.0ms gain=86.5ms ratio=0.44 s0=4.5ms s1=193.4ms wait=0.1/45.4ms pred gate=device Token # 238: 4.574ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=pair draft=478 prop=303 pred gate=device Token # 239: 116.523ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=0 prop=303 top1=478 accp=0.648 next=draft=14785 prop=14785 olap pair=111.1ms serial=197.4ms gain=86.3ms ratio=0.44 s0=4.9ms s1=192.5ms wait=0.1/45.0ms pred gate=device Token # 240: 115.934ms; value: next_token_ids=tensor([14785], device='cuda:0') mtp accept=1 prop=14785 top1=14785 accp=0.560 next=draft=4339 prop=4339 olap pair=110.6ms serial=195.4ms gain=84.8ms ratio=0.43 s0=4.3ms s1=191.0ms wait=0.1/45.6ms pred gate=device Token # 241: 3.720ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=1255 accp=0.544 next=pair draft=5189 prop=5189 pred gate=device Token # 242: 115.974ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=0 prop=5189 top1=7163 accp=0.061 next=draft=27521 prop=27521 olap pair=110.7ms serial=195.7ms gain=85.1ms ratio=0.43 s0=6.4ms s1=189.4ms wait=0.2/43.3ms pred gate=device Token # 243: 116.772ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=111.2ms serial=196.4ms gain=85.1ms ratio=0.43 s0=6.4ms s1=189.9ms wait=0.2/43.3ms pred gate=device Token # 244: 3.711ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=1267 prop=1267 pred gate=device Token # 245: 120.444ms; value: next_token_ids=tensor([58000], device='cuda:0') mtp accept=0 prop=1267 top1=58000 accp=0.449 next=draft=844 prop=844 olap pair=112.5ms serial=198.2ms gain=85.8ms ratio=0.43 s0=5.0ms s1=193.2ms 
wait=0.1/44.9ms pred gate=device Token # 246: 117.058ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=0 prop=844 top1=20 accp=0.215 next=draft=410 prop=410 olap pair=111.6ms serial=198.1ms gain=86.5ms ratio=0.44 s0=4.3ms s1=193.9ms wait=0.1/45.3ms pred gate=device Token # 247: 117.045ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=0 prop=410 top1=768 accp=0.001 next=draft=15754 prop=19585 olap pair=111.4ms serial=197.2ms gain=85.8ms ratio=0.43 s0=4.4ms s1=192.8ms wait=0.1/45.4ms pred gate=device Token # 248: 119.164ms; value: next_token_ids=tensor([19585], device='cuda:0') mtp accept=1 prop=19585 top1=19585 accp=0.232 next=draft=100791 prop=100791 olap pair=112.8ms serial=196.3ms gain=83.5ms ratio=0.43 s0=8.0ms s1=188.2ms wait=0.2/41.9ms pred gate=device Token # 249: 4.568ms; value: next_token_ids=tensor([100791], device='cuda:0') mtp accept=1 prop=100791 top1=100791 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 250: 118.263ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.987 next=draft=2636 prop=2636 olap pair=112.0ms serial=198.2ms gain=86.2ms ratio=0.43 s0=8.7ms s1=189.4ms wait=0.2/41.0ms pred gate=device Token # 251: 4.627ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=1.000 next=pair draft=2099 prop=2099 pred gate=device Token # 252: 117.915ms; value: next_token_ids=tensor([113008], device='cuda:0') mtp accept=0 prop=2099 top1=113008 accp=0.332 next=draft=20 prop=20 olap pair=111.6ms serial=197.3ms gain=85.7ms ratio=0.43 s0=7.0ms s1=190.3ms wait=0.2/42.7ms pred gate=device Token # 253: 117.590ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=124637 prop=124637 olap pair=111.0ms serial=196.9ms gain=85.9ms ratio=0.44 s0=6.2ms s1=190.8ms wait=0.2/43.5ms pred gate=device Token # 254: 4.568ms; value: next_token_ids=tensor([124637], device='cuda:0') mtp accept=1 
prop=124637 top1=124637 accp=1.000 next=pair draft=876 prop=876 pred gate=device Token # 255: 117.172ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=0 prop=876 top1=876 accp=0.748 next=draft=58000 prop=58000 olap pair=111.5ms serial=197.2ms gain=85.7ms ratio=0.43 s0=8.0ms s1=189.2ms wait=0.2/41.6ms pred gate=device Token # 256: 117.171ms; value: next_token_ids=tensor([58000], device='cuda:0') mtp accept=1 prop=58000 top1=58000 accp=1.000 next=draft=21 prop=21 olap pair=111.6ms serial=197.8ms gain=86.2ms ratio=0.44 s0=6.2ms s1=191.6ms wait=0.2/43.8ms pred gate=device Token # 257: 3.832ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=pair draft=768 prop=768 pred gate=device Token # 258: 116.953ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=8283 prop=8283 olap pair=110.9ms serial=195.1ms gain=84.2ms ratio=0.43 s0=8.9ms s1=186.2ms wait=0.2/40.7ms pred gate=device Token # 259: 4.592ms; value: next_token_ids=tensor([8283], device='cuda:0') mtp accept=1 prop=8283 top1=8283 accp=1.000 next=pair draft=548 prop=548 pred gate=device Token # 260: 117.560ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=1 prop=548 top1=49076 accp=0.355 next=draft=768 prop=768 olap pair=111.3ms serial=196.8ms gain=85.5ms ratio=0.43 s0=8.1ms s1=188.7ms wait=0.2/41.4ms pred gate=device Token # 261: 4.702ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.999 next=pair draft=19 prop=19 pred gate=device Token # 262: 118.469ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=13 prop=13 olap pair=112.1ms serial=197.4ms gain=85.3ms ratio=0.43 s0=5.5ms s1=191.9ms wait=0.1/44.4ms pred gate=device Token # 263: 4.772ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=18 prop=18 pred gate=device Token 
# 264: 117.210ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=13 prop=13 olap pair=111.7ms serial=198.7ms gain=87.0ms ratio=0.44 s0=4.5ms s1=194.3ms wait=0.1/45.5ms pred gate=device Token # 265: 3.788ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=22 prop=22 pred gate=device Token # 266: 117.042ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=draft=13 prop=13 olap pair=111.7ms serial=197.0ms gain=85.3ms ratio=0.43 s0=7.8ms s1=189.2ms wait=0.2/41.9ms pred gate=device Token # 267: 3.821ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=26 prop=26 pred gate=device Token # 268: 116.256ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=draft=13 prop=13 olap pair=111.0ms serial=197.1ms gain=86.1ms ratio=0.44 s0=4.3ms s1=192.8ms wait=0.1/45.6ms pred gate=device Token # 269: 3.865ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=23 prop=23 pred gate=device Token # 270: 117.784ms; value: next_token_ids=tensor([23], device='cuda:0') mtp accept=1 prop=23 top1=23 accp=1.000 next=draft=13 prop=13 olap pair=111.8ms serial=198.8ms gain=87.0ms ratio=0.44 s0=4.6ms s1=194.1ms wait=0.1/45.1ms pred gate=device Token # 271: 3.820ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=25 prop=25 pred gate=device Token # 272: 117.614ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=draft=13 prop=13 olap pair=111.4ms serial=196.5ms gain=85.1ms ratio=0.43 s0=6.3ms s1=190.3ms wait=0.2/43.5ms pred gate=device Token # 273: 4.730ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=24 
prop=24 pred gate=device Token # 274: 116.699ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=draft=13 prop=13 olap pair=111.2ms serial=196.9ms gain=85.7ms ratio=0.44 s0=4.9ms s1=192.0ms wait=0.1/45.0ms pred gate=device Token # 275: 3.834ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=18 prop=18 pred gate=device Token # 276: 116.862ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=13 prop=13 olap pair=111.5ms serial=197.1ms gain=85.6ms ratio=0.43 s0=4.4ms s1=192.6ms wait=0.1/45.5ms pred gate=device Token # 277: 3.796ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 278: 116.635ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=438 prop=438 olap pair=111.3ms serial=195.4ms gain=84.1ms ratio=0.43 s0=5.8ms s1=189.6ms wait=0.2/43.7ms pred gate=device Token # 279: 3.791ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.933 next=pair draft=223 prop=223 pred gate=device Token # 280: 116.891ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=2111 prop=2111 olap pair=111.6ms serial=197.7ms gain=86.1ms ratio=0.44 s0=5.2ms s1=192.6ms wait=0.1/44.5ms pred gate=device Token # 281: 3.819ms; value: next_token_ids=tensor([2111], device='cuda:0') mtp accept=1 prop=2111 top1=2111 accp=1.000 next=pair draft=1148 prop=1148 pred gate=device Token # 282: 116.793ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=1148 top1=303 accp=0.083 next=draft=2111 prop=2111 olap pair=111.6ms serial=197.4ms gain=85.8ms ratio=0.43 s0=4.3ms s1=193.0ms wait=0.1/45.5ms pred gate=device Token # 283: 117.003ms; value: next_token_ids=tensor([2111], 
device='cuda:0') mtp accept=1 prop=2111 top1=2111 accp=1.000 next=draft=113008 prop=58000 olap pair=111.6ms serial=197.1ms gain=85.5ms ratio=0.43 s0=6.1ms s1=191.0ms wait=0.2/43.5ms pred gate=device Token # 284: 3.785ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=0 prop=58000 top1=113008 accp=0.812 next=pair draft=223 prop=223 pred gate=device Token # 285: 116.769ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=21 prop=21 olap pair=111.3ms serial=196.1ms gain=84.8ms ratio=0.43 s0=7.6ms s1=188.6ms wait=0.2/42.0ms pred gate=device Token # 286: 3.790ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 287: 116.788ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.999 next=draft=223 prop=223 olap pair=111.3ms serial=196.5ms gain=85.1ms ratio=0.43 s0=6.6ms s1=189.9ms wait=0.2/43.2ms pred gate=device Token # 288: 3.768ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 289: 116.958ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=303 prop=303 olap pair=111.6ms serial=196.4ms gain=84.8ms ratio=0.43 s0=4.7ms s1=191.7ms wait=0.1/45.4ms pred gate=device Token # 290: 3.837ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.999 next=pair draft=2636 prop=2636 pred gate=device Token # 291: 116.489ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=1.000 next=draft=113008 prop=113008 olap pair=111.1ms serial=197.0ms gain=85.8ms ratio=0.44 s0=6.0ms s1=190.9ms wait=0.2/43.6ms pred gate=device Token # 292: 3.780ms; value: next_token_ids=tensor([113008], device='cuda:0') mtp accept=1 prop=113008 top1=113008 accp=1.000 
next=pair draft=21 prop=21 pred gate=device Token # 293: 116.818ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=draft=124637 prop=124637 olap pair=111.4ms serial=197.2ms gain=85.7ms ratio=0.43 s0=4.7ms s1=192.4ms wait=0.1/45.2ms pred gate=device Token # 294: 3.754ms; value: next_token_ids=tensor([124637], device='cuda:0') mtp accept=1 prop=124637 top1=124637 accp=1.000 next=pair draft=478 prop=478 pred gate=device Token # 295: 116.164ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=draft=58000 prop=58000 olap pair=110.8ms serial=195.5ms gain=84.7ms ratio=0.43 s0=6.8ms s1=188.7ms wait=0.2/43.0ms pred gate=device Token # 296: 3.803ms; value: next_token_ids=tensor([58000], device='cuda:0') mtp accept=1 prop=58000 top1=58000 accp=1.000 next=pair draft=23 prop=23 pred gate=device Token # 297: 118.207ms; value: next_token_ids=tensor([23], device='cuda:0') mtp accept=1 prop=23 top1=23 accp=1.000 next=draft=768 prop=768 olap pair=112.3ms serial=198.1ms gain=85.7ms ratio=0.43 s0=6.7ms s1=191.4ms wait=0.2/43.2ms pred gate=device Token # 298: 3.793ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=4755 prop=4755 pred gate=device Token # 299: 116.435ms; value: next_token_ids=tensor([4755], device='cuda:0') mtp accept=1 prop=4755 top1=4755 accp=0.984 next=draft=11616 prop=11616 olap pair=111.0ms serial=197.3ms gain=86.2ms ratio=0.44 s0=4.2ms s1=193.1ms wait=0.1/45.7ms pred gate=device Token # 300: 3.809ms; value: next_token_ids=tensor([11616], device='cuda:0') mtp accept=1 prop=11616 top1=11616 accp=1.000 next=pair draft=389 prop=389 pred gate=device Token # 301: 117.219ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 accp=0.997 next=draft=19 prop=19 olap pair=111.8ms serial=196.9ms gain=85.2ms ratio=0.43 s0=4.5ms s1=192.5ms wait=0.1/45.5ms pred gate=device Token # 302: 
3.743ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 303: 117.301ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=2636 prop=2636 olap pair=111.9ms serial=198.9ms gain=87.0ms ratio=0.44 s0=4.6ms s1=194.3ms wait=0.1/45.3ms pred gate=device Token # 304: 3.832ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.990 next=pair draft=113008 prop=113008 pred gate=device Token # 305: 116.628ms; value: next_token_ids=tensor([113008], device='cuda:0') mtp accept=1 prop=113008 top1=113008 accp=1.000 next=draft=23 prop=23 olap pair=111.3ms serial=198.0ms gain=86.7ms ratio=0.44 s0=4.2ms s1=193.7ms wait=0.1/45.7ms pred gate=device Token # 306: 3.838ms; value: next_token_ids=tensor([23], device='cuda:0') mtp accept=1 prop=23 top1=23 accp=1.000 next=pair draft=124637 prop=124637 pred gate=device Token # 307: 117.314ms; value: next_token_ids=tensor([124637], device='cuda:0') mtp accept=1 prop=124637 top1=124637 accp=1.000 next=draft=478 prop=478 olap pair=111.5ms serial=195.5ms gain=84.0ms ratio=0.43 s0=6.9ms s1=188.6ms wait=0.2/42.7ms pred gate=device Token # 308: 3.873ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=pair draft=58000 prop=58000 pred gate=device Token # 309: 117.309ms; value: next_token_ids=tensor([58000], device='cuda:0') mtp accept=1 prop=58000 top1=58000 accp=1.000 next=draft=25 prop=25 olap pair=111.1ms serial=196.4ms gain=85.4ms ratio=0.43 s0=7.7ms s1=188.7ms wait=0.2/41.9ms pred gate=device Token # 310: 4.026ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=pair draft=768 prop=768 pred gate=device Token # 311: 118.416ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=18467 prop=7163 olap 
pair=112.3ms serial=198.3ms gain=86.0ms ratio=0.43 s0=6.5ms s1=191.8ms wait=0.2/43.2ms pred gate=device Token # 312: 4.579ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.390 next=pair draft=27521 prop=27521 pred gate=device Token # 313: 117.245ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=111.7ms serial=197.5ms gain=85.8ms ratio=0.43 s0=4.3ms s1=193.2ms wait=0.1/45.4ms pred gate=device Token # 314: 3.773ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=1267 prop=1267 pred gate=device Token # 315: 117.029ms; value: next_token_ids=tensor([24106], device='cuda:0') mtp accept=0 prop=1267 top1=24106 accp=0.216 next=draft=223 prop=223 olap pair=111.6ms serial=198.3ms gain=86.7ms ratio=0.44 s0=4.5ms s1=193.8ms wait=0.1/44.9ms pred gate=device Token # 316: 116.812ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=25 prop=25 olap pair=111.3ms serial=197.6ms gain=86.2ms ratio=0.44 s0=5.9ms s1=191.6ms wait=0.2/43.6ms pred gate=device Token # 317: 3.752ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=pair draft=1148 prop=1148 pred gate=device Token # 318: 117.127ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=0.989 next=draft=18467 prop=18467 olap pair=111.3ms serial=196.7ms gain=85.4ms ratio=0.43 s0=8.8ms s1=187.9ms wait=0.2/40.8ms pred gate=device Token # 319: 3.859ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=0 prop=18467 top1=4339 accp=0.074 next=pair draft=768 prop=768 pred gate=device Token # 320: 116.873ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=0 prop=768 top1=768 accp=0.817 next=draft=27521 prop=27521 olap pair=111.5ms serial=195.7ms gain=84.2ms 
ratio=0.43 s0=7.5ms s1=188.2ms wait=0.2/42.0ms pred gate=device Token # 321: 116.951ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=111.4ms serial=196.6ms gain=85.2ms ratio=0.43 s0=6.3ms s1=190.3ms wait=0.2/43.5ms pred gate=device Token # 322: 3.785ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=1267 prop=1267 pred gate=device Token # 323: 117.996ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=draft=223 prop=223 olap pair=112.1ms serial=197.6ms gain=85.5ms ratio=0.43 s0=8.1ms s1=189.5ms wait=0.2/41.5ms pred gate=device Token # 324: 3.877ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=25 prop=25 pred gate=device Token # 325: 116.639ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=draft=320 prop=320 olap pair=111.2ms serial=197.4ms gain=86.2ms ratio=0.44 s0=5.7ms s1=191.7ms wait=0.2/44.1ms pred gate=device Token # 326: 3.737ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.750 next=pair draft=18467 prop=18467 pred gate=device Token # 327: 117.176ms; value: next_token_ids=tensor([18467], device='cuda:0') mtp accept=1 prop=18467 top1=18467 accp=0.786 next=draft=4339 prop=4339 olap pair=111.0ms serial=196.2ms gain=85.2ms ratio=0.43 s0=6.4ms s1=189.8ms wait=0.2/43.3ms pred gate=device Token # 328: 4.529ms; value: next_token_ids=tensor([15552], device='cuda:0') mtp accept=0 prop=4339 top1=4339 accp=0.775 next=pair draft=4339 prop=4339 pred gate=device Token # 329: 117.888ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=0.998 next=draft=768 prop=768 olap pair=111.7ms serial=197.7ms gain=86.0ms ratio=0.44 s0=8.6ms s1=189.1ms wait=0.2/41.1ms 
pred gate=device Token # 330: 4.615ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.987 next=pair draft=25 prop=7163 pred gate=device Token # 331: 116.197ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=1255 accp=0.164 next=draft=27521 prop=27521 olap pair=110.7ms serial=195.4ms gain=84.6ms ratio=0.43 s0=5.4ms s1=190.0ms wait=0.1/44.4ms pred gate=device Token # 332: 3.798ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 333: 118.184ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=0.775 next=draft=24106 prop=24106 olap pair=112.1ms serial=197.8ms gain=85.7ms ratio=0.43 s0=5.0ms s1=192.8ms wait=0.1/44.8ms pred gate=device Token # 334: 4.675ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=0 prop=24106 top1=438 accp=0.075 next=pair draft=223 prop=223 pred gate=device Token # 335: 117.768ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=7163 prop=7163 olap pair=111.4ms serial=197.3ms gain=85.9ms ratio=0.44 s0=8.2ms s1=189.1ms wait=0.2/41.4ms pred gate=device Token # 336: 4.630ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 337: 117.183ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=6391 prop=6391 olap pair=111.7ms serial=196.9ms gain=85.2ms ratio=0.43 s0=7.5ms s1=189.4ms wait=0.2/42.1ms pred gate=device Token # 338: 3.747ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=0 prop=6391 top1=24 accp=0.248 next=pair draft=982 prop=982 pred gate=device Token # 339: 117.874ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=0 prop=982 top1=12 
accp=0.276 next=draft=1457 prop=1457 olap pair=111.6ms serial=195.5ms gain=83.9ms ratio=0.43 s0=8.0ms s1=187.5ms wait=0.2/41.6ms pred gate=device Token # 340: 117.318ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=draft=940 prop=940 olap pair=111.5ms serial=196.6ms gain=85.1ms ratio=0.43 s0=7.3ms s1=189.3ms wait=0.2/42.4ms pred gate=device Token # 341: 3.798ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=0.955 next=pair draft=223 prop=223 pred gate=device Token # 342: 117.715ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.987 next=draft=19 prop=19 olap pair=112.1ms serial=197.9ms gain=85.7ms ratio=0.43 s0=4.6ms s1=193.2ms wait=0.1/45.5ms pred gate=device Token # 343: 3.758ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 344: 117.419ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.765 next=draft=1207 prop=1207 olap pair=111.3ms serial=197.4ms gain=86.0ms ratio=0.44 s0=4.6ms s1=192.8ms wait=0.1/45.5ms pred gate=device Token # 345: 4.668ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=1 prop=1207 top1=1207 accp=1.000 next=pair draft=13103 prop=13103 pred gate=device Token # 346: 117.188ms; value: next_token_ids=tensor([13103], device='cuda:0') mtp accept=1 prop=13103 top1=13103 accp=0.998 next=draft=40438 prop=642 olap pair=111.6ms serial=195.8ms gain=84.2ms ratio=0.43 s0=8.8ms s1=187.0ms wait=0.2/40.8ms pred gate=device Token # 347: 3.776ms; value: next_token_ids=tensor([4036], device='cuda:0') mtp accept=0 prop=642 top1=4036 accp=0.343 next=pair draft=4339 prop=4339 pred gate=device Token # 348: 117.188ms; value: next_token_ids=tensor([2311], device='cuda:0') mtp accept=0 prop=4339 top1=2311 accp=0.453 next=draft=13183 prop=13183 olap pair=111.1ms 
serial=196.3ms gain=85.2ms ratio=0.43 s0=7.9ms s1=188.5ms wait=0.2/42.0ms pred gate=device Token # 349: 118.414ms; value: next_token_ids=tensor([20251], device='cuda:0') mtp accept=0 prop=13183 top1=20251 accp=0.111 next=draft=13183 prop=13183 olap pair=112.4ms serial=197.5ms gain=85.1ms ratio=0.43 s0=8.0ms s1=189.5ms wait=0.2/41.3ms pred gate=device Token # 350: 116.473ms; value: next_token_ids=tensor([13183], device='cuda:0') mtp accept=1 prop=13183 top1=13183 accp=0.713 next=draft=320 prop=320 olap pair=110.9ms serial=196.6ms gain=85.6ms ratio=0.44 s0=6.7ms s1=189.8ms wait=0.2/42.9ms pred gate=device Token # 351: 3.778ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=478 accp=0.494 next=pair draft=4029 prop=4754 pred gate=device Token # 352: 116.395ms; value: next_token_ids=tensor([4029], device='cuda:0') mtp accept=0 prop=4754 top1=4029 accp=0.431 next=draft=2541 prop=2541 olap pair=111.1ms serial=197.4ms gain=86.3ms ratio=0.44 s0=5.0ms s1=192.4ms wait=0.2/44.8ms pred gate=device Token # 353: 116.949ms; value: next_token_ids=tensor([18467], device='cuda:0') mtp accept=0 prop=2541 top1=2541 accp=0.779 next=draft=2541 prop=2541 olap pair=110.9ms serial=195.8ms gain=85.0ms ratio=0.43 s0=6.3ms s1=189.6ms wait=0.2/43.3ms pred gate=device Token # 354: 117.847ms; value: next_token_ids=tensor([2541], device='cuda:0') mtp accept=1 prop=2541 top1=2541 accp=0.612 next=draft=768 prop=768 olap pair=111.9ms serial=197.6ms gain=85.6ms ratio=0.43 s0=7.9ms s1=189.7ms wait=0.2/41.9ms pred gate=device Token # 355: 3.737ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=0 prop=768 top1=4339 accp=0.374 next=pair draft=2073 prop=2073 pred gate=device Token # 356: 117.171ms; value: next_token_ids=tensor([2073], device='cuda:0') mtp accept=1 prop=2073 top1=2073 accp=1.000 next=draft=303 prop=303 olap pair=111.1ms serial=196.1ms gain=85.1ms ratio=0.43 s0=8.5ms s1=187.7ms wait=0.2/41.0ms pred gate=device Token # 357: 4.641ms; value: 
next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.999 next=pair draft=1207 prop=1207 pred gate=device Token # 358: 117.176ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=1 prop=1207 top1=1207 accp=0.981 next=draft=13470 prop=13470 olap pair=111.2ms serial=196.2ms gain=84.9ms ratio=0.43 s0=7.8ms s1=188.3ms wait=0.2/41.5ms pred gate=device Token # 359: 3.805ms; value: next_token_ids=tensor([13470], device='cuda:0') mtp accept=1 prop=13470 top1=13470 accp=0.943 next=pair draft=4372 prop=4372 pred gate=device Token # 360: 117.884ms; value: next_token_ids=tensor([4372], device='cuda:0') mtp accept=1 prop=4372 top1=4372 accp=0.924 next=draft=18804 prop=18804 olap pair=112.0ms serial=198.3ms gain=86.3ms ratio=0.44 s0=7.9ms s1=190.4ms wait=0.2/41.7ms pred gate=device Token # 361: 3.782ms; value: next_token_ids=tensor([18804], device='cuda:0') mtp accept=1 prop=18804 top1=18804 accp=0.979 next=pair draft=303 prop=36101 pred gate=device Token # 362: 116.057ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=36101 top1=303 accp=0.799 next=draft=1300 prop=18467 olap pair=110.7ms serial=196.0ms gain=85.3ms ratio=0.44 s0=6.8ms s1=189.2ms wait=0.2/42.8ms pred gate=device Token # 363: 117.026ms; value: next_token_ids=tensor([1300], device='cuda:0') mtp accept=0 prop=18467 top1=1300 accp=0.533 next=draft=49365 prop=49365 olap pair=111.7ms serial=197.0ms gain=85.4ms ratio=0.43 s0=4.6ms s1=192.5ms wait=0.1/45.4ms pred gate=device Token # 364: 118.320ms; value: next_token_ids=tensor([49365], device='cuda:0') mtp accept=1 prop=49365 top1=49365 accp=0.994 next=draft=4339 prop=4339 olap pair=112.1ms serial=197.6ms gain=85.4ms ratio=0.43 s0=6.7ms s1=190.8ms wait=0.2/42.8ms pred gate=device Token # 365: 4.533ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=0.997 next=pair draft=478 prop=478 pred gate=device Token # 366: 116.902ms; value: next_token_ids=tensor([478], 
device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.998 next=draft=18467 prop=4339 olap pair=111.4ms serial=197.8ms gain=86.4ms ratio=0.44 s0=6.8ms s1=191.0ms wait=0.2/42.9ms pred gate=device Token # 367: 3.734ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4992 accp=0.403 next=pair draft=7163 prop=7163 pred gate=device Token # 368: 116.498ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.899 next=draft=27521 prop=27521 olap pair=111.2ms serial=197.9ms gain=86.8ms ratio=0.44 s0=4.0ms s1=194.0ms wait=0.1/46.0ms pred gate=device Token # 369: 3.710ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 370: 116.737ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=draft=1267 prop=1267 olap pair=111.4ms serial=197.6ms gain=86.2ms ratio=0.44 s0=4.3ms s1=193.3ms wait=0.1/45.6ms pred gate=device Token # 371: 3.738ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 372: 118.664ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=25 prop=25 olap pair=112.6ms serial=199.9ms gain=87.3ms ratio=0.44 s0=4.9ms s1=195.1ms wait=0.1/45.0ms pred gate=device Token # 373: 4.550ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=pair draft=5189 prop=5189 pred gate=device Token # 374: 117.679ms; value: next_token_ids=tensor([5189], device='cuda:0') mtp accept=1 prop=5189 top1=5189 accp=0.700 next=draft=2821 prop=7163 olap pair=111.7ms serial=194.6ms gain=83.0ms ratio=0.43 s0=8.8ms s1=185.8ms wait=0.2/40.4ms pred gate=device Token # 375: 3.777ms; value: next_token_ids=tensor([4992], device='cuda:0') mtp accept=0 prop=7163 top1=4992 
accp=0.139 next=pair draft=2821 prop=2821 pred gate=device Token # 376: 118.032ms; value: next_token_ids=tensor([2821], device='cuda:0') mtp accept=1 prop=2821 top1=2821 accp=0.948 next=draft=768 prop=768 olap pair=111.8ms serial=195.4ms gain=83.6ms ratio=0.43 s0=7.7ms s1=187.7ms wait=0.2/41.8ms pred gate=device Token # 377: 4.583ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=32257 prop=1255 pred gate=device Token # 378: 118.316ms; value: next_token_ids=tensor([553], device='cuda:0') mtp accept=0 prop=1255 top1=553 accp=0.089 next=draft=64 prop=64 olap pair=112.0ms serial=195.7ms gain=83.7ms ratio=0.43 s0=7.1ms s1=188.6ms wait=0.2/42.7ms pred gate=device Token # 379: 117.379ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=0 prop=64 top1=1267 accp=0.068 next=draft=223 prop=223 olap pair=111.7ms serial=196.5ms gain=84.8ms ratio=0.43 s0=8.3ms s1=188.2ms wait=0.2/41.3ms pred gate=device Token # 380: 117.919ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=25 prop=25 olap pair=112.3ms serial=198.4ms gain=86.1ms ratio=0.43 s0=4.7ms s1=193.7ms wait=0.1/45.5ms pred gate=device Token # 381: 3.776ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 382: 117.945ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=112.6ms serial=199.3ms gain=86.7ms ratio=0.44 s0=5.8ms s1=193.5ms wait=0.2/44.0ms pred gate=device Token # 383: 3.864ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.885 next=pair draft=21 prop=21 pred gate=device Token # 384: 119.272ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=draft=303 prop=303 olap pair=113.2ms serial=200.7ms gain=87.5ms 
ratio=0.44 s0=4.7ms s1=196.0ms wait=0.1/45.5ms pred gate=device Token # 385: 4.644ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.997 next=pair draft=2636 prop=2636 pred gate=device Token # 386: 119.131ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.752 next=draft=7163 prop=7163 olap pair=113.6ms serial=198.9ms gain=85.3ms ratio=0.43 s0=5.8ms s1=193.1ms wait=0.2/46.8ms pred gate=device Token # 387: 3.751ms; value: next_token_ids=tensor([18467], device='cuda:0') mtp accept=0 prop=7163 top1=18467 accp=0.002 next=pair draft=1275 prop=15552 pred gate=device Token # 388: 116.431ms; value: next_token_ids=tensor([43846], device='cuda:0') mtp accept=0 prop=15552 top1=43846 accp=0.173 next=draft=8283 prop=8283 olap pair=111.1ms serial=197.4ms gain=86.3ms ratio=0.44 s0=5.6ms s1=191.8ms wait=0.2/44.3ms pred gate=device Token # 389: 117.923ms; value: next_token_ids=tensor([8283], device='cuda:0') mtp accept=1 prop=8283 top1=8283 accp=0.974 next=draft=320 prop=320 olap pair=112.5ms serial=199.1ms gain=86.5ms ratio=0.43 s0=4.5ms s1=194.6ms wait=0.1/45.5ms pred gate=device Token # 390: 3.742ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.818 next=pair draft=1207 prop=1207 pred gate=device Token # 391: 117.076ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=1 prop=1207 top1=1207 accp=0.927 next=draft=1395 prop=1395 olap pair=111.8ms serial=198.2ms gain=86.4ms ratio=0.44 s0=4.3ms s1=193.9ms wait=0.1/45.6ms pred gate=device Token # 392: 3.746ms; value: next_token_ids=tensor([13103], device='cuda:0') mtp accept=0 prop=1395 top1=13103 accp=0.396 next=pair draft=49391 prop=49391 pred gate=device Token # 393: 118.079ms; value: next_token_ids=tensor([1395], device='cuda:0') mtp accept=0 prop=49391 top1=1395 accp=0.323 next=draft=6573 prop=6573 olap pair=112.0ms serial=197.6ms gain=85.7ms ratio=0.43 s0=8.0ms s1=189.6ms 
wait=0.2/41.9ms pred gate=device Token # 394: 119.021ms; value: next_token_ids=tensor([6573], device='cuda:0') mtp accept=1 prop=6573 top1=6573 accp=0.989 next=draft=768 prop=768 olap pair=112.6ms serial=197.0ms gain=84.4ms ratio=0.43 s0=8.4ms s1=188.6ms wait=0.2/41.3ms pred gate=device Token # 395: 4.606ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=32257 prop=32257 pred gate=device Token # 396: 117.365ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=0 prop=32257 top1=7163 accp=0.345 next=draft=27521 prop=27521 olap pair=111.9ms serial=196.0ms gain=84.2ms ratio=0.43 s0=8.5ms s1=187.5ms wait=0.2/41.1ms pred gate=device Token # 397: 117.860ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=111.6ms serial=196.5ms gain=84.9ms ratio=0.43 s0=8.0ms s1=188.5ms wait=0.2/41.6ms pred gate=device Token # 398: 4.291ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=0.993 next=pair draft=438 prop=438 pred gate=device Token # 399: 117.889ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.988 next=draft=223 prop=223 olap pair=111.7ms serial=196.2ms gain=84.6ms ratio=0.43 s0=8.4ms s1=187.8ms wait=0.2/41.4ms pred gate=device Token # 400: 4.604ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 401: 116.557ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=111.1ms serial=197.1ms gain=85.9ms ratio=0.44 s0=6.4ms s1=190.7ms wait=0.2/43.7ms pred gate=device Token # 402: 3.732ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=6391 prop=6391 pred gate=device Token 
# 403: 117.036ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=0 prop=6391 top1=24 accp=0.010 next=draft=982 prop=982 olap pair=111.7ms serial=197.8ms gain=86.0ms ratio=0.44 s0=6.9ms s1=190.8ms wait=0.2/43.3ms pred gate=device Token # 404: 118.110ms; value: next_token_ids=tensor([982], device='cuda:0') mtp accept=1 prop=982 top1=982 accp=0.725 next=draft=223 prop=223 olap pair=112.6ms serial=197.3ms gain=84.8ms ratio=0.43 s0=6.1ms s1=191.3ms wait=0.2/44.1ms pred gate=device Token # 405: 3.887ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=1457 prop=1457 pred gate=device Token # 406: 116.801ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=draft=940 prop=940 olap pair=111.5ms serial=196.9ms gain=85.4ms ratio=0.43 s0=6.9ms s1=189.9ms wait=0.2/43.1ms pred gate=device Token # 407: 3.905ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=0.989 next=pair draft=223 prop=223 pred gate=device Token # 408: 117.067ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=19 prop=19 olap pair=111.7ms serial=199.0ms gain=87.3ms ratio=0.44 s0=3.9ms s1=195.1ms wait=0.1/46.0ms pred gate=device Token # 409: 3.712ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 410: 117.126ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.709 next=draft=1735 prop=1735 olap pair=111.8ms serial=197.7ms gain=85.9ms ratio=0.43 s0=4.5ms s1=193.3ms wait=0.1/45.6ms pred gate=device Token # 411: 3.772ms; value: next_token_ids=tensor([1735], device='cuda:0') mtp accept=1 prop=1735 top1=1735 accp=0.878 next=pair draft=4339 prop=4339 pred gate=device Token # 412: 118.176ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp 
accept=1 prop=4339 top1=4339 accp=0.997 next=draft=7163 prop=7163 olap pair=112.1ms serial=197.9ms gain=85.8ms ratio=0.43 s0=7.8ms s1=190.1ms wait=0.2/41.7ms pred gate=device Token # 413: 4.650ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 414: 117.425ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=24 prop=24 olap pair=111.1ms serial=196.2ms gain=85.1ms ratio=0.43 s0=7.3ms s1=188.9ms wait=0.2/42.6ms pred gate=device Token # 415: 4.770ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=pair draft=1267 prop=1267 pred gate=device Token # 416: 117.686ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=draft=223 prop=223 olap pair=111.4ms serial=196.3ms gain=84.9ms ratio=0.43 s0=6.6ms s1=189.8ms wait=0.2/43.5ms pred gate=device Token # 417: 4.777ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=25 prop=25 pred gate=device Token # 418: 117.399ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=draft=320 prop=320 olap pair=111.9ms serial=197.9ms gain=86.0ms ratio=0.43 s0=9.0ms s1=189.0ms wait=0.2/40.9ms pred gate=device Token # 419: 3.830ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=320 top1=303 accp=0.337 next=pair draft=4272 prop=4272 pred gate=device Token # 420: 118.330ms; value: next_token_ids=tensor([4272], device='cuda:0') mtp accept=1 prop=4272 top1=4272 accp=0.981 next=draft=1457 prop=4339 olap pair=112.5ms serial=198.5ms gain=86.0ms ratio=0.43 s0=6.1ms s1=192.4ms wait=0.2/43.6ms pred gate=device Token # 421: 3.875ms; value: next_token_ids=tensor([63760], device='cuda:0') mtp accept=0 prop=4339 top1=63760 accp=0.352 next=pair draft=1457 
prop=1457 pred gate=device Token # 422: 117.883ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=draft=1267 prop=1267 olap pair=111.6ms serial=197.9ms gain=86.3ms ratio=0.44 s0=7.3ms s1=190.6ms wait=0.2/42.6ms pred gate=device Token # 423: 4.579ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 424: 116.596ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=25 prop=25 olap pair=111.0ms serial=197.3ms gain=86.3ms ratio=0.44 s0=4.7ms s1=192.6ms wait=0.1/45.2ms pred gate=device Token # 425: 3.791ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 426: 116.560ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=68147 prop=68147 olap pair=111.1ms serial=197.8ms gain=86.7ms ratio=0.44 s0=4.6ms s1=193.2ms wait=0.1/45.0ms pred gate=device Token # 427: 3.847ms; value: next_token_ids=tensor([1107], device='cuda:0') mtp accept=0 prop=68147 top1=68147 accp=0.950 next=pair draft=19 prop=19 pred gate=device Token # 428: 116.513ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=1267 prop=1267 olap pair=111.1ms serial=197.7ms gain=86.6ms ratio=0.44 s0=4.6ms s1=193.2ms wait=0.1/45.1ms pred gate=device Token # 429: 3.752ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=0 prop=1267 top1=478 accp=0.090 next=pair draft=7346 prop=7346 pred gate=device Token # 430: 117.449ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=0 prop=7346 top1=4339 accp=0.081 next=draft=7163 prop=7163 olap pair=111.2ms serial=197.5ms gain=86.3ms ratio=0.44 s0=5.1ms s1=192.3ms wait=0.1/44.4ms pred gate=device Token # 431: 117.341ms; value: 
next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.997 next=draft=27521 prop=27521 olap pair=111.7ms serial=198.2ms gain=86.5ms ratio=0.44 s0=6.7ms s1=191.4ms wait=0.2/42.9ms pred gate=device Token # 432: 3.745ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=24 prop=24 pred gate=device Token # 433: 117.347ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=draft=1267 prop=1267 olap pair=111.4ms serial=197.7ms gain=86.3ms ratio=0.44 s0=4.8ms s1=192.9ms wait=0.1/44.9ms pred gate=device Token # 434: 3.816ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 435: 116.620ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=25 prop=25 olap pair=110.8ms serial=195.9ms gain=85.1ms ratio=0.43 s0=6.7ms s1=189.2ms wait=0.2/43.0ms pred gate=device Token # 436: 3.787ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=pair draft=768 prop=768 pred gate=device Token # 437: 117.353ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.944 next=draft=32257 prop=20 olap pair=111.2ms serial=196.6ms gain=85.4ms ratio=0.43 s0=6.2ms s1=190.4ms wait=0.2/43.5ms pred gate=device Token # 438: 4.655ms; value: next_token_ids=tensor([32257], device='cuda:0') mtp accept=0 prop=20 top1=32257 accp=0.761 next=pair draft=20 prop=20 pred gate=device Token # 439: 116.260ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.735 next=draft=64 prop=64 olap pair=110.6ms serial=194.8ms gain=84.2ms ratio=0.43 s0=7.6ms s1=187.2ms wait=0.2/41.9ms pred gate=device Token # 440: 3.772ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 
next=pair draft=397 prop=397 pred gate=device Token # 441: 116.351ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=1 prop=397 top1=397 accp=0.695 next=draft=31 prop=438 olap pair=111.1ms serial=195.5ms gain=84.4ms ratio=0.43 s0=4.9ms s1=190.6ms wait=0.1/45.4ms pred gate=device Token # 442: 3.796ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.408 next=pair draft=223 prop=223 pred gate=device Token # 443: 116.160ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=7163 prop=7163 olap pair=110.8ms serial=196.3ms gain=85.4ms ratio=0.44 s0=4.3ms s1=191.9ms wait=0.1/45.7ms pred gate=device Token # 444: 3.801ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 445: 116.519ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=24 prop=24 olap pair=111.2ms serial=196.5ms gain=85.3ms ratio=0.43 s0=4.4ms s1=192.1ms wait=0.1/45.7ms pred gate=device Token # 446: 3.762ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 447: 117.075ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.979 next=draft=4339 prop=4339 olap pair=111.0ms serial=196.2ms gain=85.2ms ratio=0.43 s0=7.9ms s1=188.3ms wait=0.2/41.7ms pred gate=device Token # 448: 4.335ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=20 accp=0.328 next=pair draft=20 prop=20 pred gate=device Token # 449: 117.405ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=64 prop=64 olap pair=111.3ms serial=197.9ms gain=86.6ms ratio=0.44 s0=4.5ms s1=193.3ms wait=0.1/45.4ms pred gate=device Token # 450: 4.682ms; value: 
next_token_ids=tensor([51187], device='cuda:0') mtp accept=0 prop=64 top1=51187 accp=0.047 next=pair draft=1267 prop=1267 pred gate=device Token # 451: 116.187ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=draft=223 prop=223 olap pair=110.7ms serial=196.3ms gain=85.7ms ratio=0.44 s0=5.8ms s1=190.6ms wait=0.2/43.9ms pred gate=device Token # 452: 3.855ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=25 prop=25 pred gate=device Token # 453: 117.545ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=draft=74985 prop=301 olap pair=111.4ms serial=197.1ms gain=85.7ms ratio=0.43 s0=5.1ms s1=192.1ms wait=0.1/44.8ms pred gate=device Token # 454: 4.744ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=74985 accp=0.905 next=pair draft=15252 prop=15252 pred gate=device Token # 455: 116.406ms; value: next_token_ids=tensor([15252], device='cuda:0') mtp accept=1 prop=15252 top1=15252 accp=0.697 next=draft=320 prop=320 olap pair=110.9ms serial=197.2ms gain=86.3ms ratio=0.44 s0=5.0ms s1=192.2ms wait=0.1/44.4ms pred gate=device Token # 456: 3.805ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=0 prop=320 top1=320 accp=0.900 next=pair draft=20 prop=20 pred gate=device Token # 457: 116.922ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.991 next=draft=64 prop=64 olap pair=111.6ms serial=196.9ms gain=85.3ms ratio=0.43 s0=4.7ms s1=192.2ms wait=0.1/44.9ms pred gate=device Token # 458: 3.812ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 459: 117.214ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=0.990 next=draft=31 prop=31 olap pair=111.8ms serial=197.5ms gain=85.7ms ratio=0.43 
s0=4.4ms s1=193.1ms wait=0.1/45.4ms pred gate=device Token # 460: 3.864ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=0.899 next=pair draft=20 prop=20 pred gate=device Token # 461: 116.849ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=1267 prop=1267 olap pair=110.8ms serial=196.2ms gain=85.4ms ratio=0.44 s0=7.2ms s1=189.0ms wait=0.2/42.7ms pred gate=device Token # 462: 4.660ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=0.929 next=pair draft=25 prop=25 pred gate=device Token # 463: 117.335ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=25 top1=223 accp=0.086 next=draft=25 prop=25 olap pair=111.8ms serial=196.0ms gain=84.2ms ratio=0.43 s0=7.7ms s1=188.3ms wait=0.2/42.2ms pred gate=device Token # 464: 117.128ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=draft=31 prop=31 olap pair=111.7ms serial=197.4ms gain=85.7ms ratio=0.43 s0=4.3ms s1=193.1ms wait=0.1/45.8ms pred gate=device Token # 465: 3.838ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 466: 117.474ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=303 prop=303 olap pair=112.1ms serial=198.0ms gain=85.9ms ratio=0.43 s0=6.2ms s1=191.8ms wait=0.2/43.6ms pred gate=device Token # 467: 3.789ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.996 next=pair draft=20 prop=20 pred gate=device Token # 468: 118.392ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=64 prop=64 olap pair=112.4ms serial=198.7ms gain=86.3ms ratio=0.43 s0=5.4ms s1=193.3ms wait=0.1/44.6ms pred gate=device Token # 469: 3.864ms; value: next_token_ids=tensor([64], 
device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 470: 118.525ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=31 prop=31 olap pair=113.1ms serial=199.7ms gain=86.6ms ratio=0.43 s0=7.9ms s1=191.9ms wait=0.2/42.0ms pred gate=device Token # 471: 3.892ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=22 prop=22 pred gate=device Token # 472: 116.531ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=draft=1267 prop=1267 olap pair=111.1ms serial=196.2ms gain=85.1ms ratio=0.43 s0=7.4ms s1=188.8ms wait=0.2/42.5ms pred gate=device Token # 473: 3.778ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=303 accp=0.499 next=pair draft=25 prop=25 pred gate=device Token # 474: 116.761ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=25 top1=223 accp=0.079 next=draft=25 prop=25 olap pair=111.4ms serial=196.8ms gain=85.4ms ratio=0.43 s0=5.1ms s1=191.6ms wait=0.1/44.8ms pred gate=device Token # 475: 117.260ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=draft=31 prop=31 olap pair=111.9ms serial=199.0ms gain=87.1ms ratio=0.44 s0=4.8ms s1=194.2ms wait=0.1/45.2ms pred gate=device Token # 476: 3.832ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=0.998 next=pair draft=22 prop=22 pred gate=device Token # 477: 116.206ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=draft=303 prop=303 olap pair=110.9ms serial=194.5ms gain=83.7ms ratio=0.43 s0=7.3ms s1=187.3ms wait=0.2/42.4ms pred gate=device Token # 478: 3.816ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 479: 
118.364ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=64 prop=64 olap pair=112.1ms serial=198.4ms gain=86.3ms ratio=0.43 s0=5.1ms s1=193.4ms wait=0.1/44.9ms pred gate=device Token # 480: 4.597ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=21 prop=21 pred gate=device Token # 481: 116.887ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=draft=31 prop=31 olap pair=110.9ms serial=196.0ms gain=85.1ms ratio=0.43 s0=9.0ms s1=187.1ms wait=0.2/40.7ms pred gate=device Token # 482: 3.823ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=26 prop=26 pred gate=device Token # 483: 116.635ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=draft=1267 prop=1267 olap pair=111.3ms serial=198.3ms gain=87.0ms ratio=0.44 s0=4.1ms s1=194.3ms wait=0.1/45.9ms pred gate=device Token # 484: 3.713ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 485: 117.302ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.686 next=draft=25 prop=25 olap pair=111.1ms serial=196.8ms gain=85.7ms ratio=0.44 s0=7.4ms s1=189.4ms wait=0.2/42.3ms pred gate=device Token # 486: 4.653ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 487: 118.510ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=19 prop=19 olap pair=112.1ms serial=199.0ms gain=86.8ms ratio=0.44 s0=7.0ms s1=191.9ms wait=0.2/42.8ms pred gate=device Token # 488: 3.809ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair 
draft=303 prop=303 pred gate=device Token # 489: 116.942ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=20 prop=20 olap pair=111.6ms serial=197.7ms gain=86.1ms ratio=0.44 s0=6.1ms s1=191.6ms wait=0.2/43.7ms pred gate=device Token # 490: 3.818ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.998 next=pair draft=64 prop=64 pred gate=device Token # 491: 117.877ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=draft=22 prop=22 olap pair=111.9ms serial=198.3ms gain=86.5ms ratio=0.44 s0=5.9ms s1=192.4ms wait=0.2/44.1ms pred gate=device Token # 492: 4.554ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 493: 117.432ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=926 prop=926 olap pair=112.1ms serial=199.0ms gain=86.8ms ratio=0.44 s0=4.2ms s1=194.8ms wait=0.1/45.8ms pred gate=device Token # 494: 3.779ms; value: next_token_ids=tensor([926], device='cuda:0') mtp accept=1 prop=926 top1=926 accp=0.995 next=pair draft=1267 prop=1267 pred gate=device Token # 495: 117.846ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=draft=223 prop=223 olap pair=111.7ms serial=196.8ms gain=85.1ms ratio=0.43 s0=7.4ms s1=189.3ms wait=0.2/42.1ms pred gate=device Token # 496: 4.524ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.979 next=pair draft=25 prop=25 pred gate=device Token # 497: 116.696ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=draft=31 prop=31 olap pair=111.4ms serial=197.4ms gain=86.0ms ratio=0.44 s0=6.0ms s1=191.3ms wait=0.2/43.8ms pred gate=device Token # 498: 3.837ms; value: next_token_ids=tensor([31], device='cuda:0') 
mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 499: 117.025ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=303 prop=303 olap pair=111.6ms serial=197.8ms gain=86.1ms ratio=0.44 s0=5.0ms s1=192.8ms wait=0.1/44.8ms pred gate=device Token # 500: 3.840ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.994 next=pair draft=20 prop=20 pred gate=device Token # 501: 117.538ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=0 prop=20 top1=20 accp=0.714 next=draft=15252 prop=15252 olap pair=112.2ms serial=198.0ms gain=85.7ms ratio=0.43 s0=4.4ms s1=193.6ms wait=0.1/45.6ms pred gate=device Token # 502: 118.328ms; value: next_token_ids=tensor([15252], device='cuda:0') mtp accept=1 prop=15252 top1=15252 accp=0.995 next=draft=389 prop=389 olap pair=112.9ms serial=199.3ms gain=86.4ms ratio=0.43 s0=4.9ms s1=194.4ms wait=0.1/44.9ms pred gate=device Token # 503: 3.916ms; value: next_token_ids=tensor([545], device='cuda:0') mtp accept=0 prop=389 top1=545 accp=0.109 next=pair draft=21 prop=21 pred gate=device Token # 504: 118.296ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=draft=1148 prop=1148 olap pair=112.0ms serial=197.9ms gain=85.9ms ratio=0.43 s0=6.5ms s1=191.4ms wait=0.2/43.3ms pred gate=device Token # 505: 4.659ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=1.000 next=pair draft=14149 prop=14149 pred gate=device Token # 506: 116.602ms; value: next_token_ids=tensor([14149], device='cuda:0') mtp accept=1 prop=14149 top1=14149 accp=0.997 next=draft=768 prop=768 olap pair=111.0ms serial=195.0ms gain=84.0ms ratio=0.43 s0=8.9ms s1=186.2ms wait=0.2/40.7ms pred gate=device Token # 507: 3.792ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=768 top1=768 accp=0.793 next=pair draft=20 prop=20 pred 
gate=device Token # 508: 117.307ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=64 prop=64 olap pair=111.9ms serial=198.3ms gain=86.3ms ratio=0.44 s0=4.4ms s1=193.9ms wait=0.1/45.7ms pred gate=device Token # 509: 3.792ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=0.999 next=pair draft=21 prop=21 pred gate=device Token # 510: 116.894ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=draft=1267 prop=56930 olap pair=110.7ms serial=196.0ms gain=85.3ms ratio=0.44 s0=5.8ms s1=190.2ms wait=0.2/43.9ms pred gate=device Token # 511: 4.103ms; value: next_token_ids=tensor([56930], device='cuda:0') mtp accept=1 prop=56930 top1=56930 accp=0.265 next=pair draft=223 prop=223 pred gate=device Token # 512: 116.615ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.769 next=draft=19 prop=19 olap pair=111.2ms serial=198.1ms gain=86.8ms ratio=0.44 s0=4.2ms s1=193.9ms wait=0.1/45.7ms pred gate=device Token # 513: 3.757ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=1267 prop=343 pred gate=device Token # 514: 117.438ms; value: next_token_ids=tensor([343], device='cuda:0') mtp accept=1 prop=343 top1=1267 accp=0.930 next=draft=5158 prop=5158 olap pair=111.8ms serial=196.9ms gain=85.1ms ratio=0.43 s0=7.8ms s1=189.1ms wait=0.2/42.0ms pred gate=device Token # 515: 3.805ms; value: next_token_ids=tensor([5158], device='cuda:0') mtp accept=1 prop=5158 top1=5158 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 516: 117.663ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=25 prop=25 olap pair=112.3ms serial=198.8ms gain=86.5ms ratio=0.44 s0=7.5ms s1=191.3ms wait=0.2/42.3ms pred gate=device Token # 517: 3.828ms; value: next_token_ids=tensor([25], device='cuda:0') mtp 
accept=1 prop=25 top1=25 accp=1.000 next=pair draft=16128 prop=16128 pred gate=device Token # 518: 116.637ms; value: next_token_ids=tensor([16128], device='cuda:0') mtp accept=1 prop=16128 top1=16128 accp=1.000 next=draft=2636 prop=2636 olap pair=111.2ms serial=197.5ms gain=86.4ms ratio=0.44 s0=4.4ms s1=193.1ms wait=0.1/45.4ms pred gate=device Token # 519: 3.795ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 520: 116.537ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.891 next=draft=64 prop=64 olap pair=111.1ms serial=197.6ms gain=86.6ms ratio=0.44 s0=4.2ms s1=193.4ms wait=0.1/45.7ms pred gate=device Token # 521: 3.819ms; value: next_token_ids=tensor([40919], device='cuda:0') mtp accept=0 prop=64 top1=40919 accp=0.121 next=pair draft=21 prop=21 pred gate=device Token # 522: 116.456ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=0.967 next=draft=77 prop=77 olap pair=111.0ms serial=197.0ms gain=86.0ms ratio=0.44 s0=4.2ms s1=192.7ms wait=0.1/45.6ms pred gate=device Token # 523: 3.938ms; value: next_token_ids=tensor([77], device='cuda:0') mtp accept=1 prop=77 top1=77 accp=1.000 next=pair draft=11 prop=11 pred gate=device Token # 524: 116.730ms; value: next_token_ids=tensor([11], device='cuda:0') mtp accept=1 prop=11 top1=11 accp=1.000 next=draft=56930 prop=56930 olap pair=111.3ms serial=197.9ms gain=86.6ms ratio=0.44 s0=4.4ms s1=193.6ms wait=0.1/45.5ms pred gate=device Token # 525: 3.839ms; value: next_token_ids=tensor([56930], device='cuda:0') mtp accept=1 prop=56930 top1=56930 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 526: 116.958ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=19 prop=19 olap pair=111.5ms serial=196.4ms gain=84.8ms ratio=0.43 s0=5.7ms s1=190.7ms wait=0.2/44.0ms pred gate=device 
Token # 527: 3.785ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=1267 prop=1267 pred gate=device Token # 528: 117.046ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=0.989 next=draft=223 prop=223 olap pair=111.7ms serial=194.9ms gain=83.2ms ratio=0.43 s0=7.1ms s1=187.8ms wait=0.2/42.7ms pred gate=device Token # 529: 3.862ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=25 prop=25 pred gate=device Token # 530: 117.818ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=draft=320 prop=320 olap pair=112.5ms serial=198.5ms gain=86.0ms ratio=0.43 s0=5.3ms s1=193.2ms wait=0.1/44.6ms pred gate=device Token # 531: 3.788ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=3996 prop=3996 pred gate=device Token # 532: 117.614ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=0 prop=3996 top1=397 accp=0.069 next=draft=1267 prop=438 olap pair=112.2ms serial=196.5ms gain=84.3ms ratio=0.43 s0=4.8ms s1=191.7ms wait=0.1/45.3ms pred gate=device Token # 533: 117.766ms; value: next_token_ids=tensor([58000], device='cuda:0') mtp accept=0 prop=438 top1=58000 accp=0.222 next=draft=21 prop=21 olap pair=111.5ms serial=197.0ms gain=85.5ms ratio=0.43 s0=6.8ms s1=190.1ms wait=0.2/42.5ms pred gate=device Token # 534: 118.738ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=draft=4169 prop=4169 olap pair=112.2ms serial=197.3ms gain=85.1ms ratio=0.43 s0=8.0ms s1=189.3ms wait=0.2/41.7ms pred gate=device Token # 535: 4.579ms; value: next_token_ids=tensor([4169], device='cuda:0') mtp accept=1 prop=4169 top1=4169 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 536: 117.923ms; value: next_token_ids=tensor([20], device='cuda:0') mtp 
accept=1 prop=20 top1=20 accp=1.000 next=draft=303 prop=303 olap pair=111.7ms serial=195.8ms gain=84.1ms ratio=0.43 s0=8.9ms s1=186.9ms wait=0.2/40.4ms pred gate=device Token # 537: 4.712ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.993 next=pair draft=2636 prop=2636 pred gate=device Token # 538: 117.392ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=1.000 next=draft=20 prop=20 olap pair=111.8ms serial=196.5ms gain=84.7ms ratio=0.43 s0=7.1ms s1=189.4ms wait=0.2/42.8ms pred gate=device Token # 539: 3.842ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=64 prop=64 pred gate=device Token # 540: 117.772ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=draft=397 prop=397 olap pair=112.2ms serial=197.0ms gain=84.8ms ratio=0.43 s0=8.6ms s1=188.4ms wait=0.2/41.1ms pred gate=device Token # 541: 3.788ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=1 prop=397 top1=397 accp=1.000 next=pair draft=56930 prop=56930 pred gate=device Token # 542: 117.805ms; value: next_token_ids=tensor([56930], device='cuda:0') mtp accept=1 prop=56930 top1=56930 accp=0.939 next=draft=223 prop=223 olap pair=112.4ms serial=198.3ms gain=85.9ms ratio=0.43 s0=5.1ms s1=193.2ms wait=0.1/44.9ms pred gate=device Token # 543: 3.902ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 544: 117.228ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=64 prop=64 olap pair=111.9ms serial=197.0ms gain=85.1ms ratio=0.43 s0=4.8ms s1=192.2ms wait=0.1/45.4ms pred gate=device Token # 545: 3.853ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 546: 
117.230ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=438 prop=438 olap pair=111.8ms serial=198.2ms gain=86.4ms ratio=0.44 s0=6.1ms s1=192.1ms wait=0.2/43.7ms pred gate=device Token # 547: 3.886ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.792 next=pair draft=223 prop=223 pred gate=device Token # 548: 116.882ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=22 prop=22 olap pair=111.4ms serial=197.5ms gain=86.1ms ratio=0.44 s0=5.3ms s1=192.3ms wait=0.1/44.7ms pred gate=device Token # 549: 3.791ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=pair draft=1267 prop=1267 pred gate=device Token # 550: 117.392ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=0.999 next=draft=223 prop=223 olap pair=111.9ms serial=199.1ms gain=87.2ms ratio=0.44 s0=4.0ms s1=195.1ms wait=0.1/46.0ms pred gate=device Token # 551: 3.896ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=25 prop=25 pred gate=device Token # 552: 116.283ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=draft=320 prop=320 olap pair=110.9ms serial=197.3ms gain=86.3ms ratio=0.44 s0=4.1ms s1=193.1ms wait=0.1/45.7ms pred gate=device Token # 553: 3.829ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.996 next=pair draft=2636 prop=2636 pred gate=device Token # 554: 116.725ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.990 next=draft=7163 prop=7163 olap pair=110.9ms serial=197.0ms gain=86.1ms ratio=0.44 s0=4.4ms s1=192.6ms wait=0.1/45.6ms pred gate=device Token # 555: 4.184ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 
top1=7163 accp=0.994 next=pair draft=27521 prop=27521 pred gate=device Token # 556: 118.813ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=24 prop=24 olap pair=112.5ms serial=200.2ms gain=87.7ms ratio=0.44 s0=4.2ms s1=196.0ms wait=0.1/45.8ms pred gate=device Token # 557: 4.644ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=pair draft=56930 prop=56930 pred gate=device Token # 558: 118.278ms; value: next_token_ids=tensor([56930], device='cuda:0') mtp accept=1 prop=56930 top1=1267 accp=0.889 next=draft=223 prop=223 olap pair=111.9ms serial=196.9ms gain=85.0ms ratio=0.43 s0=8.6ms s1=188.3ms wait=0.2/41.1ms pred gate=device Token # 559: 4.698ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=22 prop=22 pred gate=device Token # 560: 116.982ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=draft=1267 prop=1267 olap pair=111.3ms serial=195.8ms gain=84.4ms ratio=0.43 s0=7.0ms s1=188.7ms wait=0.2/43.0ms pred gate=device Token # 561: 3.872ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=0.972 next=pair draft=223 prop=223 pred gate=device Token # 562: 117.992ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=25 prop=25 olap pair=111.8ms serial=196.1ms gain=84.3ms ratio=0.43 s0=7.0ms s1=189.1ms wait=0.2/43.2ms pred gate=device Token # 563: 4.258ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=pair draft=320 prop=478 pred gate=device Token # 564: 116.226ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=320 accp=0.792 next=draft=4272 prop=3996 olap pair=110.9ms serial=195.5ms gain=84.7ms ratio=0.43 s0=7.7ms s1=187.8ms wait=0.2/41.8ms pred gate=device Token # 
565: 3.797ms; value: next_token_ids=tensor([3996], device='cuda:0') mtp accept=1 prop=3996 top1=4272 accp=0.923 next=pair draft=1457 prop=1457 pred gate=device Token # 566: 117.175ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=draft=12 prop=982 olap pair=111.8ms serial=196.1ms gain=84.4ms ratio=0.43 s0=8.8ms s1=187.3ms wait=0.2/40.9ms pred gate=device Token # 567: 3.815ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=0 prop=982 top1=1267 accp=0.061 next=pair draft=223 prop=223 pred gate=device Token # 568: 117.046ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=25 prop=25 olap pair=111.4ms serial=197.5ms gain=86.1ms ratio=0.44 s0=4.5ms s1=193.0ms wait=0.1/45.4ms pred gate=device Token # 569: 3.713ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 570: 118.574ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=112.3ms serial=197.8ms gain=85.5ms ratio=0.43 s0=7.5ms s1=190.3ms wait=0.2/42.2ms pred gate=device Token # 571: 4.633ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 572: 117.514ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.893 next=draft=1237 prop=1237 olap pair=111.6ms serial=195.6ms gain=84.0ms ratio=0.43 s0=8.7ms s1=187.0ms wait=0.2/40.9ms pred gate=device Token # 573: 3.963ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=2524 prop=2524 pred gate=device Token # 574: 116.728ms; value: next_token_ids=tensor([2524], device='cuda:0') mtp accept=1 prop=2524 top1=2524 accp=1.000 next=draft=25 prop=25 olap pair=111.3ms serial=196.3ms 
gain=85.0ms ratio=0.43 s0=8.8ms s1=187.5ms wait=0.2/41.1ms pred gate=device Token # 575: 3.761ms; value: next_token_ids=tensor([3565], device='cuda:0') mtp accept=0 prop=25 top1=3565 accp=0.011 next=pair draft=31 prop=31 pred gate=device Token # 576: 116.964ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=0 prop=31 top1=389 accp=0.001 next=draft=25 prop=25 olap pair=111.6ms serial=197.3ms gain=85.8ms ratio=0.43 s0=6.4ms s1=190.9ms wait=0.2/43.4ms pred gate=device Token # 577: 116.895ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=929 accp=0.307 next=draft=301 prop=301 olap pair=111.3ms serial=196.6ms gain=85.2ms ratio=0.43 s0=4.6ms s1=192.0ms wait=0.1/45.5ms pred gate=device Token # 578: 3.821ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=0.973 next=pair draft=56571 prop=56571 pred gate=device Token # 579: 117.718ms; value: next_token_ids=tensor([56571], device='cuda:0') mtp accept=1 prop=56571 top1=56571 accp=1.000 next=draft=14164 prop=14164 olap pair=112.3ms serial=199.7ms gain=87.3ms ratio=0.44 s0=6.7ms s1=192.9ms wait=0.2/43.2ms pred gate=device Token # 580: 3.753ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=14164 top1=303 accp=0.120 next=pair draft=1457 prop=1457 pred gate=device Token # 581: 117.696ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=0.969 next=draft=1267 prop=1267 olap pair=111.5ms serial=197.7ms gain=86.3ms ratio=0.44 s0=6.1ms s1=191.6ms wait=0.2/43.7ms pred gate=device Token # 582: 4.015ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=0 prop=1267 top1=15 accp=0.017 next=pair draft=3565 prop=3565 pred gate=device Token # 583: 117.339ms; value: next_token_ids=tensor([3565], device='cuda:0') mtp accept=1 prop=3565 top1=3565 accp=1.000 next=draft=31 prop=31 olap pair=111.1ms serial=196.6ms gain=85.4ms ratio=0.43 s0=6.6ms s1=190.0ms wait=0.2/43.3ms pred 
gate=device Token # 584: 4.691ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 585: 116.380ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=14164 prop=14164 olap pair=110.9ms serial=196.2ms gain=85.3ms ratio=0.43 s0=8.3ms s1=187.9ms wait=0.2/41.4ms pred gate=device Token # 586: 3.784ms; value: next_token_ids=tensor([14164], device='cuda:0') mtp accept=1 prop=14164 top1=14164 accp=0.809 next=pair draft=2636 prop=2636 pred gate=device Token # 587: 117.407ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=1.000 next=draft=7163 prop=7163 olap pair=112.1ms serial=198.9ms gain=86.8ms ratio=0.44 s0=4.7ms s1=194.1ms wait=0.1/45.1ms pred gate=device Token # 588: 3.753ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 589: 117.210ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=0.847 next=draft=24 prop=24 olap pair=111.4ms serial=196.5ms gain=85.2ms ratio=0.43 s0=6.1ms s1=190.4ms wait=0.2/43.9ms pred gate=device Token # 590: 3.794ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=0.999 next=pair draft=12 prop=12 pred gate=device Token # 591: 117.372ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=0.975 next=draft=1457 prop=1457 olap pair=111.1ms serial=196.7ms gain=85.6ms ratio=0.43 s0=5.1ms s1=191.5ms wait=0.1/44.9ms pred gate=device Token # 592: 4.593ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=pair draft=56930 prop=56930 pred gate=device Token # 593: 118.013ms; value: next_token_ids=tensor([56930], device='cuda:0') mtp accept=1 prop=56930 top1=56930 accp=0.919 next=draft=223 
prop=223 olap pair=111.7ms serial=197.5ms gain=85.8ms ratio=0.43 s0=8.9ms s1=188.6ms wait=0.2/41.0ms pred gate=device Token # 594: 4.753ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.995 next=pair draft=22 prop=22 pred gate=device Token # 595: 116.684ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=draft=12 prop=12 olap pair=111.2ms serial=197.1ms gain=85.9ms ratio=0.44 s0=6.9ms s1=190.2ms wait=0.2/43.0ms pred gate=device Token # 596: 3.850ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=0.823 next=pair draft=20 prop=20 pred gate=device Token # 597: 116.448ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=438 prop=438 olap pair=111.1ms serial=197.0ms gain=85.9ms ratio=0.44 s0=4.4ms s1=192.6ms wait=0.1/45.6ms pred gate=device Token # 598: 3.823ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.931 next=pair draft=223 prop=223 pred gate=device Token # 599: 116.676ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=26 prop=26 olap pair=111.2ms serial=197.8ms gain=86.6ms ratio=0.44 s0=4.2ms s1=193.7ms wait=0.1/45.5ms pred gate=device Token # 600: 3.774ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=pair draft=1267 prop=1267 pred gate=device Token # 601: 117.685ms; value: next_token_ids=tensor([56930], device='cuda:0') mtp accept=0 prop=1267 top1=56930 accp=0.003 next=draft=223 prop=223 olap pair=111.4ms serial=197.3ms gain=86.0ms ratio=0.44 s0=7.5ms s1=189.8ms wait=0.2/42.3ms pred gate=device Token # 602: 116.338ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=19 prop=19 olap pair=110.7ms serial=196.3ms gain=85.5ms ratio=0.44 s0=8.8ms s1=187.5ms wait=0.2/40.7ms 
pred gate=device Token # 603: 3.758ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=1267 prop=1267 pred gate=device Token # 604: 117.442ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=draft=223 prop=223 olap pair=111.0ms serial=196.7ms gain=85.7ms ratio=0.44 s0=8.6ms s1=188.1ms wait=0.2/41.0ms pred gate=device Token # 605: 4.829ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=25 prop=25 pred gate=device Token # 606: 117.588ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=draft=320 prop=320 olap pair=111.6ms serial=196.3ms gain=84.7ms ratio=0.43 s0=8.7ms s1=187.6ms wait=0.2/41.2ms pred gate=device Token # 607: 3.861ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.990 next=pair draft=4272 prop=4272 pred gate=device Token # 608: 116.256ms; value: next_token_ids=tensor([4272], device='cuda:0') mtp accept=1 prop=4272 top1=4272 accp=0.984 next=draft=1107 prop=1107 olap pair=110.8ms serial=195.6ms gain=84.8ms ratio=0.43 s0=4.8ms s1=190.8ms wait=0.1/45.0ms pred gate=device Token # 609: 3.867ms; value: next_token_ids=tensor([1107], device='cuda:0') mtp accept=1 prop=1107 top1=1107 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 610: 117.247ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=768 prop=768 olap pair=111.8ms serial=197.5ms gain=85.7ms ratio=0.43 s0=7.2ms s1=190.2ms wait=0.2/42.5ms pred gate=device Token # 611: 3.774ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.873 next=pair draft=19 prop=7163 pred gate=device Token # 612: 116.738ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=0 prop=7163 top1=19 accp=0.947 next=draft=13 prop=13 olap pair=111.3ms 
serial=195.9ms gain=84.6ms ratio=0.43 s0=7.3ms s1=188.7ms wait=0.2/42.3ms pred gate=device Token # 613: 116.749ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=19 prop=19 olap pair=111.2ms serial=196.2ms gain=85.0ms ratio=0.43 s0=6.8ms s1=189.3ms wait=0.2/43.0ms pred gate=device Token # 614: 3.815ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 615: 116.660ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=20 prop=20 olap pair=111.2ms serial=196.6ms gain=85.4ms ratio=0.43 s0=5.7ms s1=190.9ms wait=0.2/44.0ms pred gate=device Token # 616: 3.831ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=1267 prop=1267 pred gate=device Token # 617: 116.920ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=draft=223 prop=223 olap pair=111.5ms serial=197.2ms gain=85.7ms ratio=0.43 s0=5.5ms s1=191.8ms wait=0.1/44.5ms pred gate=device Token # 618: 3.816ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=25 prop=25 pred gate=device Token # 619: 116.037ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=draft=320 prop=320 olap pair=110.7ms serial=194.9ms gain=84.3ms ratio=0.43 s0=6.9ms s1=188.1ms wait=0.2/42.7ms pred gate=device Token # 620: 3.846ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=2636 prop=2636 pred gate=device Token # 621: 116.633ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=1.000 next=draft=7163 prop=7163 olap pair=111.3ms serial=197.4ms gain=86.1ms ratio=0.44 s0=4.3ms s1=193.0ms wait=0.1/45.6ms pred gate=device Token # 
622: 3.829ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 623: 118.335ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=112.1ms serial=197.1ms gain=85.0ms ratio=0.43 s0=6.8ms s1=190.3ms wait=0.2/43.0ms pred gate=device Token # 624: 4.777ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=0.999 next=pair draft=56930 prop=56930 pred gate=device Token # 625: 117.277ms; value: next_token_ids=tensor([56930], device='cuda:0') mtp accept=1 prop=56930 top1=56930 accp=0.970 next=draft=223 prop=223 olap pair=111.7ms serial=197.4ms gain=85.8ms ratio=0.43 s0=8.5ms s1=188.9ms wait=0.2/41.0ms pred gate=device Token # 626: 3.850ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 627: 118.247ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=1267 prop=1267 olap pair=112.1ms serial=197.4ms gain=85.3ms ratio=0.43 s0=6.9ms s1=190.5ms wait=0.2/43.0ms pred gate=device Token # 628: 4.839ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 629: 117.613ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=25 prop=25 olap pair=112.0ms serial=198.2ms gain=86.2ms ratio=0.43 s0=6.6ms s1=191.6ms wait=0.2/43.3ms pred gate=device Token # 630: 3.814ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 631: 117.272ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.866 next=draft=41772 prop=41772 olap 
pair=111.1ms serial=196.3ms gain=85.1ms ratio=0.43 s0=6.7ms s1=189.5ms wait=0.2/43.1ms pred gate=device Token # 632: 4.253ms; value: next_token_ids=tensor([41772], device='cuda:0') mtp accept=1 prop=41772 top1=2099 accp=0.457 next=pair draft=18 prop=18 pred gate=device Token # 633: 117.625ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=303 prop=303 olap pair=111.5ms serial=197.6ms gain=86.1ms ratio=0.44 s0=7.3ms s1=190.2ms wait=0.2/42.4ms pred gate=device Token # 634: 4.653ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=0 prop=303 top1=320 accp=0.313 next=pair draft=2636 prop=2636 pred gate=device Token # 635: 117.084ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.995 next=draft=113008 prop=113008 olap pair=111.4ms serial=196.7ms gain=85.3ms ratio=0.43 s0=8.3ms s1=188.4ms wait=0.2/41.3ms pred gate=device Token # 636: 3.854ms; value: next_token_ids=tensor([113008], device='cuda:0') mtp accept=1 prop=113008 top1=113008 accp=1.000 next=pair draft=25 prop=25 pred gate=device Token # 637: 117.859ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=draft=124637 prop=124637 olap pair=111.7ms serial=196.6ms gain=84.9ms ratio=0.43 s0=8.5ms s1=188.1ms wait=0.2/41.1ms pred gate=device Token # 638: 4.529ms; value: next_token_ids=tensor([124637], device='cuda:0') mtp accept=1 prop=124637 top1=124637 accp=1.000 next=pair draft=478 prop=478 pred gate=device Token # 639: 117.243ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=draft=58000 prop=58000 olap pair=111.2ms serial=196.6ms gain=85.3ms ratio=0.43 s0=8.8ms s1=187.7ms wait=0.2/40.8ms pred gate=device Token # 640: 3.804ms; value: next_token_ids=tensor([58000], device='cuda:0') mtp accept=1 prop=58000 top1=58000 accp=0.980 next=pair draft=779 prop=779 pred gate=device Token # 641: 118.253ms; value: 
next_token_ids=tensor([779], device='cuda:0') mtp accept=1 prop=779 top1=779 accp=1.000 next=draft=768 prop=768 olap pair=112.1ms serial=198.0ms gain=85.9ms ratio=0.43 s0=8.4ms s1=189.6ms wait=0.2/41.3ms pred gate=device Token # 642: 4.587ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=5640 prop=4339 pred gate=device Token # 643: 116.370ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=0.603 next=draft=7163 prop=7163 olap pair=110.8ms serial=195.4ms gain=84.6ms ratio=0.43 s0=7.6ms s1=187.8ms wait=0.2/42.4ms pred gate=device Token # 644: 3.825ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 645: 117.591ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=111.3ms serial=196.4ms gain=85.1ms ratio=0.43 s0=8.6ms s1=187.8ms wait=0.2/41.2ms pred gate=device Token # 646: 4.662ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=1267 prop=1267 pred gate=device Token # 647: 116.917ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=draft=223 prop=223 olap pair=111.4ms serial=196.4ms gain=85.0ms ratio=0.43 s0=7.1ms s1=189.3ms wait=0.2/43.0ms pred gate=device Token # 648: 3.844ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=779 prop=779 pred gate=device Token # 649: 118.059ms; value: next_token_ids=tensor([779], device='cuda:0') mtp accept=1 prop=779 top1=779 accp=1.000 next=draft=320 prop=320 olap pair=112.0ms serial=197.6ms gain=85.6ms ratio=0.43 s0=7.6ms s1=190.0ms wait=0.2/42.3ms pred gate=device Token # 650: 4.657ms; value: next_token_ids=tensor([320], device='cuda:0') mtp 
accept=1 prop=320 top1=320 accp=1.000 next=pair draft=55180 prop=55180 pred gate=device Token # 651: 117.105ms; value: next_token_ids=tensor([2541], device='cuda:0') mtp accept=0 prop=55180 top1=2541 accp=0.220 next=draft=55180 prop=55180 olap pair=111.6ms serial=196.3ms gain=84.7ms ratio=0.43 s0=8.2ms s1=188.2ms wait=0.2/41.6ms pred gate=device Token # 652: 117.930ms; value: next_token_ids=tensor([55180], device='cuda:0') mtp accept=1 prop=55180 top1=55180 accp=1.000 next=draft=8283 prop=52759 olap pair=112.1ms serial=197.5ms gain=85.5ms ratio=0.43 s0=6.9ms s1=190.6ms wait=0.2/43.1ms pred gate=device Token # 653: 3.831ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=0 prop=52759 top1=548 accp=0.041 next=pair draft=768 prop=768 pred gate=device Token # 654: 117.248ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=824 accp=0.353 next=draft=1255 prop=1255 olap pair=111.9ms serial=197.2ms gain=85.3ms ratio=0.43 s0=7.2ms s1=190.0ms wait=0.2/42.7ms pred gate=device Token # 655: 3.802ms; value: next_token_ids=tensor([1255], device='cuda:0') mtp accept=1 prop=1255 top1=1255 accp=0.995 next=pair draft=4240 prop=4240 pred gate=device Token # 656: 116.271ms; value: next_token_ids=tensor([47861], device='cuda:0') mtp accept=0 prop=4240 top1=47861 accp=0.163 next=draft=3257 prop=3257 olap pair=110.8ms serial=195.9ms gain=85.1ms ratio=0.43 s0=6.7ms s1=189.2ms wait=0.2/43.0ms pred gate=device Token # 657: 118.551ms; value: next_token_ids=tensor([3257], device='cuda:0') mtp accept=1 prop=3257 top1=3257 accp=1.000 next=draft=303 prop=303 olap pair=112.3ms serial=196.5ms gain=84.2ms ratio=0.43 s0=8.1ms s1=188.4ms wait=0.2/41.7ms pred gate=device Token # 658: 4.494ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=8283 prop=8283 pred gate=device Token # 659: 116.752ms; value: next_token_ids=tensor([8283], device='cuda:0') mtp accept=1 prop=8283 top1=8283 accp=0.838 
next=draft=55180 prop=768 olap pair=111.5ms serial=196.6ms gain=85.1ms ratio=0.43 s0=4.6ms s1=192.0ms wait=0.1/45.7ms pred gate=device Token # 660: 3.812ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.068 next=pair draft=19 prop=19 pred gate=device Token # 661: 117.143ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=14 prop=14 olap pair=111.3ms serial=195.4ms gain=84.2ms ratio=0.43 s0=5.7ms s1=189.7ms wait=0.1/44.3ms pred gate=device Token # 662: 3.854ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=1.000 next=pair draft=18 prop=18 pred gate=device Token # 663: 116.691ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=14 prop=14 olap pair=111.4ms serial=197.1ms gain=85.7ms ratio=0.43 s0=4.8ms s1=192.3ms wait=0.1/45.1ms pred gate=device Token # 664: 3.779ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=1.000 next=pair draft=24 prop=24 pred gate=device Token # 665: 117.043ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=draft=14 prop=14 olap pair=111.3ms serial=195.3ms gain=84.0ms ratio=0.43 s0=8.2ms s1=187.1ms wait=0.2/41.5ms pred gate=device Token # 666: 3.853ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=1.000 next=pair draft=25 prop=25 pred gate=device Token # 667: 117.188ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=draft=14 prop=14 olap pair=111.0ms serial=196.2ms gain=85.2ms ratio=0.43 s0=8.5ms s1=187.7ms wait=0.2/41.4ms pred gate=device Token # 668: 4.631ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=1.000 next=pair draft=23 prop=23 pred gate=device Token # 669: 117.284ms; value: next_token_ids=tensor([23], device='cuda:0') mtp 
accept=1 prop=23 top1=23 accp=1.000 next=draft=14 prop=14 olap pair=111.8ms serial=196.9ms gain=85.1ms ratio=0.43 s0=8.5ms s1=188.4ms wait=0.2/41.3ms pred gate=device Token # 670: 3.868ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=1.000 next=pair draft=26 prop=26 pred gate=device Token # 671: 117.678ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=draft=14 prop=14 olap pair=111.6ms serial=197.0ms gain=85.4ms ratio=0.43 s0=6.6ms s1=190.3ms wait=0.2/43.3ms pred gate=device Token # 672: 4.808ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=1.000 next=pair draft=22 prop=22 pred gate=device Token # 673: 116.620ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=draft=14 prop=14 olap pair=111.3ms serial=197.5ms gain=86.2ms ratio=0.44 s0=5.8ms s1=191.8ms wait=0.2/44.2ms pred gate=device Token # 674: 3.794ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=1.000 next=pair draft=18 prop=18 pred gate=device Token # 675: 115.485ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=14 prop=14 olap pair=110.2ms serial=194.4ms gain=84.2ms ratio=0.43 s0=4.6ms s1=189.8ms wait=0.1/45.3ms pred gate=device Token # 676: 3.767ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 677: 116.188ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=320 prop=320 olap pair=110.9ms serial=195.7ms gain=84.9ms ratio=0.43 s0=4.5ms s1=191.2ms wait=0.1/45.4ms pred gate=device Token # 678: 3.761ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=0 prop=320 top1=1237 accp=0.065 next=pair draft=55180 prop=55180 pred gate=device Token # 679: 117.670ms; value: 
next_token_ids=tensor([4754], device='cuda:0') mtp accept=0 prop=55180 top1=4754 accp=0.081 next=draft=18143 prop=18143 olap pair=111.5ms serial=196.1ms gain=84.6ms ratio=0.43 s0=6.6ms s1=189.6ms wait=0.2/43.2ms pred gate=device Token # 680: 118.022ms; value: next_token_ids=tensor([8283], device='cuda:0') mtp accept=0 prop=18143 top1=8283 accp=0.022 next=draft=18143 prop=18143 olap pair=111.6ms serial=198.0ms gain=86.3ms ratio=0.44 s0=5.2ms s1=192.7ms wait=0.1/44.8ms pred gate=device Token # 681: 116.977ms; value: next_token_ids=tensor([18143], device='cuda:0') mtp accept=1 prop=18143 top1=18143 accp=0.788 next=draft=768 prop=768 olap pair=111.3ms serial=196.1ms gain=84.8ms ratio=0.43 s0=8.3ms s1=187.7ms wait=0.2/41.4ms pred gate=device Token # 682: 3.777ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.995 next=pair draft=19 prop=19 pred gate=device Token # 683: 118.257ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=0 prop=19 top1=7163 accp=0.041 next=draft=27521 prop=27521 olap pair=112.1ms serial=198.0ms gain=85.9ms ratio=0.43 s0=8.3ms s1=189.7ms wait=0.2/41.5ms pred gate=device Token # 684: 118.592ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=112.0ms serial=197.4ms gain=85.4ms ratio=0.43 s0=6.5ms s1=190.9ms wait=0.2/43.5ms pred gate=device Token # 685: 4.636ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 686: 118.008ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.828 next=draft=8283 prop=8283 olap pair=111.7ms serial=196.8ms gain=85.1ms ratio=0.43 s0=7.1ms s1=189.7ms wait=0.2/42.9ms pred gate=device Token # 687: 4.628ms; value: next_token_ids=tensor([8283], device='cuda:0') mtp accept=1 prop=8283 top1=8283 accp=0.880 next=pair draft=389 prop=389 
pred gate=device Token # 688: 117.723ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=0 prop=389 top1=5585 accp=0.431 next=draft=768 prop=768 olap pair=112.2ms serial=198.2ms gain=85.9ms ratio=0.43 s0=7.1ms s1=191.1ms wait=0.2/42.8ms pred gate=device Token # 689: 117.853ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.996 next=draft=19 prop=19 olap pair=112.4ms serial=198.6ms gain=86.2ms ratio=0.43 s0=8.0ms s1=190.6ms wait=0.2/41.9ms pred gate=device Token # 690: 3.830ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=0.667 next=pair draft=14 prop=14 pred gate=device Token # 691: 116.839ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=0 prop=14 top1=1237 accp=0.002 next=draft=558 prop=558 olap pair=111.0ms serial=196.5ms gain=85.4ms ratio=0.43 s0=7.3ms s1=189.1ms wait=0.2/42.3ms pred gate=device Token # 692: 117.075ms; value: next_token_ids=tensor([558], device='cuda:0') mtp accept=1 prop=558 top1=558 accp=0.991 next=draft=1190 prop=1190 olap pair=111.5ms serial=198.6ms gain=87.1ms ratio=0.44 s0=4.2ms s1=194.4ms wait=0.1/45.5ms pred gate=device Token # 693: 3.808ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=pair draft=7417 prop=7417 pred gate=device Token # 694: 117.310ms; value: next_token_ids=tensor([7417], device='cuda:0') mtp accept=1 prop=7417 top1=7417 accp=0.892 next=draft=18 prop=18 olap pair=111.8ms serial=198.9ms gain=87.1ms ratio=0.44 s0=4.0ms s1=194.9ms wait=0.1/45.9ms pred gate=device Token # 695: 3.773ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=pair draft=1237 prop=1237 pred gate=device Token # 696: 117.310ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=0.999 next=draft=1471 prop=1471 olap pair=111.8ms serial=196.7ms gain=84.9ms ratio=0.43 s0=4.6ms s1=192.1ms 
wait=0.1/45.5ms pred gate=device Token # 697: 3.955ms; value: next_token_ids=tensor([1471], device='cuda:0') mtp accept=1 prop=1471 top1=1471 accp=1.000 next=pair draft=1190 prop=1190 pred gate=device Token # 698: 116.591ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=draft=7417 prop=7417 olap pair=111.1ms serial=197.6ms gain=86.5ms ratio=0.44 s0=4.1ms s1=193.5ms wait=0.1/45.8ms pred gate=device Token # 699: 3.753ms; value: next_token_ids=tensor([7417], device='cuda:0') mtp accept=1 prop=7417 top1=7417 accp=1.000 next=pair draft=24 prop=24 pred gate=device Token # 700: 117.045ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=draft=1237 prop=1237 olap pair=111.7ms serial=198.8ms gain=87.1ms ratio=0.44 s0=4.2ms s1=194.6ms wait=0.1/45.8ms pred gate=device Token # 701: 3.847ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=2723 prop=2723 pred gate=device Token # 702: 117.219ms; value: next_token_ids=tensor([2723], device='cuda:0') mtp accept=1 prop=2723 top1=2723 accp=1.000 next=draft=1190 prop=1190 olap pair=111.5ms serial=198.0ms gain=86.5ms ratio=0.44 s0=4.8ms s1=193.2ms wait=0.1/45.2ms pred gate=device Token # 703: 3.839ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=pair draft=1148 prop=1148 pred gate=device Token # 704: 117.474ms; value: next_token_ids=tensor([7417], device='cuda:0') mtp accept=0 prop=1148 top1=7417 accp=0.165 next=draft=25 prop=25 olap pair=112.1ms serial=197.9ms gain=85.8ms ratio=0.43 s0=4.8ms s1=193.1ms wait=0.1/45.4ms pred gate=device Token # 705: 116.444ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=draft=1237 prop=1237 olap pair=111.0ms serial=197.3ms gain=86.3ms ratio=0.44 s0=4.3ms s1=193.0ms wait=0.1/45.5ms pred gate=device Token # 706: 3.876ms; value: 
next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=3563 prop=3563 pred gate=device Token # 707: 116.840ms; value: next_token_ids=tensor([3563], device='cuda:0') mtp accept=1 prop=3563 top1=3563 accp=1.000 next=draft=1190 prop=1190 olap pair=111.4ms serial=198.2ms gain=86.8ms ratio=0.44 s0=4.6ms s1=193.6ms wait=0.1/44.9ms pred gate=device Token # 708: 3.805ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=pair draft=7417 prop=7417 pred gate=device Token # 709: 117.419ms; value: next_token_ids=tensor([7417], device='cuda:0') mtp accept=1 prop=7417 top1=7417 accp=1.000 next=draft=23 prop=23 olap pair=112.0ms serial=199.5ms gain=87.5ms ratio=0.44 s0=4.6ms s1=195.0ms wait=0.1/45.0ms pred gate=device Token # 710: 3.793ms; value: next_token_ids=tensor([23], device='cuda:0') mtp accept=1 prop=23 top1=23 accp=1.000 next=pair draft=1237 prop=1237 pred gate=device Token # 711: 117.397ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=0.996 next=draft=2115 prop=2115 olap pair=111.0ms serial=196.3ms gain=85.3ms ratio=0.43 s0=6.8ms s1=189.5ms wait=0.2/42.9ms pred gate=device Token # 712: 4.697ms; value: next_token_ids=tensor([2115], device='cuda:0') mtp accept=1 prop=2115 top1=2115 accp=1.000 next=pair draft=1190 prop=1190 pred gate=device Token # 713: 116.911ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=draft=7417 prop=7417 olap pair=111.4ms serial=197.1ms gain=85.8ms ratio=0.44 s0=7.9ms s1=189.2ms wait=0.2/41.7ms pred gate=device Token # 714: 3.807ms; value: next_token_ids=tensor([7417], device='cuda:0') mtp accept=1 prop=7417 top1=7417 accp=1.000 next=pair draft=26 prop=26 pred gate=device Token # 715: 116.757ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=draft=1237 prop=1237 olap pair=111.4ms serial=197.4ms 
gain=86.0ms ratio=0.44 s0=7.1ms s1=190.3ms wait=0.2/42.7ms pred gate=device Token # 716: 3.887ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=53434 prop=53434 pred gate=device Token # 717: 116.525ms; value: next_token_ids=tensor([53434], device='cuda:0') mtp accept=1 prop=53434 top1=53434 accp=1.000 next=draft=1190 prop=1190 olap pair=111.1ms serial=197.7ms gain=86.6ms ratio=0.44 s0=4.5ms s1=193.1ms wait=0.1/45.2ms pred gate=device Token # 718: 3.866ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=pair draft=7417 prop=7417 pred gate=device Token # 719: 116.642ms; value: next_token_ids=tensor([7417], device='cuda:0') mtp accept=1 prop=7417 top1=7417 accp=1.000 next=draft=22 prop=22 olap pair=111.3ms serial=198.1ms gain=86.9ms ratio=0.44 s0=4.2ms s1=193.9ms wait=0.1/45.7ms pred gate=device Token # 720: 3.816ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=pair draft=1237 prop=1237 pred gate=device Token # 721: 117.505ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=draft=28043 prop=28043 olap pair=112.0ms serial=199.0ms gain=87.0ms ratio=0.44 s0=4.6ms s1=194.4ms wait=0.1/45.1ms pred gate=device Token # 722: 3.798ms; value: next_token_ids=tensor([28043], device='cuda:0') mtp accept=1 prop=28043 top1=28043 accp=1.000 next=pair draft=7417 prop=7417 pred gate=device Token # 723: 116.055ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=0 prop=7417 top1=1190 accp=0.008 next=draft=7417 prop=7417 olap pair=110.6ms serial=196.4ms gain=85.8ms ratio=0.44 s0=5.8ms s1=190.6ms wait=0.2/43.9ms pred gate=device Token # 724: 117.022ms; value: next_token_ids=tensor([7417], device='cuda:0') mtp accept=1 prop=7417 top1=7417 accp=1.000 next=draft=18 prop=18 olap pair=111.6ms serial=196.7ms gain=85.1ms ratio=0.43 s0=7.2ms s1=189.5ms 
wait=0.2/42.4ms pred gate=device Token # 725: 3.850ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=pair draft=1237 prop=1237 pred gate=device Token # 726: 117.632ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=draft=18874 prop=18874 olap pair=111.3ms serial=197.2ms gain=86.0ms ratio=0.44 s0=6.1ms s1=191.1ms wait=0.2/43.8ms pred gate=device Token # 727: 4.477ms; value: next_token_ids=tensor([18874], device='cuda:0') mtp accept=1 prop=18874 top1=18874 accp=1.000 next=pair draft=1190 prop=7417 pred gate=device Token # 728: 116.591ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=0 prop=7417 top1=1190 accp=0.750 next=draft=7417 prop=7417 olap pair=111.2ms serial=197.8ms gain=86.5ms ratio=0.44 s0=4.6ms s1=193.2ms wait=0.1/45.1ms pred gate=device Token # 729: 118.058ms; value: next_token_ids=tensor([7417], device='cuda:0') mtp accept=1 prop=7417 top1=7417 accp=1.000 next=draft=19 prop=19 olap pair=111.7ms serial=198.2ms gain=86.6ms ratio=0.44 s0=6.2ms s1=192.0ms wait=0.2/43.3ms pred gate=device Token # 730: 3.915ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=1237 prop=1237 pred gate=device Token # 731: 116.876ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=draft=5528 prop=5528 olap pair=111.4ms serial=198.2ms gain=86.8ms ratio=0.44 s0=4.2ms s1=194.0ms wait=0.1/45.7ms pred gate=device Token # 732: 3.922ms; value: next_token_ids=tensor([5528], device='cuda:0') mtp accept=1 prop=5528 top1=5528 accp=1.000 next=pair draft=1190 prop=1190 pred gate=device Token # 733: 115.959ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=draft=14164 prop=14164 olap pair=110.6ms serial=196.8ms gain=86.2ms ratio=0.44 s0=4.0ms s1=192.8ms wait=0.1/45.9ms pred gate=device Token # 734: 
3.871ms; value: next_token_ids=tensor([7417], device='cuda:0') mtp accept=0 prop=14164 top1=7417 accp=0.331 next=pair draft=18 prop=18 pred gate=device Token # 735: 116.527ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=0.573 next=draft=1148 prop=1148 olap pair=111.1ms serial=197.6ms gain=86.4ms ratio=0.44 s0=4.1ms s1=193.5ms wait=0.1/45.8ms pred gate=device Token # 736: 3.836ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=0 prop=1148 top1=1237 accp=0.051 next=pair draft=1471 prop=1471 pred gate=device Token # 737: 117.001ms; value: next_token_ids=tensor([1471], device='cuda:0') mtp accept=1 prop=1471 top1=1471 accp=1.000 next=draft=5528 prop=5528 olap pair=111.5ms serial=196.8ms gain=85.4ms ratio=0.43 s0=4.4ms s1=192.5ms wait=0.1/45.5ms pred gate=device Token # 738: 3.814ms; value: next_token_ids=tensor([5528], device='cuda:0') mtp accept=1 prop=5528 top1=5528 accp=1.000 next=pair draft=1190 prop=1190 pred gate=device Token # 739: 116.924ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=draft=1227 prop=1227 olap pair=111.5ms serial=197.1ms gain=85.6ms ratio=0.43 s0=4.1ms s1=193.0ms wait=0.1/45.8ms pred gate=device Token # 740: 3.812ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=0.979 next=pair draft=1148 prop=1148 pred gate=device Token # 741: 116.759ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=1.000 next=draft=14149 prop=14149 olap pair=111.4ms serial=198.1ms gain=86.6ms ratio=0.44 s0=4.5ms s1=193.5ms wait=0.1/45.1ms pred gate=device Token # 742: 3.761ms; value: next_token_ids=tensor([10877], device='cuda:0') mtp accept=0 prop=14149 top1=10877 accp=0.321 next=pair draft=303 prop=303 pred gate=device Token # 743: 116.672ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.999 next=draft=8283 prop=8283 olap 
pair=111.3ms serial=198.0ms gain=86.7ms ratio=0.44 s0=4.5ms s1=193.4ms wait=0.1/45.0ms pred gate=device Token # 744: 3.787ms; value: next_token_ids=tensor([8283], device='cuda:0') mtp accept=1 prop=8283 top1=8283 accp=0.989 next=pair draft=389 prop=389 pred gate=device Token # 745: 117.113ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 accp=0.998 next=draft=27 prop=27 olap pair=111.7ms serial=198.8ms gain=87.1ms ratio=0.44 s0=4.5ms s1=194.3ms wait=0.1/44.9ms pred gate=device Token # 746: 3.796ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=0 prop=27 top1=7163 accp=0.083 next=pair draft=27521 prop=27521 pred gate=device Token # 747: 116.806ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=111.3ms serial=197.8ms gain=86.5ms ratio=0.44 s0=4.8ms s1=193.1ms wait=0.1/44.8ms pred gate=device Token # 748: 3.759ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 749: 116.598ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.999 next=draft=2075 prop=2075 olap pair=111.2ms serial=197.2ms gain=86.0ms ratio=0.44 s0=4.6ms s1=192.5ms wait=0.1/45.0ms pred gate=device Token # 750: 3.860ms; value: next_token_ids=tensor([2075], device='cuda:0') mtp accept=1 prop=2075 top1=2075 accp=0.777 next=pair draft=27 prop=27 pred gate=device Token # 751: 116.904ms; value: next_token_ids=tensor([27], device='cuda:0') mtp accept=1 prop=27 top1=27 accp=1.000 next=draft=1190 prop=1190 olap pair=111.4ms serial=198.2ms gain=86.8ms ratio=0.44 s0=4.4ms s1=193.8ms wait=0.1/45.3ms pred gate=device Token # 752: 3.839ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=pair draft=1148 prop=1148 pred gate=device Token # 753: 116.634ms; value: 
next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=768 accp=0.470 next=draft=14149 prop=14149 olap pair=111.3ms serial=197.5ms gain=86.3ms ratio=0.44 s0=4.7ms s1=192.9ms wait=0.1/45.0ms pred gate=device Token # 754: 3.819ms; value: next_token_ids=tensor([71215], device='cuda:0') mtp accept=0 prop=14149 top1=71215 accp=0.159 next=pair draft=19 prop=19 pred gate=device Token # 755: 117.075ms; value: next_token_ids=tensor([27], device='cuda:0') mtp accept=0 prop=19 top1=27 accp=0.070 next=draft=1190 prop=1190 olap pair=111.7ms serial=197.7ms gain=86.1ms ratio=0.44 s0=6.2ms s1=191.6ms wait=0.2/43.4ms pred gate=device Token # 756: 117.158ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=0.921 next=draft=768 prop=768 olap pair=111.6ms serial=197.4ms gain=85.8ms ratio=0.43 s0=5.1ms s1=192.3ms wait=0.1/44.9ms pred gate=device Token # 757: 3.816ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.996 next=pair draft=19 prop=19 pred gate=device Token # 758: 117.029ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=14 prop=14 olap pair=111.7ms serial=198.6ms gain=86.9ms ratio=0.44 s0=4.5ms s1=194.1ms wait=0.1/45.3ms pred gate=device Token # 759: 3.841ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=14 top1=223 accp=0.002 next=pair draft=18 prop=18 pred gate=device Token # 760: 116.882ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=223 prop=223 olap pair=111.4ms serial=197.5ms gain=86.0ms ratio=0.44 s0=4.4ms s1=193.0ms wait=0.1/45.5ms pred gate=device Token # 761: 3.871ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=22 prop=22 pred gate=device Token # 762: 116.945ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 
next=draft=223 prop=223 olap pair=111.6ms serial=197.2ms gain=85.6ms ratio=0.43 s0=4.6ms s1=192.6ms wait=0.1/45.1ms pred gate=device Token # 763: 3.849ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=26 prop=26 pred gate=device Token # 764: 117.283ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=draft=223 prop=223 olap pair=111.8ms serial=198.6ms gain=86.8ms ratio=0.44 s0=4.4ms s1=194.2ms wait=0.1/45.5ms pred gate=device Token # 765: 3.940ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=23 prop=23 pred gate=device Token # 766: 116.982ms; value: next_token_ids=tensor([23], device='cuda:0') mtp accept=1 prop=23 top1=23 accp=1.000 next=draft=223 prop=223 olap pair=110.7ms serial=194.2ms gain=83.5ms ratio=0.43 s0=8.5ms s1=185.8ms wait=0.2/40.9ms pred gate=device Token # 767: 4.724ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=25 prop=25 pred gate=device Token # 768: 116.797ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=draft=223 prop=223 olap pair=111.2ms serial=197.2ms gain=85.9ms ratio=0.44 s0=7.9ms s1=189.3ms wait=0.2/41.8ms pred gate=device Token # 769: 3.949ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=24 prop=24 pred gate=device Token # 770: 117.900ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=draft=223 prop=223 olap pair=111.7ms serial=197.0ms gain=85.3ms ratio=0.43 s0=8.2ms s1=188.8ms wait=0.2/40.8ms pred gate=device Token # 771: 4.712ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=18 prop=18 pred gate=device Token # 772: 117.872ms; value: next_token_ids=tensor([18], 
device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=18 prop=18 olap pair=111.6ms serial=197.9ms gain=86.3ms ratio=0.44 s0=8.1ms s1=189.8ms wait=0.2/41.5ms pred gate=device Token # 773: 4.772ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=18 top1=223 accp=0.021 next=pair draft=19 prop=19 pred gate=device Token # 774: 118.155ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=320 prop=320 olap pair=111.7ms serial=197.9ms gain=86.2ms ratio=0.44 s0=8.4ms s1=189.5ms wait=0.2/41.3ms pred gate=device Token # 775: 4.755ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.998 next=pair draft=2636 prop=2636 pred gate=device Token # 776: 117.883ms; value: next_token_ids=tensor([1255], device='cuda:0') mtp accept=0 prop=2636 top1=2636 accp=0.947 next=draft=47861 prop=47861 olap pair=111.5ms serial=196.3ms gain=84.8ms ratio=0.43 s0=8.5ms s1=187.8ms wait=0.2/41.1ms pred gate=device Token # 777: 118.065ms; value: next_token_ids=tensor([47861], device='cuda:0') mtp accept=1 prop=47861 top1=47861 accp=0.884 next=draft=3257 prop=3257 olap pair=111.6ms serial=197.2ms gain=85.5ms ratio=0.43 s0=8.2ms s1=188.9ms wait=0.2/41.4ms pred gate=device Token # 778: 4.576ms; value: next_token_ids=tensor([3257], device='cuda:0') mtp accept=1 prop=3257 top1=3257 accp=0.998 next=pair draft=768 prop=768 pred gate=device Token # 779: 117.693ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.999 next=draft=558 prop=558 olap pair=111.4ms serial=196.8ms gain=85.4ms ratio=0.43 s0=8.6ms s1=188.2ms wait=0.2/41.0ms pred gate=device Token # 780: 4.411ms; value: next_token_ids=tensor([558], device='cuda:0') mtp accept=1 prop=558 top1=558 accp=0.831 next=pair draft=1190 prop=1190 pred gate=device Token # 781: 116.883ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 
next=draft=768 prop=768 olap pair=110.9ms serial=195.8ms gain=85.0ms ratio=0.43 s0=7.8ms s1=188.0ms wait=0.2/41.9ms pred gate=device Token # 782: 3.890ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.934 next=pair draft=19 prop=19 pred gate=device Token # 783: 117.890ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=303 prop=303 olap pair=111.7ms serial=196.6ms gain=85.0ms ratio=0.43 s0=6.0ms s1=190.6ms wait=0.2/43.8ms pred gate=device Token # 784: 4.686ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=1471 prop=1471 pred gate=device Token # 785: 118.529ms; value: next_token_ids=tensor([1471], device='cuda:0') mtp accept=1 prop=1471 top1=1471 accp=1.000 next=draft=1190 prop=1190 olap pair=112.0ms serial=196.1ms gain=84.0ms ratio=0.43 s0=8.8ms s1=187.2ms wait=0.2/40.6ms pred gate=device Token # 786: 4.688ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=pair draft=768 prop=768 pred gate=device Token # 787: 116.389ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=18 prop=18 olap pair=110.8ms serial=196.3ms gain=85.5ms ratio=0.44 s0=6.9ms s1=189.4ms wait=0.2/42.9ms pred gate=device Token # 788: 3.869ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 789: 118.104ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=2723 prop=2723 olap pair=111.8ms serial=196.8ms gain=85.0ms ratio=0.43 s0=8.5ms s1=188.3ms wait=0.2/41.0ms pred gate=device Token # 790: 4.779ms; value: next_token_ids=tensor([2723], device='cuda:0') mtp accept=1 prop=2723 top1=2723 accp=1.000 next=pair draft=1190 prop=1190 pred gate=device Token # 791: 117.319ms; value: 
next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=draft=768 prop=768 olap pair=111.7ms serial=195.7ms gain=84.0ms ratio=0.43 s0=8.3ms s1=187.4ms wait=0.2/41.4ms pred gate=device Token # 792: 3.808ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=24 prop=24 pred gate=device Token # 793: 116.792ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=draft=303 prop=303 olap pair=111.4ms serial=197.8ms gain=86.4ms ratio=0.44 s0=4.2ms s1=193.7ms wait=0.1/46.1ms pred gate=device Token # 794: 3.849ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=3563 prop=3563 pred gate=device Token # 795: 118.397ms; value: next_token_ids=tensor([3563], device='cuda:0') mtp accept=1 prop=3563 top1=3563 accp=1.000 next=draft=1190 prop=1190 olap pair=112.0ms serial=197.8ms gain=85.7ms ratio=0.43 s0=4.5ms s1=193.3ms wait=0.1/45.8ms pred gate=device Token # 796: 4.506ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=pair draft=768 prop=768 pred gate=device Token # 797: 117.026ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=25 prop=25 olap pair=111.6ms serial=197.9ms gain=86.3ms ratio=0.44 s0=7.1ms s1=190.8ms wait=0.2/42.8ms pred gate=device Token # 798: 3.809ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 799: 117.350ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=2115 prop=2115 olap pair=111.9ms serial=197.0ms gain=85.1ms ratio=0.43 s0=7.4ms s1=189.6ms wait=0.2/42.2ms pred gate=device Token # 800: 3.844ms; value: next_token_ids=tensor([2115], device='cuda:0') mtp accept=1 prop=2115 top1=2115 
accp=1.000 next=pair draft=1190 prop=1190 pred gate=device Token # 801: 117.105ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=draft=768 prop=768 olap pair=111.5ms serial=195.5ms gain=84.0ms ratio=0.43 s0=8.0ms s1=187.5ms wait=0.2/41.8ms pred gate=device Token # 802: 3.921ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=23 prop=23 pred gate=device Token # 803: 118.843ms; value: next_token_ids=tensor([23], device='cuda:0') mtp accept=1 prop=23 top1=23 accp=1.000 next=draft=303 prop=303 olap pair=113.0ms serial=198.1ms gain=85.1ms ratio=0.43 s0=6.4ms s1=191.7ms wait=0.2/43.5ms pred gate=device Token # 804: 3.968ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=53434 prop=53434 pred gate=device Token # 805: 117.428ms; value: next_token_ids=tensor([53434], device='cuda:0') mtp accept=1 prop=53434 top1=53434 accp=1.000 next=draft=1190 prop=1190 olap pair=111.2ms serial=195.8ms gain=84.7ms ratio=0.43 s0=7.1ms s1=188.8ms wait=0.2/42.7ms pred gate=device Token # 806: 4.761ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=pair draft=768 prop=768 pred gate=device Token # 807: 116.044ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=26 prop=26 olap pair=110.5ms serial=194.9ms gain=84.4ms ratio=0.43 s0=7.3ms s1=187.6ms wait=0.2/42.3ms pred gate=device Token # 808: 3.800ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 809: 116.878ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=28043 prop=28043 olap pair=111.4ms serial=196.5ms gain=85.1ms ratio=0.43 s0=7.3ms s1=189.2ms wait=0.2/42.7ms pred gate=device Token # 810: 
3.807ms; value: next_token_ids=tensor([28043], device='cuda:0') mtp accept=1 prop=28043 top1=28043 accp=1.000 next=pair draft=1190 prop=1190 pred gate=device Token # 811: 117.912ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=draft=768 prop=768 olap pair=111.6ms serial=197.4ms gain=85.7ms ratio=0.43 s0=6.9ms s1=190.5ms wait=0.2/43.0ms pred gate=device Token # 812: 4.295ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=22 prop=22 pred gate=device Token # 813: 117.644ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=draft=303 prop=303 olap pair=111.4ms serial=196.9ms gain=85.5ms ratio=0.43 s0=5.9ms s1=191.0ms wait=0.2/44.0ms pred gate=device Token # 814: 4.561ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=18874 prop=18874 pred gate=device Token # 815: 115.918ms; value: next_token_ids=tensor([18874], device='cuda:0') mtp accept=1 prop=18874 top1=18874 accp=1.000 next=draft=1190 prop=1190 olap pair=110.5ms serial=196.3ms gain=85.8ms ratio=0.44 s0=4.3ms s1=192.1ms wait=0.1/45.6ms pred gate=device Token # 816: 3.862ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=pair draft=768 prop=768 pred gate=device Token # 817: 116.865ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=18 prop=18 olap pair=111.5ms serial=197.6ms gain=86.1ms ratio=0.44 s0=4.4ms s1=193.2ms wait=0.1/45.7ms pred gate=device Token # 818: 3.759ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 819: 117.216ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=5528 prop=5528 olap pair=111.7ms 
serial=197.6ms gain=86.0ms ratio=0.43 s0=5.0ms s1=192.6ms wait=0.2/45.2ms pred gate=device Token # 820: 3.929ms; value: next_token_ids=tensor([5528], device='cuda:0') mtp accept=1 prop=5528 top1=5528 accp=1.000 next=pair draft=1190 prop=1190 pred gate=device Token # 821: 117.181ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=draft=768 prop=768 olap pair=111.8ms serial=196.9ms gain=85.1ms ratio=0.43 s0=6.9ms s1=189.9ms wait=0.2/42.9ms pred gate=device Token # 822: 3.884ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 823: 117.533ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=320 prop=320 olap pair=111.7ms serial=196.2ms gain=84.5ms ratio=0.43 s0=7.0ms s1=189.2ms wait=0.2/42.9ms pred gate=device Token # 824: 3.838ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.999 next=pair draft=2636 prop=2636 pred gate=device Token # 825: 117.570ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=1.000 next=draft=55180 prop=55180 olap pair=111.4ms serial=197.2ms gain=85.8ms ratio=0.44 s0=4.6ms s1=192.5ms wait=0.1/45.5ms pred gate=device Token # 826: 4.683ms; value: next_token_ids=tensor([55180], device='cuda:0') mtp accept=1 prop=55180 top1=55180 accp=0.954 next=pair draft=548 prop=548 pred gate=device Token # 827: 118.351ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=1 prop=548 top1=548 accp=0.991 next=draft=768 prop=768 olap pair=112.0ms serial=198.2ms gain=86.2ms ratio=0.43 s0=9.0ms s1=189.2ms wait=0.2/40.5ms pred gate=device Token # 828: 4.790ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=0 prop=768 top1=1237 accp=0.260 next=pair draft=1255 prop=1255 pred gate=device Token # 829: 117.292ms; value: next_token_ids=tensor([1255], 
device='cuda:0') mtp accept=1 prop=1255 top1=1255 accp=0.997 next=draft=4240 prop=47861 olap pair=110.8ms serial=195.9ms gain=85.1ms ratio=0.43 s0=6.5ms s1=189.4ms wait=0.2/43.2ms pred gate=device Token # 830: 4.750ms; value: next_token_ids=tensor([47861], device='cuda:0') mtp accept=1 prop=47861 top1=47861 accp=0.160 next=pair draft=3257 prop=3257 pred gate=device Token # 831: 116.676ms; value: next_token_ids=tensor([3257], device='cuda:0') mtp accept=1 prop=3257 top1=3257 accp=0.946 next=draft=100791 prop=100791 olap pair=111.2ms serial=195.4ms gain=84.3ms ratio=0.43 s0=6.8ms s1=188.6ms wait=0.2/43.3ms pred gate=device Token # 832: 3.817ms; value: next_token_ids=tensor([100791], device='cuda:0') mtp accept=1 prop=100791 top1=100791 accp=0.991 next=pair draft=1190 prop=1190 pred gate=device Token # 833: 118.307ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=draft=3775 prop=3775 olap pair=112.9ms serial=198.2ms gain=85.4ms ratio=0.43 s0=4.7ms s1=193.6ms wait=0.1/45.3ms pred gate=device Token # 834: 3.824ms; value: next_token_ids=tensor([3775], device='cuda:0') mtp accept=1 prop=3775 top1=3775 accp=0.998 next=pair draft=97100 prop=100791 pred gate=device Token # 835: 117.775ms; value: next_token_ids=tensor([97100], device='cuda:0') mtp accept=0 prop=100791 top1=97100 accp=0.875 next=draft=1190 prop=1190 olap pair=112.3ms serial=197.2ms gain=84.9ms ratio=0.43 s0=8.0ms s1=189.2ms wait=0.2/41.4ms pred gate=device Token # 836: 117.247ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=draft=31993 prop=31993 olap pair=111.7ms serial=197.8ms gain=86.1ms ratio=0.44 s0=4.9ms s1=192.9ms wait=0.1/45.0ms pred gate=device Token # 837: 3.824ms; value: next_token_ids=tensor([31993], device='cuda:0') mtp accept=1 prop=31993 top1=31993 accp=1.000 next=pair draft=10 prop=10 pred gate=device Token # 838: 116.592ms; value: next_token_ids=tensor([100791], device='cuda:0') 
mtp accept=0 prop=10 top1=100791 accp=0.451 next=draft=1190 prop=1190 olap pair=111.0ms serial=197.6ms gain=86.5ms ratio=0.44 s0=4.0ms s1=193.6ms wait=0.1/46.1ms pred gate=device Token # 839: 116.950ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=draft=768 prop=768 olap pair=111.4ms serial=198.4ms gain=87.0ms ratio=0.44 s0=4.2ms s1=194.2ms wait=0.1/45.7ms pred gate=device Token # 840: 3.843ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.998 next=pair draft=558 prop=558 pred gate=device Token # 841: 117.285ms; value: next_token_ids=tensor([558], device='cuda:0') mtp accept=1 prop=558 top1=558 accp=1.000 next=draft=1190 prop=1190 olap pair=111.8ms serial=198.1ms gain=86.3ms ratio=0.44 s0=4.1ms s1=194.0ms wait=0.1/45.9ms pred gate=device Token # 842: 3.841ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 843: 116.940ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=0.998 next=draft=303 prop=303 olap pair=111.5ms serial=198.5ms gain=87.0ms ratio=0.44 s0=4.1ms s1=194.4ms wait=0.1/45.9ms pred gate=device Token # 844: 3.907ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.994 next=pair draft=2723 prop=2723 pred gate=device Token # 845: 116.772ms; value: next_token_ids=tensor([2723], device='cuda:0') mtp accept=1 prop=2723 top1=2723 accp=1.000 next=draft=1190 prop=1190 olap pair=111.2ms serial=197.0ms gain=85.8ms ratio=0.44 s0=6.2ms s1=190.8ms wait=0.2/43.4ms pred gate=device Token # 846: 3.898ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=pair draft=24 prop=24 pred gate=device Token # 847: 116.993ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=draft=303 prop=303 olap 
pair=110.8ms serial=195.2ms gain=84.4ms ratio=0.43 s0=8.5ms s1=186.6ms wait=0.2/41.3ms pred gate=device Token # 848: 4.864ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=2115 prop=2115 pred gate=device Token # 849: 118.206ms; value: next_token_ids=tensor([2115], device='cuda:0') mtp accept=1 prop=2115 top1=2115 accp=1.000 next=draft=1190 prop=1190 olap pair=111.8ms serial=196.3ms gain=84.5ms ratio=0.43 s0=8.9ms s1=187.4ms wait=0.2/40.7ms pred gate=device Token # 850: 4.747ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=pair draft=23 prop=23 pred gate=device Token # 851: 117.202ms; value: next_token_ids=tensor([23], device='cuda:0') mtp accept=1 prop=23 top1=23 accp=1.000 next=draft=303 prop=303 olap pair=111.7ms serial=196.2ms gain=84.6ms ratio=0.43 s0=6.8ms s1=189.4ms wait=0.2/43.1ms pred gate=device Token # 852: 3.865ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=28043 prop=28043 pred gate=device Token # 853: 117.576ms; value: next_token_ids=tensor([28043], device='cuda:0') mtp accept=1 prop=28043 top1=28043 accp=1.000 next=draft=1190 prop=1190 olap pair=111.4ms serial=197.1ms gain=85.7ms ratio=0.43 s0=8.2ms s1=188.9ms wait=0.2/41.3ms pred gate=device Token # 854: 4.809ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=pair draft=22 prop=22 pred gate=device Token # 855: 116.638ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=draft=303 prop=303 olap pair=111.2ms serial=196.5ms gain=85.3ms ratio=0.43 s0=8.8ms s1=187.7ms wait=0.2/40.8ms pred gate=device Token # 856: 3.828ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=5528 prop=5528 pred gate=device Token # 857: 117.395ms; value: 
next_token_ids=tensor([5528], device='cuda:0') mtp accept=1 prop=5528 top1=5528 accp=1.000 next=draft=1190 prop=1190 olap pair=111.6ms serial=196.1ms gain=84.5ms ratio=0.43 s0=7.6ms s1=188.5ms wait=0.2/42.4ms pred gate=device Token # 858: 3.842ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 859: 116.908ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=6787 prop=6787 olap pair=111.4ms serial=196.7ms gain=85.3ms ratio=0.43 s0=5.2ms s1=191.5ms wait=0.1/44.7ms pred gate=device Token # 860: 3.872ms; value: next_token_ids=tensor([6787], device='cuda:0') mtp accept=1 prop=6787 top1=768 accp=0.263 next=pair draft=223 prop=223 pred gate=device Token # 861: 116.969ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.999 next=draft=548 prop=548 olap pair=111.5ms serial=196.3ms gain=84.8ms ratio=0.43 s0=4.5ms s1=191.8ms wait=0.1/45.7ms pred gate=device Token # 862: 3.865ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=1 prop=548 top1=548 accp=0.982 next=pair draft=31 prop=31 pred gate=device Token # 863: 116.915ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=0 prop=31 top1=31 accp=0.819 next=draft=223 prop=223 olap pair=110.8ms serial=195.3ms gain=84.5ms ratio=0.43 s0=4.9ms s1=190.4ms wait=0.1/45.2ms pred gate=device Token # 864: 117.518ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=19 prop=19 olap pair=111.9ms serial=196.9ms gain=85.0ms ratio=0.43 s0=4.8ms s1=192.2ms wait=0.1/45.5ms pred gate=device Token # 865: 3.832ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=13 prop=13 pred gate=device Token # 866: 117.334ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 
next=draft=24 prop=24 olap pair=111.9ms serial=196.9ms gain=85.0ms ratio=0.43 s0=4.8ms s1=192.1ms wait=0.1/45.2ms pred gate=device Token # 867: 3.851ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=pair draft=13 prop=13 pred gate=device Token # 868: 117.045ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=23 prop=23 olap pair=111.6ms serial=197.9ms gain=86.3ms ratio=0.44 s0=5.2ms s1=192.7ms wait=0.1/44.6ms pred gate=device Token # 869: 3.834ms; value: next_token_ids=tensor([23], device='cuda:0') mtp accept=1 prop=23 top1=23 accp=1.000 next=pair draft=13 prop=13 pred gate=device Token # 870: 116.374ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=22 prop=22 olap pair=111.0ms serial=197.3ms gain=86.3ms ratio=0.44 s0=4.2ms s1=193.2ms wait=0.1/46.0ms pred gate=device Token # 871: 3.839ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=pair draft=13 prop=13 pred gate=device Token # 872: 117.093ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=19 prop=19 olap pair=111.7ms serial=197.8ms gain=86.1ms ratio=0.44 s0=4.3ms s1=193.5ms wait=0.1/45.8ms pred gate=device Token # 873: 3.811ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 874: 117.696ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.987 next=draft=223 prop=223 olap pair=111.8ms serial=197.2ms gain=85.5ms ratio=0.43 s0=5.3ms s1=191.9ms wait=0.1/44.9ms pred gate=device Token # 875: 3.902ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=1002 prop=1002 pred gate=device Token # 876: 117.472ms; value: next_token_ids=tensor([1002], 
device='cuda:0') mtp accept=1 prop=1002 top1=1002 accp=1.000 next=draft=320 prop=320 olap pair=111.8ms serial=196.3ms gain=84.5ms ratio=0.43 s0=7.9ms s1=188.4ms wait=0.2/41.8ms pred gate=device Token # 877: 4.357ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.994 next=pair draft=97100 prop=97100 pred gate=device Token # 878: 116.715ms; value: next_token_ids=tensor([97100], device='cuda:0') mtp accept=1 prop=97100 top1=97100 accp=1.000 next=draft=1190 prop=1190 olap pair=111.1ms serial=195.8ms gain=84.8ms ratio=0.43 s0=5.3ms s1=190.5ms wait=0.1/44.9ms pred gate=device Token # 879: 3.834ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=pair draft=768 prop=768 pred gate=device Token # 880: 117.947ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=1471 prop=1471 olap pair=111.7ms serial=197.0ms gain=85.4ms ratio=0.43 s0=8.5ms s1=188.5ms wait=0.2/41.4ms pred gate=device Token # 881: 4.764ms; value: next_token_ids=tensor([1471], device='cuda:0') mtp accept=1 prop=1471 top1=1471 accp=1.000 next=pair draft=1190 prop=1190 pred gate=device Token # 882: 118.072ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=draft=18 prop=18 olap pair=111.6ms serial=196.6ms gain=85.0ms ratio=0.43 s0=8.3ms s1=188.3ms wait=0.2/41.4ms pred gate=device Token # 883: 4.706ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 884: 118.216ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=3563 prop=3563 olap pair=112.1ms serial=193.6ms gain=81.5ms ratio=0.42 s0=8.6ms s1=185.0ms wait=0.2/40.9ms pred gate=device Token # 885: 3.868ms; value: next_token_ids=tensor([3563], device='cuda:0') mtp accept=1 prop=3563 top1=3563 accp=1.000 
next=pair draft=1190 prop=1190 pred gate=device Token # 886: 117.825ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=draft=25 prop=25 olap pair=112.1ms serial=196.3ms gain=84.2ms ratio=0.43 s0=7.8ms s1=188.5ms wait=0.2/41.9ms pred gate=device Token # 887: 3.846ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 888: 117.003ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=53434 prop=53434 olap pair=111.6ms serial=197.5ms gain=85.9ms ratio=0.44 s0=4.9ms s1=192.6ms wait=0.1/45.2ms pred gate=device Token # 889: 3.815ms; value: next_token_ids=tensor([53434], device='cuda:0') mtp accept=1 prop=53434 top1=53434 accp=1.000 next=pair draft=1190 prop=1190 pred gate=device Token # 890: 116.519ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=draft=26 prop=26 olap pair=111.0ms serial=196.9ms gain=85.8ms ratio=0.44 s0=5.8ms s1=191.0ms wait=0.2/44.1ms pred gate=device Token # 891: 3.850ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 892: 117.706ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=18874 prop=18874 olap pair=111.4ms serial=196.9ms gain=85.5ms ratio=0.43 s0=7.8ms s1=189.1ms wait=0.2/41.8ms pred gate=device Token # 893: 5.024ms; value: next_token_ids=tensor([18874], device='cuda:0') mtp accept=1 prop=18874 top1=18874 accp=1.000 next=pair draft=1190 prop=1190 pred gate=device Token # 894: 118.388ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=draft=18 prop=18 olap pair=112.7ms serial=195.6ms gain=82.9ms ratio=0.42 s0=9.8ms s1=185.8ms wait=0.3/39.7ms pred gate=device Token # 895: 
3.810ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=pair draft=6787 prop=6787 pred gate=device Token # 896: 119.023ms; value: next_token_ids=tensor([6787], device='cuda:0') mtp accept=1 prop=6787 top1=6787 accp=1.000 next=draft=223 prop=223 olap pair=113.0ms serial=198.4ms gain=85.4ms ratio=0.43 s0=8.7ms s1=189.7ms wait=0.2/40.8ms pred gate=device Token # 897: 3.901ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=548 prop=548 pred gate=device Token # 898: 117.580ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=1 prop=548 top1=548 accp=1.000 next=draft=438 prop=438 olap pair=112.1ms serial=197.3ms gain=85.2ms ratio=0.43 s0=5.4ms s1=191.9ms wait=0.1/44.9ms pred gate=device Token # 899: 3.891ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.989 next=pair draft=223 prop=223 pred gate=device Token # 900: 117.016ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=18 prop=18 olap pair=111.5ms serial=197.9ms gain=86.4ms ratio=0.44 s0=4.8ms s1=193.0ms wait=0.1/45.3ms pred gate=device Token # 901: 3.886ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=pair draft=13 prop=13 pred gate=device Token # 902: 118.115ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=25 prop=25 olap pair=111.8ms serial=198.3ms gain=86.5ms ratio=0.44 s0=4.4ms s1=193.9ms wait=0.1/45.8ms pred gate=device Token # 903: 4.665ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=pair draft=13 prop=13 pred gate=device Token # 904: 116.900ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=26 prop=26 olap pair=111.1ms serial=196.5ms gain=85.4ms ratio=0.43 s0=7.7ms 
s1=188.8ms wait=0.2/42.2ms pred gate=device Token # 905: 3.824ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=pair draft=13 prop=13 pred gate=device Token # 906: 116.986ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=18 prop=18 olap pair=111.5ms serial=197.9ms gain=86.4ms ratio=0.44 s0=5.9ms s1=192.0ms wait=0.2/43.9ms pred gate=device Token # 907: 3.788ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 908: 118.546ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.902 next=draft=223 prop=223 olap pair=112.2ms serial=199.0ms gain=86.8ms ratio=0.44 s0=7.2ms s1=191.8ms wait=0.2/42.6ms pred gate=device Token # 909: 4.900ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=856 prop=856 pred gate=device Token # 910: 117.385ms; value: next_token_ids=tensor([856], device='cuda:0') mtp accept=1 prop=856 top1=856 accp=1.000 next=draft=320 prop=320 olap pair=111.9ms serial=196.1ms gain=84.3ms ratio=0.43 s0=6.2ms s1=190.0ms wait=0.2/43.7ms pred gate=device Token # 911: 3.838ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=3143 prop=3143 pred gate=device Token # 912: 117.496ms; value: next_token_ids=tensor([3143], device='cuda:0') mtp accept=1 prop=3143 top1=3143 accp=0.842 next=draft=438 prop=438 olap pair=112.1ms serial=197.5ms gain=85.4ms ratio=0.43 s0=4.5ms s1=193.0ms wait=0.1/45.7ms pred gate=device Token # 913: 3.899ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=0 prop=438 top1=768 accp=0.346 next=pair draft=1002 prop=1002 pred gate=device Token # 914: 116.956ms; value: next_token_ids=tensor([1002], device='cuda:0') mtp accept=1 prop=1002 top1=1002 accp=1.000 next=draft=15 
prop=15 olap pair=111.5ms serial=196.5ms gain=85.0ms ratio=0.43 s0=4.8ms s1=191.6ms wait=0.1/45.3ms pred gate=device Token # 915: 3.909ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=pair draft=856 prop=856 pred gate=device Token # 916: 117.080ms; value: next_token_ids=tensor([856], device='cuda:0') mtp accept=1 prop=856 top1=856 accp=1.000 next=draft=31 prop=31 olap pair=111.7ms serial=196.5ms gain=84.9ms ratio=0.43 s0=4.8ms s1=191.7ms wait=0.1/45.5ms pred gate=device Token # 917: 3.864ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=0.985 next=pair draft=20 prop=20 pred gate=device Token # 918: 117.587ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=320 prop=320 olap pair=111.3ms serial=195.9ms gain=84.6ms ratio=0.43 s0=7.7ms s1=188.2ms wait=0.2/41.9ms pred gate=device Token # 919: 4.805ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.976 next=pair draft=20 prop=5158 pred gate=device Token # 920: 117.599ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=0 prop=5158 top1=20 accp=0.521 next=draft=1267 prop=1267 olap pair=112.0ms serial=197.5ms gain=85.5ms ratio=0.43 s0=6.8ms s1=190.7ms wait=0.2/42.9ms pred gate=device Token # 921: 117.790ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=draft=223 prop=223 olap pair=111.6ms serial=197.4ms gain=85.8ms ratio=0.43 s0=7.1ms s1=190.3ms wait=0.2/42.7ms pred gate=device Token # 922: 3.931ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=779 prop=779 pred gate=device Token # 923: 118.650ms; value: next_token_ids=tensor([779], device='cuda:0') mtp accept=1 prop=779 top1=779 accp=1.000 next=draft=41772 prop=41772 olap pair=112.4ms serial=197.4ms gain=85.0ms ratio=0.43 s0=7.5ms s1=189.8ms 
wait=0.2/42.3ms pred gate=device
Token # 924: 4.879ms; value: next_token_ids=tensor([41772], device='cuda:0') mtp accept=1 prop=41772 top1=41772 accp=0.485 next=pair draft=18 prop=18 pred gate=device
Token # 925: 118.145ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=303 prop=303 olap pair=111.7ms serial=195.9ms gain=84.2ms ratio=0.43 s0=8.2ms s1=187.7ms wait=0.2/41.3ms pred gate=device
Token # 926: 4.810ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.769 next=pair draft=2636 prop=2636 pred gate=device
Token # 927: 118.112ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=1.000 next=draft=113008 prop=113008 olap pair=111.6ms serial=196.5ms gain=84.8ms ratio=0.43 s0=8.5ms s1=188.0ms wait=0.2/41.1ms pred gate=device
Token # 928: 4.869ms; value: next_token_ids=tensor([113008], device='cuda:0') mtp accept=1 prop=113008 top1=113008 accp=1.000 next=pair draft=779 prop=779 pred gate=device
Token # 929: 116.992ms; value: next_token_ids=tensor([779], device='cuda:0') mtp accept=1 prop=779 top1=779 accp=1.000 next=draft=124637 prop=124637 olap pair=111.3ms serial=196.6ms gain=85.3ms ratio=0.43 s0=5.2ms s1=191.4ms wait=0.1/44.9ms pred gate=device
Token # 930: 3.834ms; value: next_token_ids=tensor([124637], device='cuda:0') mtp accept=1 prop=124637 top1=124637 accp=1.000 next=pair draft=478 prop=478 pred gate=device
Token # 931: 116.561ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=draft=58000 prop=58000 olap pair=111.1ms serial=197.3ms gain=86.2ms ratio=0.44 s0=4.4ms s1=193.0ms wait=0.1/45.6ms pred gate=device
Token # 932: 3.826ms; value: next_token_ids=tensor([58000], device='cuda:0') mtp accept=1 prop=58000 top1=58000 accp=1.000 next=pair draft=907 prop=907 pred gate=device
Token # 933: 117.289ms; value: next_token_ids=tensor([907], device='cuda:0') mtp accept=1 prop=907 top1=907 accp=1.000 next=draft=768 prop=768 olap pair=111.3ms serial=196.5ms gain=85.2ms ratio=0.43 s0=7.3ms s1=189.3ms wait=0.2/42.5ms pred gate=device
Token # 934: 3.775ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=4339 prop=4339 pred gate=device
Token # 935: 117.041ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=0.829 next=draft=7163 prop=7163 olap pair=110.8ms serial=195.9ms gain=85.0ms ratio=0.43 s0=8.8ms s1=187.1ms wait=0.2/41.0ms pred gate=device
Token # 936: 4.748ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.990 next=pair draft=27521 prop=27521 pred gate=device
Token # 937: 118.075ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=111.8ms serial=197.5ms gain=85.7ms ratio=0.43 s0=7.6ms s1=189.9ms wait=0.2/42.4ms pred gate=device
Token # 938: 4.672ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=1267 prop=1267 pred gate=device
Token # 939: 117.379ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=draft=223 prop=223 olap pair=111.8ms serial=196.2ms gain=84.5ms ratio=0.43 s0=7.0ms s1=189.3ms wait=0.2/43.1ms pred gate=device
Token # 940: 3.881ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=907 prop=907 pred gate=device
Token # 941: 116.947ms; value: next_token_ids=tensor([907], device='cuda:0') mtp accept=1 prop=907 top1=907 accp=1.000 next=draft=320 prop=320 olap pair=110.8ms serial=195.9ms gain=85.1ms ratio=0.43 s0=5.1ms s1=190.8ms wait=0.1/45.0ms pred gate=device
Token # 942: 4.688ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=18467 prop=18467 pred gate=device
Token # 943: 116.897ms; value: next_token_ids=tensor([18467], device='cuda:0') mtp accept=1 prop=18467 top1=18467 accp=0.962 next=draft=16992 prop=16992 olap pair=111.0ms serial=195.6ms gain=84.6ms ratio=0.43 s0=7.2ms s1=188.4ms wait=0.2/42.8ms pred gate=device
Token # 944: 3.795ms; value: next_token_ids=tensor([2541], device='cuda:0') mtp accept=0 prop=16992 top1=2541 accp=0.704 next=pair draft=14034 prop=14034 pred gate=device
Token # 945: 117.570ms; value: next_token_ids=tensor([14034], device='cuda:0') mtp accept=1 prop=14034 top1=14034 accp=0.998 next=draft=2821 prop=2821 olap pair=111.4ms serial=195.7ms gain=84.3ms ratio=0.43 s0=8.4ms s1=187.3ms wait=0.2/41.4ms pred gate=device
Token # 946: 4.672ms; value: next_token_ids=tensor([9939], device='cuda:0') mtp accept=0 prop=2821 top1=9939 accp=0.341 next=pair draft=320 prop=320 pred gate=device
Token # 947: 116.749ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.978 next=draft=907 prop=907 olap pair=111.2ms serial=197.7ms gain=86.5ms ratio=0.44 s0=6.4ms s1=191.3ms wait=0.2/43.6ms pred gate=device
Token # 948: 3.722ms; value: next_token_ids=tensor([13103], device='cuda:0') mtp accept=0 prop=907 top1=13103 accp=0.397 next=pair draft=4036 prop=4036 pred gate=device
Token # 949: 117.138ms; value: next_token_ids=tensor([32257], device='cuda:0') mtp accept=0 prop=4036 top1=32257 accp=0.010 next=draft=907 prop=907 olap pair=111.0ms serial=196.5ms gain=85.5ms ratio=0.44 s0=6.9ms s1=189.7ms wait=0.2/42.8ms pred gate=device
Token # 950: 118.067ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=0 prop=907 top1=7163 accp=0.217 next=draft=27521 prop=27521 olap pair=111.7ms serial=197.7ms gain=86.0ms ratio=0.44 s0=9.0ms s1=188.7ms wait=0.2/40.5ms pred gate=device
Token # 951: 117.649ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=111.2ms serial=197.0ms gain=85.8ms ratio=0.44 s0=8.9ms s1=188.1ms wait=0.2/40.7ms pred gate=device
Token # 952: 4.545ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=0.789 next=pair draft=438 prop=438 pred gate=device
Token # 953: 117.249ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=111.0ms serial=196.4ms gain=85.5ms ratio=0.44 s0=8.9ms s1=187.5ms wait=0.2/40.7ms pred gate=device
Token # 954: 4.537ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=20 prop=20 pred gate=device
Token # 955: 116.560ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.894 next=draft=64 prop=64 olap pair=111.1ms serial=197.9ms gain=86.8ms ratio=0.44 s0=4.3ms s1=193.6ms wait=0.1/45.8ms pred gate=device
Token # 956: 3.816ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=397 prop=397 pred gate=device
Token # 957: 116.341ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=1 prop=397 top1=397 accp=1.000 next=draft=982 prop=982 olap pair=111.0ms serial=197.4ms gain=86.4ms ratio=0.44 s0=4.1ms s1=193.3ms wait=0.1/45.9ms pred gate=device
Token # 958: 3.855ms; value: next_token_ids=tensor([982], device='cuda:0') mtp accept=1 prop=982 top1=982 accp=0.999 next=pair draft=223 prop=223 pred gate=device
Token # 959: 118.470ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=1457 prop=1457 olap pair=112.2ms serial=197.9ms gain=85.6ms ratio=0.43 s0=4.9ms s1=193.0ms wait=0.1/45.1ms pred gate=device
Token # 960: 4.636ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=pair draft=940 prop=940 pred gate=device
Token # 961: 116.826ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=1.000 next=draft=223 prop=223 olap pair=111.3ms serial=196.1ms gain=84.8ms ratio=0.43 s0=7.8ms s1=188.3ms wait=0.2/42.0ms pred gate=device
Token # 962: 3.942ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.998 next=pair draft=19 prop=19 pred gate=device
Token # 963: 117.640ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=320 prop=320 olap pair=111.5ms serial=197.7ms gain=86.2ms ratio=0.44 s0=6.2ms s1=191.5ms wait=0.2/43.7ms pred gate=device
Token # 964: 4.741ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.956 next=pair draft=4339 prop=4339 pred gate=device
Token # 965: 117.118ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=0.989 next=draft=20 prop=20 olap pair=111.5ms serial=197.9ms gain=86.4ms ratio=0.44 s0=6.2ms s1=191.7ms wait=0.2/43.7ms pred gate=device
Token # 966: 3.836ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=64 prop=64 pred gate=device
Token # 967: 117.454ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=draft=397 prop=397 olap pair=111.6ms serial=198.9ms gain=87.3ms ratio=0.44 s0=4.5ms s1=194.4ms wait=0.1/45.6ms pred gate=device
Token # 968: 3.871ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=1 prop=397 top1=397 accp=1.000 next=pair draft=1267 prop=1267 pred gate=device
Token # 969: 118.128ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=draft=223 prop=223 olap pair=111.9ms serial=198.4ms gain=86.5ms ratio=0.44 s0=6.1ms s1=192.3ms wait=0.2/43.7ms pred gate=device
Token # 970: 4.541ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=907 prop=907 pred gate=device
Token # 971: 116.378ms; value: next_token_ids=tensor([907], device='cuda:0') mtp accept=1 prop=907 top1=907 accp=1.000 next=draft=320 prop=303 olap pair=111.0ms serial=197.5ms gain=86.4ms ratio=0.44 s0=4.3ms s1=193.1ms wait=0.1/45.8ms pred gate=device
Token # 972: 3.843ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=0 prop=303 top1=320 accp=0.502 next=pair draft=20 prop=20 pred gate=device
Token # 973: 116.474ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=301 prop=64 olap pair=111.1ms serial=197.6ms gain=86.5ms ratio=0.44 s0=5.4ms s1=192.1ms wait=0.2/44.5ms pred gate=device
Token # 974: 3.806ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=0 prop=64 top1=301 accp=0.891 next=pair draft=61947 prop=61947 pred gate=device
Token # 975: 116.976ms; value: next_token_ids=tensor([2311], device='cuda:0') mtp accept=0 prop=61947 top1=2311 accp=0.412 next=draft=907 prop=907 olap pair=111.5ms serial=198.3ms gain=86.8ms ratio=0.44 s0=4.4ms s1=193.9ms wait=0.1/45.4ms pred gate=device
Token # 976: 117.530ms; value: next_token_ids=tensor([907], device='cuda:0') mtp accept=1 prop=907 top1=907 accp=0.996 next=draft=15252 prop=61947 olap pair=112.0ms serial=199.5ms gain=87.5ms ratio=0.44 s0=4.0ms s1=195.6ms wait=0.1/46.2ms pred gate=device
Token # 977: 4.005ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=0 prop=61947 top1=301 accp=0.199 next=pair draft=61947 prop=61947 pred gate=device
Token # 978: 117.372ms; value: next_token_ids=tensor([4479], device='cuda:0') mtp accept=0 prop=61947 top1=4479 accp=0.000 next=draft=768 prop=768 olap pair=111.8ms serial=197.0ms gain=85.1ms ratio=0.43 s0=6.4ms s1=190.6ms wait=0.2/43.6ms pred gate=device
Token # 979: 118.125ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.982 next=draft=20 prop=20 olap pair=112.7ms serial=197.1ms gain=84.4ms ratio=0.43 s0=6.9ms s1=190.3ms wait=0.2/43.2ms pred gate=device
Token # 980: 3.912ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.987 next=pair draft=64 prop=64 pred gate=device
Token # 981: 116.677ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=draft=19 prop=19 olap pair=111.2ms serial=195.2ms gain=84.0ms ratio=0.43 s0=7.9ms s1=187.2ms wait=0.2/42.0ms pred gate=device
Token # 982: 3.780ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=0.962 next=pair draft=31 prop=31 pred gate=device
Token # 983: 117.904ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=20 prop=20 olap pair=111.6ms serial=196.5ms gain=84.9ms ratio=0.43 s0=8.7ms s1=187.8ms wait=0.2/41.1ms pred gate=device
Token # 984: 4.486ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=303 prop=303 pred gate=device
Token # 985: 117.159ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.993 next=draft=20 prop=20 olap pair=110.9ms serial=195.8ms gain=84.9ms ratio=0.43 s0=7.1ms s1=188.7ms wait=0.2/42.8ms pred gate=device
Token # 986: 4.769ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=64 prop=64 pred gate=device
Token # 987: 116.741ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=draft=20 prop=20 olap pair=111.2ms serial=196.7ms gain=85.5ms ratio=0.43 s0=7.3ms s1=189.4ms wait=0.2/42.6ms pred gate=device
Token # 988: 3.801ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=31 prop=31 pred gate=device
Token # 989: 117.325ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=22 prop=22 olap pair=111.9ms serial=197.6ms gain=85.7ms ratio=0.43 s0=6.5ms s1=191.0ms wait=0.2/43.4ms pred gate=device
Token # 990: 3.770ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=pair draft=303 prop=303 pred gate=device
Token # 991: 117.242ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=20 prop=20 olap pair=111.8ms serial=196.8ms gain=85.0ms ratio=0.43 s0=5.5ms s1=191.3ms wait=0.1/44.8ms pred gate=device
Token # 992: 3.856ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=64 prop=64 pred gate=device
Token # 993: 117.832ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=draft=21 prop=21 olap pair=112.5ms serial=197.9ms gain=85.4ms ratio=0.43 s0=6.5ms s1=191.4ms wait=0.2/43.4ms pred gate=device
Token # 994: 3.804ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=pair draft=31 prop=31 pred gate=device
Token # 995: 117.629ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=26 prop=26 olap pair=112.2ms serial=196.5ms gain=84.3ms ratio=0.43 s0=6.8ms s1=189.7ms wait=0.2/43.0ms pred gate=device
Token # 996: 3.783ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=pair draft=303 prop=303 pred gate=device
Token # 997: 118.264ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.985 next=draft=20 prop=20 olap pair=112.1ms serial=198.3ms gain=86.2ms ratio=0.43 s0=5.0ms s1=193.3ms wait=0.1/45.0ms pred gate=device
Token # 998: 4.748ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=64 prop=64 pred gate=device
Token # 999: 116.709ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=draft=22 prop=22 olap pair=111.2ms serial=196.0ms gain=84.8ms ratio=0.43 s0=8.8ms s1=187.2ms wait=0.2/40.8ms pred gate=device
Token # 1000: 3.783ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=pair draft=31 prop=31 pred gate=device
Token # 1001: 116.426ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=926 prop=926 olap pair=111.0ms serial=197.4ms gain=86.4ms ratio=0.44 s0=4.1ms s1=193.4ms wait=0.1/46.1ms pred gate=device
Token # 1002: 3.772ms; value: next_token_ids=tensor([926], device='cuda:0') mtp accept=1 prop=926 top1=926 accp=1.000 next=pair draft=1267 prop=1267 pred gate=device
Token # 1003: 116.607ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=0.986 next=draft=223 prop=223 olap pair=111.2ms serial=198.0ms gain=86.8ms ratio=0.44 s0=3.9ms s1=194.1ms wait=0.1/46.1ms pred gate=device
Token # 1004: 3.871ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.900 next=pair draft=907 prop=907 pred gate=device
Token # 1005: 117.142ms; value: next_token_ids=tensor([907], device='cuda:0') mtp accept=1 prop=907 top1=907 accp=1.000 next=draft=31 prop=31 olap pair=111.6ms serial=198.6ms gain=86.9ms ratio=0.44 s0=4.1ms s1=194.5ms wait=0.1/46.2ms pred gate=device
Token # 1006: 4.033ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=0.996 next=pair draft=21 prop=21 pred gate=device
Token # 1007: 118.005ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=draft=303 prop=303 olap pair=111.8ms serial=196.0ms gain=84.2ms ratio=0.43 s0=7.9ms s1=188.1ms wait=0.2/42.1ms pred gate=device
Token # 1008: 4.802ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=20 prop=20 pred gate=device
Token # 1009: 117.019ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=64 prop=64 olap pair=111.3ms serial=196.2ms gain=84.9ms ratio=0.43 s0=7.8ms s1=188.4ms wait=0.2/42.1ms pred gate=device
Token # 1010: 3.738ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=23 prop=23 pred gate=device
Token # 1011: 117.084ms; value: next_token_ids=tensor([23], device='cuda:0') mtp accept=1 prop=23 top1=23 accp=1.000 next=draft=31 prop=31 olap pair=111.7ms serial=198.5ms gain=86.8ms ratio=0.44 s0=4.7ms s1=193.8ms wait=0.1/45.2ms pred gate=device
Token # 1012: 3.865ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=24 prop=24 pred gate=device
Token # 1013: 117.066ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=draft=303 prop=303 olap pair=111.6ms serial=198.1ms gain=86.5ms ratio=0.44 s0=4.6ms s1=193.5ms wait=0.1/45.3ms pred gate=device
Token # 1014: 3.950ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.981 next=pair draft=20 prop=20 pred gate=device
Token # 1015: 116.313ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=64 prop=64 olap pair=110.9ms serial=197.4ms gain=86.5ms ratio=0.44 s0=4.8ms s1=192.5ms wait=0.1/45.3ms pred gate=device
Token # 1016: 3.825ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=24 prop=24 pred gate=device
Token # 1017: 117.471ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=draft=31 prop=31 olap pair=112.1ms serial=199.1ms gain=86.9ms ratio=0.44 s0=4.7ms s1=194.3ms wait=0.1/45.3ms pred gate=device
Token # 1018: 3.876ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=736 prop=736 pred gate=device
Token # 1019: 117.389ms; value: next_token_ids=tensor([736], device='cuda:0') mtp accept=1 prop=736 top1=736 accp=1.000 next=draft=303 prop=1267 olap pair=112.1ms serial=198.4ms gain=86.3ms ratio=0.44 s0=4.7ms s1=193.7ms wait=0.1/45.6ms pred gate=device
Token # 1020: 3.882ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=1267 top1=303 accp=0.591 next=pair draft=20 prop=20 pred gate=device
Token # 1021: 117.123ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=64 prop=64 olap pair=111.7ms serial=197.6ms gain=85.9ms ratio=0.43 s0=4.5ms s1=193.1ms wait=0.1/45.8ms pred gate=device
Token # 1022: 3.742ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=25 prop=25 pred gate=device
Token # 1023: 116.486ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=draft=31 prop=31 olap pair=111.1ms serial=197.3ms gain=86.1ms ratio=0.44 s0=5.7ms s1=191.6ms wait=0.2/44.2ms pred gate=device