[2026-04-08 07:44:03.167432 INFO duck_llm] This is an info log [2026-04-08 07:44:03.167462 WARN duck_llm] This is a warning log [2026-04-08 07:44:03.167465 ERROR duck_llm] This is an error log [2026-04-08 07:44:03.167665 INFO utils] Selected DPDK lcores: master=0, workers=[2, 4, 6, 8], all_performance_core_representatives=[0, 2, 4, 6, 8, 10, 12, 14] EAL: Detected CPU lcores: 32 EAL: Detected NUMA nodes: 1 EAL: Detected shared linkage of DPDK EAL: Multi-process socket /var/run/dpdk/rte/mp_socket EAL: Selected IOVA mode 'VA' EAL: VFIO support initialized EAL: Using IOMMU type 1 (Type 1) ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode) ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode) ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode) ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode) [2026-04-08 07:44:05.242793 INFO dpdk_workers] DPDK initialized successfully. Found 4 ports. 
[2026-04-08 07:44:05.242810 INFO dpdk_workers] Port 0 device name: 0000:01:00.0 [2026-04-08 07:44:05.242813 INFO dpdk_workers] Port 0 IP address: 10.21.1.1 [2026-04-08 07:44:05.242815 INFO dpdk_workers] Port 0 Broadcast address: 10.21.1.255 [2026-04-08 07:44:05.242817 INFO dpdk_workers] Port 1 device name: 0000:01:00.1 [2026-04-08 07:44:05.242819 INFO dpdk_workers] Port 1 IP address: 10.21.2.1 [2026-04-08 07:44:05.242820 INFO dpdk_workers] Port 1 Broadcast address: 10.21.2.255 [2026-04-08 07:44:05.242822 INFO dpdk_workers] Port 2 device name: 0000:01:00.2 [2026-04-08 07:44:05.242823 INFO dpdk_workers] Port 2 IP address: 10.21.3.1 [2026-04-08 07:44:05.242825 INFO dpdk_workers] Port 2 Broadcast address: 10.21.3.255 [2026-04-08 07:44:05.242826 INFO dpdk_workers] Port 3 device name: 0000:01:00.3 [2026-04-08 07:44:05.242828 INFO dpdk_workers] Port 3 IP address: 10.21.4.1 [2026-04-08 07:44:05.242829 INFO dpdk_workers] Port 3 Broadcast address: 10.21.4.255 [2026-04-08 07:44:05.242832 INFO dpdk_workers] Available netifs list: [(10.21.1.255, 0, 10.21.1.1), (10.21.2.255, 1, 10.21.2.1), (10.21.3.255, 2, 10.21.3.1), (10.21.4.255, 3, 10.21.4.1)] [2026-04-08 07:44:05.242837 INFO dpdk_workers] Starting worker #0: (bcast_ip: 10.21.1.255, port_id: 0, lcore_id: 2, host_ip: 10.21.1.1) [2026-04-08 07:44:05.242880 INFO dpdk_workers] Initializing worker port 0 on lcore 2... [2026-04-08 07:44:05.244781 INFO dpdk_workers] Starting worker #1: (bcast_ip: 10.21.2.255, port_id: 1, lcore_id: 4, host_ip: 10.21.2.1) [2026-04-08 07:44:05.244810 INFO dpdk_workers] Starting worker #2: (bcast_ip: 10.21.3.255, port_id: 2, lcore_id: 6, host_ip: 10.21.3.1) [2026-04-08 07:44:05.244822 INFO dpdk_workers] Starting worker #3: (bcast_ip: 10.21.4.255, port_id: 3, lcore_id: 8, host_ip: 10.21.4.1) [2026-04-08 07:44:05.244848 INFO dpdk_workers] Initializing worker port 1 on lcore 4... [2026-04-08 07:44:05.246811 INFO dpdk_workers] Initializing worker port 2 on lcore 6... 
[2026-04-08 07:44:05.248795 INFO dpdk_workers] Initializing worker port 3 on lcore 8... ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 0). ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 1). ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 3). ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 2). [2026-04-08 07:44:08.792570 INFO dpdk_workers] Worker port 2 initialized successfully. [2026-04-08 07:44:09.651263 INFO dpdk_workers] Worker port 0 initialized successfully. [2026-04-08 07:44:09.653172 INFO dpdk_workers] Worker port 1 initialized successfully. [2026-04-08 07:44:10.493747 INFO dpdk_workers] Worker port 3 initialized successfully. [2026-04-08 07:44:10.493796 INFO dpdk_workers] Workers initialized successfully. 4 workers running. [2026-04-08 07:44:10.494106 INFO utils] Binding master thread to cores (excluding workers): [0, 1, 3, 5, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31] [2026-04-08 07:44:10.494115 INFO utils] set_thread_affinity(tid 1355637, cores [0, 1, 3, 5, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]): 0 [2026-04-08 07:44:10.494936 INFO dpdk_workers] Run command Ping all time: send 1.2 us, recv 811.4 us [2026-04-08 07:44:10.544993 INFO dpdk_workers] Run command Ping all time: send 0.4 us, recv 0.4 us [2026-04-08 07:44:10.595049 INFO dpdk_workers] Run command Ping all time: send 0.2 us, recv 0.3 us [2026-04-08 07:44:10.645105 INFO dpdk_workers] Run command Ping all time: send 0.2 us, recv 0.5 us [2026-04-08 07:44:10.695161 INFO dpdk_workers] Run command Ping all time: send 0.2 us, recv 0.4 us [2026-04-08 07:44:10.745217 INFO dpdk_workers] Run command Ping all time: send 0.2 us, recv 0.4 us [2026-04-08 07:44:10.795272 INFO dpdk_workers] Run command Ping all time: send 0.2 us, recv 0.4 us [2026-04-08 07:44:10.845328 INFO dpdk_workers] Run command Ping all time: send 0.2 us, recv 0.6 us [2026-04-08 07:44:10.895384 
INFO dpdk_workers] Run command Ping all time: send 0.3 us, recv 0.4 us [2026-04-08 07:44:10.945440 INFO dpdk_workers] Run command Ping all time: send 0.3 us, recv 0.3 us [2026-04-08 07:44:10.995508 INFO dpdk_workers] Found 32 ducks in duck-ips-multi-netifs.txt [2026-04-08 07:44:10.995511 INFO dpdk_workers] Duck #0: 10.21.1.101 (bcast_ip: 10.21.1.255) [2026-04-08 07:44:10.995514 INFO dpdk_workers] Duck #1: 10.21.1.102 (bcast_ip: 10.21.1.255) [2026-04-08 07:44:10.995516 INFO dpdk_workers] Duck #2: 10.21.1.103 (bcast_ip: 10.21.1.255) [2026-04-08 07:44:10.995518 INFO dpdk_workers] Duck #3: 10.21.1.104 (bcast_ip: 10.21.1.255) [2026-04-08 07:44:10.995520 INFO dpdk_workers] Duck #4: 10.21.1.105 (bcast_ip: 10.21.1.255) [2026-04-08 07:44:10.995522 INFO dpdk_workers] Duck #5: 10.21.1.106 (bcast_ip: 10.21.1.255) [2026-04-08 07:44:10.995524 INFO dpdk_workers] Duck #6: 10.21.1.107 (bcast_ip: 10.21.1.255) [2026-04-08 07:44:10.995526 INFO dpdk_workers] Duck #7: 10.21.1.108 (bcast_ip: 10.21.1.255) [2026-04-08 07:44:10.995528 INFO dpdk_workers] Duck #8: 10.21.2.101 (bcast_ip: 10.21.2.255) [2026-04-08 07:44:10.995530 INFO dpdk_workers] Duck #9: 10.21.2.102 (bcast_ip: 10.21.2.255) [2026-04-08 07:44:10.995531 INFO dpdk_workers] Duck #10: 10.21.2.103 (bcast_ip: 10.21.2.255) [2026-04-08 07:44:10.995533 INFO dpdk_workers] Duck #11: 10.21.2.104 (bcast_ip: 10.21.2.255) [2026-04-08 07:44:10.995535 INFO dpdk_workers] Duck #12: 10.21.2.105 (bcast_ip: 10.21.2.255) [2026-04-08 07:44:10.995537 INFO dpdk_workers] Duck #13: 10.21.2.106 (bcast_ip: 10.21.2.255) [2026-04-08 07:44:10.995539 INFO dpdk_workers] Duck #14: 10.21.2.107 (bcast_ip: 10.21.2.255) [2026-04-08 07:44:10.995540 INFO dpdk_workers] Duck #15: 10.21.2.108 (bcast_ip: 10.21.2.255) [2026-04-08 07:44:10.995542 INFO dpdk_workers] Duck #16: 10.21.3.101 (bcast_ip: 10.21.3.255) [2026-04-08 07:44:10.995544 INFO dpdk_workers] Duck #17: 10.21.3.102 (bcast_ip: 10.21.3.255) [2026-04-08 07:44:10.995546 INFO dpdk_workers] Duck #18: 10.21.3.103 
(bcast_ip: 10.21.3.255) [2026-04-08 07:44:10.995548 INFO dpdk_workers] Duck #19: 10.21.3.104 (bcast_ip: 10.21.3.255) [2026-04-08 07:44:10.995549 INFO dpdk_workers] Duck #20: 10.21.3.105 (bcast_ip: 10.21.3.255) [2026-04-08 07:44:10.995551 INFO dpdk_workers] Duck #21: 10.21.3.106 (bcast_ip: 10.21.3.255) [2026-04-08 07:44:10.995553 INFO dpdk_workers] Duck #22: 10.21.3.107 (bcast_ip: 10.21.3.255) [2026-04-08 07:44:10.995555 INFO dpdk_workers] Duck #23: 10.21.3.108 (bcast_ip: 10.21.3.255) [2026-04-08 07:44:10.995557 INFO dpdk_workers] Duck #24: 10.21.4.101 (bcast_ip: 10.21.4.255) [2026-04-08 07:44:10.995558 INFO dpdk_workers] Duck #25: 10.21.4.102 (bcast_ip: 10.21.4.255) [2026-04-08 07:44:10.995560 INFO dpdk_workers] Duck #26: 10.21.4.103 (bcast_ip: 10.21.4.255) [2026-04-08 07:44:10.995562 INFO dpdk_workers] Duck #27: 10.21.4.104 (bcast_ip: 10.21.4.255) [2026-04-08 07:44:10.995564 INFO dpdk_workers] Duck #28: 10.21.4.105 (bcast_ip: 10.21.4.255) [2026-04-08 07:44:10.995566 INFO dpdk_workers] Duck #29: 10.21.4.106 (bcast_ip: 10.21.4.255) [2026-04-08 07:44:10.995568 INFO dpdk_workers] Duck #30: 10.21.4.107 (bcast_ip: 10.21.4.255) [2026-04-08 07:44:10.995572 INFO dpdk_workers] Duck #31: 10.21.4.108 (bcast_ip: 10.21.4.255) [2026-04-08 07:44:10.997191 INFO dpdk_workers] [Worker 0]: 10.21.1.101 [2026-04-08 07:44:10.997195 INFO dpdk_workers] [Worker 0]: 10.21.1.102 [2026-04-08 07:44:10.997196 INFO dpdk_workers] [Worker 0]: 10.21.1.103 [2026-04-08 07:44:10.997198 INFO dpdk_workers] [Worker 0]: 10.21.1.104 [2026-04-08 07:44:10.997199 INFO dpdk_workers] [Worker 0]: 10.21.1.105 [2026-04-08 07:44:10.997201 INFO dpdk_workers] [Worker 0]: 10.21.1.106 [2026-04-08 07:44:10.997202 INFO dpdk_workers] [Worker 0]: 10.21.1.107 [2026-04-08 07:44:10.997204 INFO dpdk_workers] [Worker 0]: 10.21.1.108 [2026-04-08 07:44:10.997521 INFO dpdk_workers] [Worker 1]: 10.21.2.101 [2026-04-08 07:44:10.997524 INFO dpdk_workers] [Worker 1]: 10.21.2.102 [2026-04-08 07:44:10.997525 INFO dpdk_workers] [Worker 
1]: 10.21.2.103 [2026-04-08 07:44:10.997527 INFO dpdk_workers] [Worker 1]: 10.21.2.104 [2026-04-08 07:44:10.997528 INFO dpdk_workers] [Worker 1]: 10.21.2.105 [2026-04-08 07:44:10.997530 INFO dpdk_workers] [Worker 1]: 10.21.2.106 [2026-04-08 07:44:10.997531 INFO dpdk_workers] [Worker 1]: 10.21.2.107 [2026-04-08 07:44:10.997532 INFO dpdk_workers] [Worker 1]: 10.21.2.108 [2026-04-08 07:44:10.997588 INFO dpdk_workers] [Worker 2]: 10.21.3.101 [2026-04-08 07:44:10.997590 INFO dpdk_workers] [Worker 2]: 10.21.3.102 [2026-04-08 07:44:10.997592 INFO dpdk_workers] [Worker 2]: 10.21.3.103 [2026-04-08 07:44:10.997593 INFO dpdk_workers] [Worker 2]: 10.21.3.104 [2026-04-08 07:44:10.997595 INFO dpdk_workers] [Worker 2]: 10.21.3.105 [2026-04-08 07:44:10.997596 INFO dpdk_workers] [Worker 2]: 10.21.3.106 [2026-04-08 07:44:10.997597 INFO dpdk_workers] [Worker 2]: 10.21.3.107 [2026-04-08 07:44:10.997599 INFO dpdk_workers] [Worker 2]: 10.21.3.108 [2026-04-08 07:44:11.297639 INFO dpdk_workers] [Worker 3]: 10.21.4.101 [2026-04-08 07:44:11.297642 INFO dpdk_workers] [Worker 3]: 10.21.4.102 [2026-04-08 07:44:11.297643 INFO dpdk_workers] [Worker 3]: 10.21.4.103 [2026-04-08 07:44:11.297645 INFO dpdk_workers] [Worker 3]: 10.21.4.104 [2026-04-08 07:44:11.297646 INFO dpdk_workers] [Worker 3]: 10.21.4.105 [2026-04-08 07:44:11.297647 INFO dpdk_workers] [Worker 3]: 10.21.4.106 [2026-04-08 07:44:11.297649 INFO dpdk_workers] [Worker 3]: 10.21.4.107 [2026-04-08 07:44:11.297651 INFO dpdk_workers] [Worker 3]: 10.21.4.108 [2026-04-08 07:44:11.297660 INFO dpdk_workers] init_ducks done [2026-04-08 07:44:11.303302 INFO dpdk_ducks] Initialized 4 DPDK duck workers [2026-04-08 07:44:11.303305 INFO dpdk_ducks] DPDK duck worker 0: DpdkDuckWorker { worker_idx: 0, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { 
buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (0, 8) } [2026-04-08 07:44:11.303309 INFO dpdk_ducks] DPDK duck worker 1: DpdkDuckWorker { worker_idx: 1, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (8, 16) } [2026-04-08 07:44:11.303312 INFO dpdk_ducks] DPDK duck worker 2: DpdkDuckWorker { worker_idx: 2, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (16, 24) } [2026-04-08 07:44:11.303314 INFO dpdk_ducks] DPDK duck worker 3: DpdkDuckWorker { worker_idx: 3, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (24, 32) } [2026-04-08 07:44:11.303319 INFO buffer_manager] Initializing buffer manager [2026-04-08 07:44:11.303321 INFO buffer_manager] Buffer manager initialized: ELF BufferAllocator { begin: 0, end: 10485760, current: 0 }, input BufferAllocator { begin: 10485760, end: 104857600, current: 10485760 }, weights BufferAllocator { begin: 104923136, end: 32212254720, current: 104923136 } [2026-04-08 07:44:11.303325 INFO fp8_dpdk_common] 
fp9 persistent judge enabled by default; set DUCK_FP9_PERSISTENT_JUDGE=0 to disable [2026-04-08 07:44:11.303738 INFO buffer_manager] Added kernel fp9_kernels at (0, 91664) [2026-04-08 07:44:11.303773 INFO fp8_dpdk_common] fp9 persistent judge: opened 32 sessions [2026-04-08 07:44:11.303778 INFO fp8_dpdk_common] fp9 persistent judge: force-opened 32 fresh sessions for new init [2026-04-08 07:44:11.303781 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init(tp_size=32) [2026-04-08 07:44:11.303783 INFO fp8_moe_dpdk] fp8_moe_dpdk: init(tp_size=32) [2026-04-08 07:44:11.667769 INFO weight_cache] weight_cache: header hit tp_size=32 num_slots=62 finished_slots=62 [2026-04-08 07:44:11.995256 INFO buffer_manager] Allocated weights buffer at (104923136, 0) [2026-04-08 07:44:11.995279 INFO buffer_manager] Allocated weights buffer at (104923136, 4128768) [2026-04-08 07:44:11.995281 INFO buffer_manager] Allocated weights buffer at (109051904, 516096) [2026-04-08 07:44:11.995283 INFO buffer_manager] Allocated weights buffer at (109568000, 2016) [2026-04-08 07:44:11.995284 INFO buffer_manager] Allocated weights buffer at (109572096, 4128768) [2026-04-08 07:44:11.995286 INFO buffer_manager] Allocated weights buffer at (113700864, 516096) [2026-04-08 07:44:11.995287 INFO buffer_manager] Allocated weights buffer at (114216960, 2016) [2026-04-08 07:44:11.995289 INFO buffer_manager] Allocated weights buffer at (114221056, 4128768) [2026-04-08 07:44:11.995291 INFO buffer_manager] Allocated weights buffer at (118349824, 516096) [2026-04-08 07:44:11.995292 INFO buffer_manager] Allocated weights buffer at (118865920, 2016) [2026-04-08 07:44:11.995294 INFO buffer_manager] Allocated weights buffer at (118870016, 0) [2026-04-08 07:44:11.995296 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init_layer_cached(layer_idx=0, cache_slot=0) planned desc only [2026-04-08 07:44:12.087689 INFO buffer_manager] Allocated weights buffer at (118870016, 0) [2026-04-08 07:44:12.087710 INFO buffer_manager] Allocated weights buffer at 
(118870016, 4128768) [2026-04-08 07:44:12.087712 INFO buffer_manager] Allocated weights buffer at (122998784, 516096) [2026-04-08 07:44:12.087713 INFO buffer_manager] Allocated weights buffer at (123514880, 2016) [2026-04-08 07:44:12.087715 INFO buffer_manager] Allocated weights buffer at (123518976, 4128768) [2026-04-08 07:44:12.087716 INFO buffer_manager] Allocated weights buffer at (127647744, 516096) [2026-04-08 07:44:12.087718 INFO buffer_manager] Allocated weights buffer at (128163840, 2016) [2026-04-08 07:44:12.087719 INFO buffer_manager] Allocated weights buffer at (128167936, 4128768) [2026-04-08 07:44:12.087721 INFO buffer_manager] Allocated weights buffer at (132296704, 516096) [2026-04-08 07:44:12.087724 INFO buffer_manager] Allocated weights buffer at (132812800, 2016) [2026-04-08 07:44:12.087726 INFO buffer_manager] Allocated weights buffer at (132816896, 0) [2026-04-08 07:44:12.087728 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init_layer_cached(layer_idx=1, cache_slot=1) planned desc only [2026-04-08 07:44:12.174471 INFO buffer_manager] Allocated weights buffer at (132816896, 0) [2026-04-08 07:44:12.174492 INFO buffer_manager] Allocated weights buffer at (132816896, 4128768) [2026-04-08 07:44:12.174495 INFO buffer_manager] Allocated weights buffer at (136945664, 516096) [2026-04-08 07:44:12.174496 INFO buffer_manager] Allocated weights buffer at (137461760, 2016) [2026-04-08 07:44:12.174502 INFO buffer_manager] Allocated weights buffer at (137465856, 4128768) [2026-04-08 07:44:12.174504 INFO buffer_manager] Allocated weights buffer at (141594624, 516096) [2026-04-08 07:44:12.174505 INFO buffer_manager] Allocated weights buffer at (142110720, 2016) [2026-04-08 07:44:12.174508 INFO buffer_manager] Allocated weights buffer at (142114816, 4128768) [2026-04-08 07:44:12.174510 INFO buffer_manager] Allocated weights buffer at (146243584, 516096) [2026-04-08 07:44:12.174512 INFO buffer_manager] Allocated weights buffer at (146759680, 2016) [2026-04-08 07:44:12.174514 
INFO buffer_manager] Allocated weights buffer at (146763776, 0) [2026-04-08 07:44:12.174516 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init_layer_cached(layer_idx=2, cache_slot=2) planned desc only [2026-04-08 07:44:12.203066 INFO buffer_manager] Allocated weights buffer at (146763776, 0) [2026-04-08 07:44:12.203083 INFO buffer_manager] Allocated weights buffer at (146763776, 132120576) [2026-04-08 07:44:12.203085 INFO buffer_manager] Allocated weights buffer at (278884352, 57344) [2026-04-08 07:44:12.203087 INFO buffer_manager] Allocated weights buffer at (278941696, 132120576) [2026-04-08 07:44:12.203088 INFO buffer_manager] Allocated weights buffer at (411062272, 57344) [2026-04-08 07:44:12.203090 INFO buffer_manager] Allocated weights buffer at (411119616, 132120576) [2026-04-08 07:44:12.203091 INFO buffer_manager] Allocated weights buffer at (543240192, 57344) [2026-04-08 07:44:12.203093 INFO buffer_manager] Allocated weights buffer at (543297536, 0) [2026-04-08 07:44:12.203095 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=3, cache_slot=3) planned desc only [2026-04-08 07:44:12.239548 INFO buffer_manager] Allocated weights buffer at (543297536, 0) [2026-04-08 07:44:12.239564 INFO buffer_manager] Allocated weights buffer at (543297536, 132120576) [2026-04-08 07:44:12.239566 INFO buffer_manager] Allocated weights buffer at (675418112, 57344) [2026-04-08 07:44:12.239568 INFO buffer_manager] Allocated weights buffer at (675475456, 132120576) [2026-04-08 07:44:12.239569 INFO buffer_manager] Allocated weights buffer at (807596032, 57344) [2026-04-08 07:44:12.239571 INFO buffer_manager] Allocated weights buffer at (807653376, 132120576) [2026-04-08 07:44:12.239572 INFO buffer_manager] Allocated weights buffer at (939773952, 57344) [2026-04-08 07:44:12.239574 INFO buffer_manager] Allocated weights buffer at (939831296, 0) [2026-04-08 07:44:12.239576 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=4, cache_slot=4) planned desc only [2026-04-08 
07:44:12.275873 INFO buffer_manager] Allocated weights buffer at (939831296, 0) [2026-04-08 07:44:12.275888 INFO buffer_manager] Allocated weights buffer at (939831296, 132120576) [2026-04-08 07:44:12.275890 INFO buffer_manager] Allocated weights buffer at (1071951872, 57344) [2026-04-08 07:44:12.275891 INFO buffer_manager] Allocated weights buffer at (1072009216, 132120576) [2026-04-08 07:44:12.275893 INFO buffer_manager] Allocated weights buffer at (1204129792, 57344) [2026-04-08 07:44:12.275894 INFO buffer_manager] Allocated weights buffer at (1204187136, 132120576) [2026-04-08 07:44:12.275896 INFO buffer_manager] Allocated weights buffer at (1336307712, 57344) [2026-04-08 07:44:12.275898 INFO buffer_manager] Allocated weights buffer at (1336365056, 0) [2026-04-08 07:44:12.275900 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=5, cache_slot=5) planned desc only [2026-04-08 07:44:12.312182 INFO buffer_manager] Allocated weights buffer at (1336365056, 0) [2026-04-08 07:44:12.312198 INFO buffer_manager] Allocated weights buffer at (1336365056, 132120576) [2026-04-08 07:44:12.312200 INFO buffer_manager] Allocated weights buffer at (1468485632, 57344) [2026-04-08 07:44:12.312202 INFO buffer_manager] Allocated weights buffer at (1468542976, 132120576) [2026-04-08 07:44:12.312203 INFO buffer_manager] Allocated weights buffer at (1600663552, 57344) [2026-04-08 07:44:12.312205 INFO buffer_manager] Allocated weights buffer at (1600720896, 132120576) [2026-04-08 07:44:12.312213 INFO buffer_manager] Allocated weights buffer at (1732841472, 57344) [2026-04-08 07:44:12.312215 INFO buffer_manager] Allocated weights buffer at (1732898816, 0) [2026-04-08 07:44:12.312216 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=6, cache_slot=6) planned desc only [2026-04-08 07:44:12.348541 INFO buffer_manager] Allocated weights buffer at (1732898816, 0) [2026-04-08 07:44:12.348556 INFO buffer_manager] Allocated weights buffer at (1732898816, 132120576) [2026-04-08 
07:44:12.348558 INFO buffer_manager] Allocated weights buffer at (1865019392, 57344) [2026-04-08 07:44:12.348560 INFO buffer_manager] Allocated weights buffer at (1865076736, 132120576) [2026-04-08 07:44:12.348561 INFO buffer_manager] Allocated weights buffer at (1997197312, 57344) [2026-04-08 07:44:12.348563 INFO buffer_manager] Allocated weights buffer at (1997254656, 132120576) [2026-04-08 07:44:12.348564 INFO buffer_manager] Allocated weights buffer at (2129375232, 57344) [2026-04-08 07:44:12.348566 INFO buffer_manager] Allocated weights buffer at (2129432576, 0) [2026-04-08 07:44:12.348568 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=7, cache_slot=7) planned desc only [2026-04-08 07:44:12.384811 INFO buffer_manager] Allocated weights buffer at (2129432576, 0) [2026-04-08 07:44:12.384826 INFO buffer_manager] Allocated weights buffer at (2129432576, 132120576) [2026-04-08 07:44:12.384828 INFO buffer_manager] Allocated weights buffer at (2261553152, 57344) [2026-04-08 07:44:12.384829 INFO buffer_manager] Allocated weights buffer at (2261610496, 132120576) [2026-04-08 07:44:12.384830 INFO buffer_manager] Allocated weights buffer at (2393731072, 57344) [2026-04-08 07:44:12.384832 INFO buffer_manager] Allocated weights buffer at (2393788416, 132120576) [2026-04-08 07:44:12.384834 INFO buffer_manager] Allocated weights buffer at (2525908992, 57344) [2026-04-08 07:44:12.384836 INFO buffer_manager] Allocated weights buffer at (2525966336, 0) [2026-04-08 07:44:12.384838 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=8, cache_slot=8) planned desc only [2026-04-08 07:44:12.421101 INFO buffer_manager] Allocated weights buffer at (2525966336, 0) [2026-04-08 07:44:12.421115 INFO buffer_manager] Allocated weights buffer at (2525966336, 132120576) [2026-04-08 07:44:12.421117 INFO buffer_manager] Allocated weights buffer at (2658086912, 57344) [2026-04-08 07:44:12.421119 INFO buffer_manager] Allocated weights buffer at (2658144256, 132120576) 
[2026-04-08 07:44:12.421120 INFO buffer_manager] Allocated weights buffer at (2790264832, 57344) [2026-04-08 07:44:12.421122 INFO buffer_manager] Allocated weights buffer at (2790322176, 132120576) [2026-04-08 07:44:12.421123 INFO buffer_manager] Allocated weights buffer at (2922442752, 57344) [2026-04-08 07:44:12.421125 INFO buffer_manager] Allocated weights buffer at (2922500096, 0) [2026-04-08 07:44:12.421127 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=9, cache_slot=9) planned desc only [2026-04-08 07:44:12.457489 INFO buffer_manager] Allocated weights buffer at (2922500096, 0) [2026-04-08 07:44:12.457502 INFO buffer_manager] Allocated weights buffer at (2922500096, 132120576) [2026-04-08 07:44:12.457504 INFO buffer_manager] Allocated weights buffer at (3054620672, 57344) [2026-04-08 07:44:12.457506 INFO buffer_manager] Allocated weights buffer at (3054678016, 132120576) [2026-04-08 07:44:12.457507 INFO buffer_manager] Allocated weights buffer at (3186798592, 57344) [2026-04-08 07:44:12.457509 INFO buffer_manager] Allocated weights buffer at (3186855936, 132120576) [2026-04-08 07:44:12.457510 INFO buffer_manager] Allocated weights buffer at (3318976512, 57344) [2026-04-08 07:44:12.457513 INFO buffer_manager] Allocated weights buffer at (3319033856, 0) [2026-04-08 07:44:12.457515 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=10, cache_slot=10) planned desc only [2026-04-08 07:44:12.493914 INFO buffer_manager] Allocated weights buffer at (3319033856, 0) [2026-04-08 07:44:12.493927 INFO buffer_manager] Allocated weights buffer at (3319033856, 132120576) [2026-04-08 07:44:12.493934 INFO buffer_manager] Allocated weights buffer at (3451154432, 57344) [2026-04-08 07:44:12.493935 INFO buffer_manager] Allocated weights buffer at (3451211776, 132120576) [2026-04-08 07:44:12.493937 INFO buffer_manager] Allocated weights buffer at (3583332352, 57344) [2026-04-08 07:44:12.493939 INFO buffer_manager] Allocated weights buffer at (3583389696, 
132120576) [2026-04-08 07:44:12.493941 INFO buffer_manager] Allocated weights buffer at (3715510272, 57344) [2026-04-08 07:44:12.493943 INFO buffer_manager] Allocated weights buffer at (3715567616, 0) [2026-04-08 07:44:12.493944 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=11, cache_slot=11) planned desc only [2026-04-08 07:44:12.530293 INFO buffer_manager] Allocated weights buffer at (3715567616, 0) [2026-04-08 07:44:12.530307 INFO buffer_manager] Allocated weights buffer at (3715567616, 132120576) [2026-04-08 07:44:12.530309 INFO buffer_manager] Allocated weights buffer at (3847688192, 57344) [2026-04-08 07:44:12.530311 INFO buffer_manager] Allocated weights buffer at (3847745536, 132120576) [2026-04-08 07:44:12.530312 INFO buffer_manager] Allocated weights buffer at (3979866112, 57344) [2026-04-08 07:44:12.530314 INFO buffer_manager] Allocated weights buffer at (3979923456, 132120576) [2026-04-08 07:44:12.530315 INFO buffer_manager] Allocated weights buffer at (4112044032, 57344) [2026-04-08 07:44:12.530317 INFO buffer_manager] Allocated weights buffer at (4112101376, 0) [2026-04-08 07:44:12.530319 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=12, cache_slot=12) planned desc only [2026-04-08 07:44:12.566568 INFO buffer_manager] Allocated weights buffer at (4112101376, 0) [2026-04-08 07:44:12.566582 INFO buffer_manager] Allocated weights buffer at (4112101376, 132120576) [2026-04-08 07:44:12.566584 INFO buffer_manager] Allocated weights buffer at (4244221952, 57344) [2026-04-08 07:44:12.566586 INFO buffer_manager] Allocated weights buffer at (4244279296, 132120576) [2026-04-08 07:44:12.566587 INFO buffer_manager] Allocated weights buffer at (4376399872, 57344) [2026-04-08 07:44:12.566589 INFO buffer_manager] Allocated weights buffer at (4376457216, 132120576) [2026-04-08 07:44:12.566590 INFO buffer_manager] Allocated weights buffer at (4508577792, 57344) [2026-04-08 07:44:12.566592 INFO buffer_manager] Allocated weights buffer at 
(4508635136, 0) [2026-04-08 07:44:12.566595 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=13, cache_slot=13) planned desc only [2026-04-08 07:44:12.602892 INFO buffer_manager] Allocated weights buffer at (4508635136, 0) [2026-04-08 07:44:12.602905 INFO buffer_manager] Allocated weights buffer at (4508635136, 132120576) [2026-04-08 07:44:12.602907 INFO buffer_manager] Allocated weights buffer at (4640755712, 57344) [2026-04-08 07:44:12.602909 INFO buffer_manager] Allocated weights buffer at (4640813056, 132120576) [2026-04-08 07:44:12.602910 INFO buffer_manager] Allocated weights buffer at (4772933632, 57344) [2026-04-08 07:44:12.602912 INFO buffer_manager] Allocated weights buffer at (4772990976, 132120576) [2026-04-08 07:44:12.602913 INFO buffer_manager] Allocated weights buffer at (4905111552, 57344) [2026-04-08 07:44:12.602915 INFO buffer_manager] Allocated weights buffer at (4905168896, 0) [2026-04-08 07:44:12.602917 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=14, cache_slot=14) planned desc only [2026-04-08 07:44:12.639171 INFO buffer_manager] Allocated weights buffer at (4905168896, 0) [2026-04-08 07:44:12.639185 INFO buffer_manager] Allocated weights buffer at (4905168896, 132120576) [2026-04-08 07:44:12.639187 INFO buffer_manager] Allocated weights buffer at (5037289472, 57344) [2026-04-08 07:44:12.639188 INFO buffer_manager] Allocated weights buffer at (5037346816, 132120576) [2026-04-08 07:44:12.639190 INFO buffer_manager] Allocated weights buffer at (5169467392, 57344) [2026-04-08 07:44:12.639191 INFO buffer_manager] Allocated weights buffer at (5169524736, 132120576) [2026-04-08 07:44:12.639193 INFO buffer_manager] Allocated weights buffer at (5301645312, 57344) [2026-04-08 07:44:12.639198 INFO buffer_manager] Allocated weights buffer at (5301702656, 0) [2026-04-08 07:44:12.639200 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=15, cache_slot=15) planned desc only [2026-04-08 07:44:12.675455 INFO 
buffer_manager] Allocated weights buffer at (5301702656, 0) [2026-04-08 07:44:12.675475 INFO buffer_manager] Allocated weights buffer at (5301702656, 132120576) [2026-04-08 07:44:12.675477 INFO buffer_manager] Allocated weights buffer at (5433823232, 57344) [2026-04-08 07:44:12.675479 INFO buffer_manager] Allocated weights buffer at (5433880576, 132120576) [2026-04-08 07:44:12.675480 INFO buffer_manager] Allocated weights buffer at (5566001152, 57344) [2026-04-08 07:44:12.675482 INFO buffer_manager] Allocated weights buffer at (5566058496, 132120576) [2026-04-08 07:44:12.675483 INFO buffer_manager] Allocated weights buffer at (5698179072, 57344) [2026-04-08 07:44:12.675485 INFO buffer_manager] Allocated weights buffer at (5698236416, 0) [2026-04-08 07:44:12.675487 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=16, cache_slot=16) planned desc only [2026-04-08 07:44:12.711706 INFO buffer_manager] Allocated weights buffer at (5698236416, 0) [2026-04-08 07:44:12.711719 INFO buffer_manager] Allocated weights buffer at (5698236416, 132120576) [2026-04-08 07:44:12.711721 INFO buffer_manager] Allocated weights buffer at (5830356992, 57344) [2026-04-08 07:44:12.711723 INFO buffer_manager] Allocated weights buffer at (5830414336, 132120576) [2026-04-08 07:44:12.711724 INFO buffer_manager] Allocated weights buffer at (5962534912, 57344) [2026-04-08 07:44:12.711726 INFO buffer_manager] Allocated weights buffer at (5962592256, 132120576) [2026-04-08 07:44:12.711728 INFO buffer_manager] Allocated weights buffer at (6094712832, 57344) [2026-04-08 07:44:12.711730 INFO buffer_manager] Allocated weights buffer at (6094770176, 0) [2026-04-08 07:44:12.711731 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=17, cache_slot=17) planned desc only [2026-04-08 07:44:12.747972 INFO buffer_manager] Allocated weights buffer at (6094770176, 0) [2026-04-08 07:44:12.747990 INFO buffer_manager] Allocated weights buffer at (6094770176, 132120576) [2026-04-08 
07:44:12.747992 INFO buffer_manager] Allocated weights buffer at (6226890752, 57344) [2026-04-08 07:44:12.747993 INFO buffer_manager] Allocated weights buffer at (6226948096, 132120576) [2026-04-08 07:44:12.747995 INFO buffer_manager] Allocated weights buffer at (6359068672, 57344) [2026-04-08 07:44:12.747996 INFO buffer_manager] Allocated weights buffer at (6359126016, 132120576) [2026-04-08 07:44:12.747998 INFO buffer_manager] Allocated weights buffer at (6491246592, 57344) [2026-04-08 07:44:12.747999 INFO buffer_manager] Allocated weights buffer at (6491303936, 0) [2026-04-08 07:44:12.748001 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=18, cache_slot=18) planned desc only [2026-04-08 07:44:12.784193 INFO buffer_manager] Allocated weights buffer at (6491303936, 0) [2026-04-08 07:44:12.784208 INFO buffer_manager] Allocated weights buffer at (6491303936, 132120576) [2026-04-08 07:44:12.784210 INFO buffer_manager] Allocated weights buffer at (6623424512, 57344) [2026-04-08 07:44:12.784211 INFO buffer_manager] Allocated weights buffer at (6623481856, 132120576) [2026-04-08 07:44:12.784213 INFO buffer_manager] Allocated weights buffer at (6755602432, 57344) [2026-04-08 07:44:12.784214 INFO buffer_manager] Allocated weights buffer at (6755659776, 132120576) [2026-04-08 07:44:12.784216 INFO buffer_manager] Allocated weights buffer at (6887780352, 57344) [2026-04-08 07:44:12.784217 INFO buffer_manager] Allocated weights buffer at (6887837696, 0) [2026-04-08 07:44:12.784220 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=19, cache_slot=19) planned desc only [2026-04-08 07:44:12.820436 INFO buffer_manager] Allocated weights buffer at (6887837696, 0) [2026-04-08 07:44:12.820451 INFO buffer_manager] Allocated weights buffer at (6887837696, 132120576) [2026-04-08 07:44:12.820457 INFO buffer_manager] Allocated weights buffer at (7019958272, 57344) [2026-04-08 07:44:12.820459 INFO buffer_manager] Allocated weights buffer at (7020015616, 132120576) 
[2026-04-08 07:44:12.820461 INFO buffer_manager] Allocated weights buffer at (7152136192, 57344) [2026-04-08 07:44:12.820463 INFO buffer_manager] Allocated weights buffer at (7152193536, 132120576) [2026-04-08 07:44:12.820464 INFO buffer_manager] Allocated weights buffer at (7284314112, 57344) [2026-04-08 07:44:12.820466 INFO buffer_manager] Allocated weights buffer at (7284371456, 0) [2026-04-08 07:44:12.820467 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=20, cache_slot=20) planned desc only [2026-04-08 07:44:12.856698 INFO buffer_manager] Allocated weights buffer at (7284371456, 0) [2026-04-08 07:44:12.856712 INFO buffer_manager] Allocated weights buffer at (7284371456, 132120576) [2026-04-08 07:44:12.856715 INFO buffer_manager] Allocated weights buffer at (7416492032, 57344) [2026-04-08 07:44:12.856716 INFO buffer_manager] Allocated weights buffer at (7416549376, 132120576) [2026-04-08 07:44:12.856718 INFO buffer_manager] Allocated weights buffer at (7548669952, 57344) [2026-04-08 07:44:12.856719 INFO buffer_manager] Allocated weights buffer at (7548727296, 132120576) [2026-04-08 07:44:12.856721 INFO buffer_manager] Allocated weights buffer at (7680847872, 57344) [2026-04-08 07:44:12.856722 INFO buffer_manager] Allocated weights buffer at (7680905216, 0) [2026-04-08 07:44:12.856724 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=21, cache_slot=21) planned desc only [2026-04-08 07:44:12.892919 INFO buffer_manager] Allocated weights buffer at (7680905216, 0) [2026-04-08 07:44:12.892932 INFO buffer_manager] Allocated weights buffer at (7680905216, 132120576) [2026-04-08 07:44:12.892934 INFO buffer_manager] Allocated weights buffer at (7813025792, 57344) [2026-04-08 07:44:12.892936 INFO buffer_manager] Allocated weights buffer at (7813083136, 132120576) [2026-04-08 07:44:12.892937 INFO buffer_manager] Allocated weights buffer at (7945203712, 57344) [2026-04-08 07:44:12.892938 INFO buffer_manager] Allocated weights buffer at (7945261056, 
132120576) [2026-04-08 07:44:12.892940 INFO buffer_manager] Allocated weights buffer at (8077381632, 57344) [2026-04-08 07:44:12.892941 INFO buffer_manager] Allocated weights buffer at (8077438976, 0) [2026-04-08 07:44:12.892944 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=22, cache_slot=22) planned desc only [2026-04-08 07:44:12.929241 INFO buffer_manager] Allocated weights buffer at (8077438976, 0) [2026-04-08 07:44:12.929256 INFO buffer_manager] Allocated weights buffer at (8077438976, 132120576) [2026-04-08 07:44:12.929258 INFO buffer_manager] Allocated weights buffer at (8209559552, 57344) [2026-04-08 07:44:12.929260 INFO buffer_manager] Allocated weights buffer at (8209616896, 132120576) [2026-04-08 07:44:12.929261 INFO buffer_manager] Allocated weights buffer at (8341737472, 57344) [2026-04-08 07:44:12.929263 INFO buffer_manager] Allocated weights buffer at (8341794816, 132120576) [2026-04-08 07:44:12.929265 INFO buffer_manager] Allocated weights buffer at (8473915392, 57344) [2026-04-08 07:44:12.929266 INFO buffer_manager] Allocated weights buffer at (8473972736, 0) [2026-04-08 07:44:12.929268 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=23, cache_slot=23) planned desc only [2026-04-08 07:44:12.965496 INFO buffer_manager] Allocated weights buffer at (8473972736, 0) [2026-04-08 07:44:12.965509 INFO buffer_manager] Allocated weights buffer at (8473972736, 132120576) [2026-04-08 07:44:12.965511 INFO buffer_manager] Allocated weights buffer at (8606093312, 57344) [2026-04-08 07:44:12.965513 INFO buffer_manager] Allocated weights buffer at (8606150656, 132120576) [2026-04-08 07:44:12.965514 INFO buffer_manager] Allocated weights buffer at (8738271232, 57344) [2026-04-08 07:44:12.965516 INFO buffer_manager] Allocated weights buffer at (8738328576, 132120576) [2026-04-08 07:44:12.965517 INFO buffer_manager] Allocated weights buffer at (8870449152, 57344) [2026-04-08 07:44:12.965523 INFO buffer_manager] Allocated weights buffer at 
(8870506496, 0) [2026-04-08 07:44:12.965525 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=24, cache_slot=24) planned desc only [2026-04-08 07:44:13.001738 INFO buffer_manager] Allocated weights buffer at (8870506496, 0) [2026-04-08 07:44:13.001756 INFO buffer_manager] Allocated weights buffer at (8870506496, 132120576) [2026-04-08 07:44:13.001758 INFO buffer_manager] Allocated weights buffer at (9002627072, 57344) [2026-04-08 07:44:13.001760 INFO buffer_manager] Allocated weights buffer at (9002684416, 132120576) [2026-04-08 07:44:13.001761 INFO buffer_manager] Allocated weights buffer at (9134804992, 57344) [2026-04-08 07:44:13.001763 INFO buffer_manager] Allocated weights buffer at (9134862336, 132120576) [2026-04-08 07:44:13.001765 INFO buffer_manager] Allocated weights buffer at (9266982912, 57344) [2026-04-08 07:44:13.001767 INFO buffer_manager] Allocated weights buffer at (9267040256, 0) [2026-04-08 07:44:13.001769 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=25, cache_slot=25) planned desc only [2026-04-08 07:44:13.037977 INFO buffer_manager] Allocated weights buffer at (9267040256, 0) [2026-04-08 07:44:13.037992 INFO buffer_manager] Allocated weights buffer at (9267040256, 132120576) [2026-04-08 07:44:13.037994 INFO buffer_manager] Allocated weights buffer at (9399160832, 57344) [2026-04-08 07:44:13.037995 INFO buffer_manager] Allocated weights buffer at (9399218176, 132120576) [2026-04-08 07:44:13.037997 INFO buffer_manager] Allocated weights buffer at (9531338752, 57344) [2026-04-08 07:44:13.037998 INFO buffer_manager] Allocated weights buffer at (9531396096, 132120576) [2026-04-08 07:44:13.038000 INFO buffer_manager] Allocated weights buffer at (9663516672, 57344) [2026-04-08 07:44:13.038001 INFO buffer_manager] Allocated weights buffer at (9663574016, 0) [2026-04-08 07:44:13.038003 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=26, cache_slot=26) planned desc only [2026-04-08 07:44:13.074402 INFO 
buffer_manager] Allocated weights buffer at (9663574016, 0) [2026-04-08 07:44:13.074416 INFO buffer_manager] Allocated weights buffer at (9663574016, 132120576) [2026-04-08 07:44:13.074419 INFO buffer_manager] Allocated weights buffer at (9795694592, 57344) [2026-04-08 07:44:13.074420 INFO buffer_manager] Allocated weights buffer at (9795751936, 132120576) [2026-04-08 07:44:13.074422 INFO buffer_manager] Allocated weights buffer at (9927872512, 57344) [2026-04-08 07:44:13.074424 INFO buffer_manager] Allocated weights buffer at (9927929856, 132120576) [2026-04-08 07:44:13.074425 INFO buffer_manager] Allocated weights buffer at (10060050432, 57344) [2026-04-08 07:44:13.074427 INFO buffer_manager] Allocated weights buffer at (10060107776, 0) [2026-04-08 07:44:13.074430 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=27, cache_slot=27) planned desc only [2026-04-08 07:44:13.110736 INFO buffer_manager] Allocated weights buffer at (10060107776, 0) [2026-04-08 07:44:13.110749 INFO buffer_manager] Allocated weights buffer at (10060107776, 132120576) [2026-04-08 07:44:13.110751 INFO buffer_manager] Allocated weights buffer at (10192228352, 57344) [2026-04-08 07:44:13.110753 INFO buffer_manager] Allocated weights buffer at (10192285696, 132120576) [2026-04-08 07:44:13.110754 INFO buffer_manager] Allocated weights buffer at (10324406272, 57344) [2026-04-08 07:44:13.110756 INFO buffer_manager] Allocated weights buffer at (10324463616, 132120576) [2026-04-08 07:44:13.110757 INFO buffer_manager] Allocated weights buffer at (10456584192, 57344) [2026-04-08 07:44:13.110759 INFO buffer_manager] Allocated weights buffer at (10456641536, 0) [2026-04-08 07:44:13.110761 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=28, cache_slot=28) planned desc only [2026-04-08 07:44:13.147092 INFO buffer_manager] Allocated weights buffer at (10456641536, 0) [2026-04-08 07:44:13.147105 INFO buffer_manager] Allocated weights buffer at (10456641536, 132120576) [2026-04-08 
07:44:13.147111 INFO buffer_manager] Allocated weights buffer at (10588762112, 57344) [2026-04-08 07:44:13.147113 INFO buffer_manager] Allocated weights buffer at (10588819456, 132120576) [2026-04-08 07:44:13.147114 INFO buffer_manager] Allocated weights buffer at (10720940032, 57344) [2026-04-08 07:44:13.147116 INFO buffer_manager] Allocated weights buffer at (10720997376, 132120576) [2026-04-08 07:44:13.147118 INFO buffer_manager] Allocated weights buffer at (10853117952, 57344) [2026-04-08 07:44:13.147120 INFO buffer_manager] Allocated weights buffer at (10853175296, 0) [2026-04-08 07:44:13.147122 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=29, cache_slot=29) planned desc only [2026-04-08 07:44:13.183323 INFO buffer_manager] Allocated weights buffer at (10853175296, 0) [2026-04-08 07:44:13.183337 INFO buffer_manager] Allocated weights buffer at (10853175296, 132120576) [2026-04-08 07:44:13.183338 INFO buffer_manager] Allocated weights buffer at (10985295872, 57344) [2026-04-08 07:44:13.183340 INFO buffer_manager] Allocated weights buffer at (10985353216, 132120576) [2026-04-08 07:44:13.183342 INFO buffer_manager] Allocated weights buffer at (11117473792, 57344) [2026-04-08 07:44:13.183343 INFO buffer_manager] Allocated weights buffer at (11117531136, 132120576) [2026-04-08 07:44:13.183345 INFO buffer_manager] Allocated weights buffer at (11249651712, 57344) [2026-04-08 07:44:13.183346 INFO buffer_manager] Allocated weights buffer at (11249709056, 0) [2026-04-08 07:44:13.183349 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=30, cache_slot=30) planned desc only [2026-04-08 07:44:13.219600 INFO buffer_manager] Allocated weights buffer at (11249709056, 0) [2026-04-08 07:44:13.219614 INFO buffer_manager] Allocated weights buffer at (11249709056, 132120576) [2026-04-08 07:44:13.219616 INFO buffer_manager] Allocated weights buffer at (11381829632, 57344) [2026-04-08 07:44:13.219617 INFO buffer_manager] Allocated weights buffer at 
(11381886976, 132120576) [2026-04-08 07:44:13.219619 INFO buffer_manager] Allocated weights buffer at (11514007552, 57344) [2026-04-08 07:44:13.219620 INFO buffer_manager] Allocated weights buffer at (11514064896, 132120576) [2026-04-08 07:44:13.219622 INFO buffer_manager] Allocated weights buffer at (11646185472, 57344) [2026-04-08 07:44:13.219623 INFO buffer_manager] Allocated weights buffer at (11646242816, 0) [2026-04-08 07:44:13.219626 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=31, cache_slot=31) planned desc only [2026-04-08 07:44:13.255920 INFO buffer_manager] Allocated weights buffer at (11646242816, 0) [2026-04-08 07:44:13.255934 INFO buffer_manager] Allocated weights buffer at (11646242816, 132120576) [2026-04-08 07:44:13.255936 INFO buffer_manager] Allocated weights buffer at (11778363392, 57344) [2026-04-08 07:44:13.255938 INFO buffer_manager] Allocated weights buffer at (11778420736, 132120576) [2026-04-08 07:44:13.255939 INFO buffer_manager] Allocated weights buffer at (11910541312, 57344) [2026-04-08 07:44:13.255941 INFO buffer_manager] Allocated weights buffer at (11910598656, 132120576) [2026-04-08 07:44:13.255942 INFO buffer_manager] Allocated weights buffer at (12042719232, 57344) [2026-04-08 07:44:13.255944 INFO buffer_manager] Allocated weights buffer at (12042776576, 0) [2026-04-08 07:44:13.255945 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=32, cache_slot=32) planned desc only [2026-04-08 07:44:13.292246 INFO buffer_manager] Allocated weights buffer at (12042776576, 0) [2026-04-08 07:44:13.292259 INFO buffer_manager] Allocated weights buffer at (12042776576, 132120576) [2026-04-08 07:44:13.292261 INFO buffer_manager] Allocated weights buffer at (12174897152, 57344) [2026-04-08 07:44:13.292262 INFO buffer_manager] Allocated weights buffer at (12174954496, 132120576) [2026-04-08 07:44:13.292264 INFO buffer_manager] Allocated weights buffer at (12307075072, 57344) [2026-04-08 07:44:13.292265 INFO buffer_manager] 
Allocated weights buffer at (12307132416, 132120576) [2026-04-08 07:44:13.292267 INFO buffer_manager] Allocated weights buffer at (12439252992, 57344) [2026-04-08 07:44:13.292272 INFO buffer_manager] Allocated weights buffer at (12439310336, 0) [2026-04-08 07:44:13.292274 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=33, cache_slot=33) planned desc only [2026-04-08 07:44:13.328518 INFO buffer_manager] Allocated weights buffer at (12439310336, 0) [2026-04-08 07:44:13.328532 INFO buffer_manager] Allocated weights buffer at (12439310336, 132120576) [2026-04-08 07:44:13.328534 INFO buffer_manager] Allocated weights buffer at (12571430912, 57344) [2026-04-08 07:44:13.328535 INFO buffer_manager] Allocated weights buffer at (12571488256, 132120576) [2026-04-08 07:44:13.328537 INFO buffer_manager] Allocated weights buffer at (12703608832, 57344) [2026-04-08 07:44:13.328538 INFO buffer_manager] Allocated weights buffer at (12703666176, 132120576) [2026-04-08 07:44:13.328540 INFO buffer_manager] Allocated weights buffer at (12835786752, 57344) [2026-04-08 07:44:13.328541 INFO buffer_manager] Allocated weights buffer at (12835844096, 0) [2026-04-08 07:44:13.328543 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=34, cache_slot=34) planned desc only [2026-04-08 07:44:13.364739 INFO buffer_manager] Allocated weights buffer at (12835844096, 0) [2026-04-08 07:44:13.364753 INFO buffer_manager] Allocated weights buffer at (12835844096, 132120576) [2026-04-08 07:44:13.364755 INFO buffer_manager] Allocated weights buffer at (12967964672, 57344) [2026-04-08 07:44:13.364757 INFO buffer_manager] Allocated weights buffer at (12968022016, 132120576) [2026-04-08 07:44:13.364758 INFO buffer_manager] Allocated weights buffer at (13100142592, 57344) [2026-04-08 07:44:13.364760 INFO buffer_manager] Allocated weights buffer at (13100199936, 132120576) [2026-04-08 07:44:13.364761 INFO buffer_manager] Allocated weights buffer at (13232320512, 57344) [2026-04-08 
07:44:13.364763 INFO buffer_manager] Allocated weights buffer at (13232377856, 0) [2026-04-08 07:44:13.364765 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=35, cache_slot=35) planned desc only [2026-04-08 07:44:13.401028 INFO buffer_manager] Allocated weights buffer at (13232377856, 0) [2026-04-08 07:44:13.401040 INFO buffer_manager] Allocated weights buffer at (13232377856, 132120576) [2026-04-08 07:44:13.401042 INFO buffer_manager] Allocated weights buffer at (13364498432, 57344) [2026-04-08 07:44:13.401044 INFO buffer_manager] Allocated weights buffer at (13364555776, 132120576) [2026-04-08 07:44:13.401045 INFO buffer_manager] Allocated weights buffer at (13496676352, 57344) [2026-04-08 07:44:13.401047 INFO buffer_manager] Allocated weights buffer at (13496733696, 132120576) [2026-04-08 07:44:13.401048 INFO buffer_manager] Allocated weights buffer at (13628854272, 57344) [2026-04-08 07:44:13.401050 INFO buffer_manager] Allocated weights buffer at (13628911616, 0) [2026-04-08 07:44:13.401051 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=36, cache_slot=36) planned desc only [2026-04-08 07:44:13.437290 INFO buffer_manager] Allocated weights buffer at (13628911616, 0) [2026-04-08 07:44:13.437304 INFO buffer_manager] Allocated weights buffer at (13628911616, 132120576) [2026-04-08 07:44:13.437305 INFO buffer_manager] Allocated weights buffer at (13761032192, 57344) [2026-04-08 07:44:13.437307 INFO buffer_manager] Allocated weights buffer at (13761089536, 132120576) [2026-04-08 07:44:13.437308 INFO buffer_manager] Allocated weights buffer at (13893210112, 57344) [2026-04-08 07:44:13.437310 INFO buffer_manager] Allocated weights buffer at (13893267456, 132120576) [2026-04-08 07:44:13.437312 INFO buffer_manager] Allocated weights buffer at (14025388032, 57344) [2026-04-08 07:44:13.437313 INFO buffer_manager] Allocated weights buffer at (14025445376, 0) [2026-04-08 07:44:13.437315 INFO fp8_moe_dpdk] fp8_moe_dpdk: 
init_layer_cached(layer_idx=37, cache_slot=37) planned desc only [2026-04-08 07:44:13.473491 INFO buffer_manager] Allocated weights buffer at (14025445376, 0) [2026-04-08 07:44:13.473505 INFO buffer_manager] Allocated weights buffer at (14025445376, 132120576) [2026-04-08 07:44:13.473511 INFO buffer_manager] Allocated weights buffer at (14157565952, 57344) [2026-04-08 07:44:13.473513 INFO buffer_manager] Allocated weights buffer at (14157623296, 132120576) [2026-04-08 07:44:13.473514 INFO buffer_manager] Allocated weights buffer at (14289743872, 57344) [2026-04-08 07:44:13.473516 INFO buffer_manager] Allocated weights buffer at (14289801216, 132120576) [2026-04-08 07:44:13.473518 INFO buffer_manager] Allocated weights buffer at (14421921792, 57344) [2026-04-08 07:44:13.473520 INFO buffer_manager] Allocated weights buffer at (14421979136, 0) [2026-04-08 07:44:13.473522 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=38, cache_slot=38) planned desc only [2026-04-08 07:44:13.509886 INFO buffer_manager] Allocated weights buffer at (14421979136, 0) [2026-04-08 07:44:13.509903 INFO buffer_manager] Allocated weights buffer at (14421979136, 132120576) [2026-04-08 07:44:13.509905 INFO buffer_manager] Allocated weights buffer at (14554099712, 57344) [2026-04-08 07:44:13.509906 INFO buffer_manager] Allocated weights buffer at (14554157056, 132120576) [2026-04-08 07:44:13.509908 INFO buffer_manager] Allocated weights buffer at (14686277632, 57344) [2026-04-08 07:44:13.509909 INFO buffer_manager] Allocated weights buffer at (14686334976, 132120576) [2026-04-08 07:44:13.509911 INFO buffer_manager] Allocated weights buffer at (14818455552, 57344) [2026-04-08 07:44:13.509912 INFO buffer_manager] Allocated weights buffer at (14818512896, 0) [2026-04-08 07:44:13.509914 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=39, cache_slot=39) planned desc only [2026-04-08 07:44:13.546291 INFO buffer_manager] Allocated weights buffer at (14818512896, 0) [2026-04-08 
07:44:13.546306 INFO buffer_manager] Allocated weights buffer at (14818512896, 132120576) [2026-04-08 07:44:13.546308 INFO buffer_manager] Allocated weights buffer at (14950633472, 57344) [2026-04-08 07:44:13.546310 INFO buffer_manager] Allocated weights buffer at (14950690816, 132120576) [2026-04-08 07:44:13.546311 INFO buffer_manager] Allocated weights buffer at (15082811392, 57344) [2026-04-08 07:44:13.546312 INFO buffer_manager] Allocated weights buffer at (15082868736, 132120576) [2026-04-08 07:44:13.546314 INFO buffer_manager] Allocated weights buffer at (15214989312, 57344) [2026-04-08 07:44:13.546316 INFO buffer_manager] Allocated weights buffer at (15215046656, 0) [2026-04-08 07:44:13.546317 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=40, cache_slot=40) planned desc only [2026-04-08 07:44:13.582494 INFO buffer_manager] Allocated weights buffer at (15215046656, 0) [2026-04-08 07:44:13.582509 INFO buffer_manager] Allocated weights buffer at (15215046656, 132120576) [2026-04-08 07:44:13.582511 INFO buffer_manager] Allocated weights buffer at (15347167232, 57344) [2026-04-08 07:44:13.582512 INFO buffer_manager] Allocated weights buffer at (15347224576, 132120576) [2026-04-08 07:44:13.582514 INFO buffer_manager] Allocated weights buffer at (15479345152, 57344) [2026-04-08 07:44:13.582515 INFO buffer_manager] Allocated weights buffer at (15479402496, 132120576) [2026-04-08 07:44:13.582517 INFO buffer_manager] Allocated weights buffer at (15611523072, 57344) [2026-04-08 07:44:13.582518 INFO buffer_manager] Allocated weights buffer at (15611580416, 0) [2026-04-08 07:44:13.582521 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=41, cache_slot=41) planned desc only [2026-04-08 07:44:13.618646 INFO buffer_manager] Allocated weights buffer at (15611580416, 0) [2026-04-08 07:44:13.618659 INFO buffer_manager] Allocated weights buffer at (15611580416, 132120576) [2026-04-08 07:44:13.618661 INFO buffer_manager] Allocated weights buffer at 
(15743700992, 57344) [2026-04-08 07:44:13.618663 INFO buffer_manager] Allocated weights buffer at (15743758336, 132120576) [2026-04-08 07:44:13.618664 INFO buffer_manager] Allocated weights buffer at (15875878912, 57344) [2026-04-08 07:44:13.618666 INFO buffer_manager] Allocated weights buffer at (15875936256, 132120576) [2026-04-08 07:44:13.618667 INFO buffer_manager] Allocated weights buffer at (16008056832, 57344) [2026-04-08 07:44:13.618680 INFO buffer_manager] Allocated weights buffer at (16008114176, 0) [2026-04-08 07:44:13.618681 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=42, cache_slot=42) planned desc only [2026-04-08 07:44:13.654947 INFO buffer_manager] Allocated weights buffer at (16008114176, 0) [2026-04-08 07:44:13.654961 INFO buffer_manager] Allocated weights buffer at (16008114176, 132120576) [2026-04-08 07:44:13.654963 INFO buffer_manager] Allocated weights buffer at (16140234752, 57344) [2026-04-08 07:44:13.654965 INFO buffer_manager] Allocated weights buffer at (16140292096, 132120576) [2026-04-08 07:44:13.654966 INFO buffer_manager] Allocated weights buffer at (16272412672, 57344) [2026-04-08 07:44:13.654968 INFO buffer_manager] Allocated weights buffer at (16272470016, 132120576) [2026-04-08 07:44:13.654969 INFO buffer_manager] Allocated weights buffer at (16404590592, 57344) [2026-04-08 07:44:13.654971 INFO buffer_manager] Allocated weights buffer at (16404647936, 0) [2026-04-08 07:44:13.654973 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=43, cache_slot=43) planned desc only [2026-04-08 07:44:13.691196 INFO buffer_manager] Allocated weights buffer at (16404647936, 0) [2026-04-08 07:44:13.691215 INFO buffer_manager] Allocated weights buffer at (16404647936, 132120576) [2026-04-08 07:44:13.691217 INFO buffer_manager] Allocated weights buffer at (16536768512, 57344) [2026-04-08 07:44:13.691218 INFO buffer_manager] Allocated weights buffer at (16536825856, 132120576) [2026-04-08 07:44:13.691220 INFO buffer_manager] 
Allocated weights buffer at (16668946432, 57344) [2026-04-08 07:44:13.691222 INFO buffer_manager] Allocated weights buffer at (16669003776, 132120576) [2026-04-08 07:44:13.691223 INFO buffer_manager] Allocated weights buffer at (16801124352, 57344) [2026-04-08 07:44:13.691225 INFO buffer_manager] Allocated weights buffer at (16801181696, 0) [2026-04-08 07:44:13.691227 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=44, cache_slot=44) planned desc only [2026-04-08 07:44:13.727288 INFO buffer_manager] Allocated weights buffer at (16801181696, 0) [2026-04-08 07:44:13.727301 INFO buffer_manager] Allocated weights buffer at (16801181696, 132120576) [2026-04-08 07:44:13.727303 INFO buffer_manager] Allocated weights buffer at (16933302272, 57344) [2026-04-08 07:44:13.727305 INFO buffer_manager] Allocated weights buffer at (16933359616, 132120576) [2026-04-08 07:44:13.727306 INFO buffer_manager] Allocated weights buffer at (17065480192, 57344) [2026-04-08 07:44:13.727308 INFO buffer_manager] Allocated weights buffer at (17065537536, 132120576) [2026-04-08 07:44:13.727309 INFO buffer_manager] Allocated weights buffer at (17197658112, 57344) [2026-04-08 07:44:13.727310 INFO buffer_manager] Allocated weights buffer at (17197715456, 0) [2026-04-08 07:44:13.727312 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=45, cache_slot=45) planned desc only [2026-04-08 07:44:13.763471 INFO buffer_manager] Allocated weights buffer at (17197715456, 0) [2026-04-08 07:44:13.763486 INFO buffer_manager] Allocated weights buffer at (17197715456, 132120576) [2026-04-08 07:44:13.763488 INFO buffer_manager] Allocated weights buffer at (17329836032, 57344) [2026-04-08 07:44:13.763489 INFO buffer_manager] Allocated weights buffer at (17329893376, 132120576) [2026-04-08 07:44:13.763491 INFO buffer_manager] Allocated weights buffer at (17462013952, 57344) [2026-04-08 07:44:13.763492 INFO buffer_manager] Allocated weights buffer at (17462071296, 132120576) [2026-04-08 
07:44:13.763494 INFO buffer_manager] Allocated weights buffer at (17594191872, 57344) [2026-04-08 07:44:13.763495 INFO buffer_manager] Allocated weights buffer at (17594249216, 0) [2026-04-08 07:44:13.763498 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=46, cache_slot=46) planned desc only [2026-04-08 07:44:13.799620 INFO buffer_manager] Allocated weights buffer at (17594249216, 0) [2026-04-08 07:44:13.799634 INFO buffer_manager] Allocated weights buffer at (17594249216, 132120576) [2026-04-08 07:44:13.799640 INFO buffer_manager] Allocated weights buffer at (17726369792, 57344) [2026-04-08 07:44:13.799642 INFO buffer_manager] Allocated weights buffer at (17726427136, 132120576) [2026-04-08 07:44:13.799644 INFO buffer_manager] Allocated weights buffer at (17858547712, 57344) [2026-04-08 07:44:13.799646 INFO buffer_manager] Allocated weights buffer at (17858605056, 132120576) [2026-04-08 07:44:13.799648 INFO buffer_manager] Allocated weights buffer at (17990725632, 57344) [2026-04-08 07:44:13.799650 INFO buffer_manager] Allocated weights buffer at (17990782976, 0) [2026-04-08 07:44:13.799651 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=47, cache_slot=47) planned desc only [2026-04-08 07:44:13.835840 INFO buffer_manager] Allocated weights buffer at (17990782976, 0) [2026-04-08 07:44:13.835853 INFO buffer_manager] Allocated weights buffer at (17990782976, 132120576) [2026-04-08 07:44:13.835855 INFO buffer_manager] Allocated weights buffer at (18122903552, 57344) [2026-04-08 07:44:13.835857 INFO buffer_manager] Allocated weights buffer at (18122960896, 132120576) [2026-04-08 07:44:13.835858 INFO buffer_manager] Allocated weights buffer at (18255081472, 57344) [2026-04-08 07:44:13.835860 INFO buffer_manager] Allocated weights buffer at (18255138816, 132120576) [2026-04-08 07:44:13.835861 INFO buffer_manager] Allocated weights buffer at (18387259392, 57344) [2026-04-08 07:44:13.835863 INFO buffer_manager] Allocated weights buffer at 
(18387316736, 0) [2026-04-08 07:44:13.835865 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=48, cache_slot=48) planned desc only [2026-04-08 07:44:13.871980 INFO buffer_manager] Allocated weights buffer at (18387316736, 0) [2026-04-08 07:44:13.871994 INFO buffer_manager] Allocated weights buffer at (18387316736, 132120576) [2026-04-08 07:44:13.871996 INFO buffer_manager] Allocated weights buffer at (18519437312, 57344) [2026-04-08 07:44:13.871998 INFO buffer_manager] Allocated weights buffer at (18519494656, 132120576) [2026-04-08 07:44:13.871999 INFO buffer_manager] Allocated weights buffer at (18651615232, 57344) [2026-04-08 07:44:13.872001 INFO buffer_manager] Allocated weights buffer at (18651672576, 132120576) [2026-04-08 07:44:13.872003 INFO buffer_manager] Allocated weights buffer at (18783793152, 57344) [2026-04-08 07:44:13.872004 INFO buffer_manager] Allocated weights buffer at (18783850496, 0) [2026-04-08 07:44:13.872006 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=49, cache_slot=49) planned desc only [2026-04-08 07:44:13.908079 INFO buffer_manager] Allocated weights buffer at (18783850496, 0) [2026-04-08 07:44:13.908093 INFO buffer_manager] Allocated weights buffer at (18783850496, 132120576) [2026-04-08 07:44:13.908095 INFO buffer_manager] Allocated weights buffer at (18915971072, 57344) [2026-04-08 07:44:13.908101 INFO buffer_manager] Allocated weights buffer at (18916028416, 132120576) [2026-04-08 07:44:13.908103 INFO buffer_manager] Allocated weights buffer at (19048148992, 57344) [2026-04-08 07:44:13.908105 INFO buffer_manager] Allocated weights buffer at (19048206336, 132120576) [2026-04-08 07:44:13.908107 INFO buffer_manager] Allocated weights buffer at (19180326912, 57344) [2026-04-08 07:44:13.908109 INFO buffer_manager] Allocated weights buffer at (19180384256, 0) [2026-04-08 07:44:13.908111 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=50, cache_slot=50) planned desc only [2026-04-08 07:44:13.944197 
INFO buffer_manager] Allocated weights buffer at (19180384256, 0) [2026-04-08 07:44:13.944210 INFO buffer_manager] Allocated weights buffer at (19180384256, 132120576) [2026-04-08 07:44:13.944212 INFO buffer_manager] Allocated weights buffer at (19312504832, 57344) [2026-04-08 07:44:13.944214 INFO buffer_manager] Allocated weights buffer at (19312562176, 132120576) [2026-04-08 07:44:13.944216 INFO buffer_manager] Allocated weights buffer at (19444682752, 57344) [2026-04-08 07:44:13.944217 INFO buffer_manager] Allocated weights buffer at (19444740096, 132120576) [2026-04-08 07:44:13.944223 INFO buffer_manager] Allocated weights buffer at (19576860672, 57344) [2026-04-08 07:44:13.944225 INFO buffer_manager] Allocated weights buffer at (19576918016, 0) [2026-04-08 07:44:13.944226 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=51, cache_slot=51) planned desc only [2026-04-08 07:44:13.980345 INFO buffer_manager] Allocated weights buffer at (19576918016, 0) [2026-04-08 07:44:13.980358 INFO buffer_manager] Allocated weights buffer at (19576918016, 132120576) [2026-04-08 07:44:13.980360 INFO buffer_manager] Allocated weights buffer at (19709038592, 57344) [2026-04-08 07:44:13.980362 INFO buffer_manager] Allocated weights buffer at (19709095936, 132120576) [2026-04-08 07:44:13.980363 INFO buffer_manager] Allocated weights buffer at (19841216512, 57344) [2026-04-08 07:44:13.980365 INFO buffer_manager] Allocated weights buffer at (19841273856, 132120576) [2026-04-08 07:44:13.980366 INFO buffer_manager] Allocated weights buffer at (19973394432, 57344) [2026-04-08 07:44:13.980368 INFO buffer_manager] Allocated weights buffer at (19973451776, 0) [2026-04-08 07:44:13.980370 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=52, cache_slot=52) planned desc only [2026-04-08 07:44:14.016732 INFO buffer_manager] Allocated weights buffer at (19973451776, 0) [2026-04-08 07:44:14.016746 INFO buffer_manager] Allocated weights buffer at (19973451776, 132120576) 
[2026-04-08 07:44:14.016748 INFO buffer_manager] Allocated weights buffer at (20105572352, 57344) [2026-04-08 07:44:14.016749 INFO buffer_manager] Allocated weights buffer at (20105629696, 132120576) [2026-04-08 07:44:14.016751 INFO buffer_manager] Allocated weights buffer at (20237750272, 57344) [2026-04-08 07:44:14.016753 INFO buffer_manager] Allocated weights buffer at (20237807616, 132120576) [2026-04-08 07:44:14.016754 INFO buffer_manager] Allocated weights buffer at (20369928192, 57344) [2026-04-08 07:44:14.016756 INFO buffer_manager] Allocated weights buffer at (20369985536, 0) [2026-04-08 07:44:14.016758 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=53, cache_slot=53) planned desc only [2026-04-08 07:44:14.053006 INFO buffer_manager] Allocated weights buffer at (20369985536, 0) [2026-04-08 07:44:14.053020 INFO buffer_manager] Allocated weights buffer at (20369985536, 132120576) [2026-04-08 07:44:14.053022 INFO buffer_manager] Allocated weights buffer at (20502106112, 57344) [2026-04-08 07:44:14.053024 INFO buffer_manager] Allocated weights buffer at (20502163456, 132120576) [2026-04-08 07:44:14.053025 INFO buffer_manager] Allocated weights buffer at (20634284032, 57344) [2026-04-08 07:44:14.053027 INFO buffer_manager] Allocated weights buffer at (20634341376, 132120576) [2026-04-08 07:44:14.053028 INFO buffer_manager] Allocated weights buffer at (20766461952, 57344) [2026-04-08 07:44:14.053030 INFO buffer_manager] Allocated weights buffer at (20766519296, 0) [2026-04-08 07:44:14.053031 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=54, cache_slot=54) planned desc only [2026-04-08 07:44:14.089549 INFO buffer_manager] Allocated weights buffer at (20766519296, 0) [2026-04-08 07:44:14.089562 INFO buffer_manager] Allocated weights buffer at (20766519296, 132120576) [2026-04-08 07:44:14.089564 INFO buffer_manager] Allocated weights buffer at (20898639872, 57344) [2026-04-08 07:44:14.089565 INFO buffer_manager] Allocated weights buffer 
at (20898697216, 132120576) [2026-04-08 07:44:14.089567 INFO buffer_manager] Allocated weights buffer at (21030817792, 57344) [2026-04-08 07:44:14.089568 INFO buffer_manager] Allocated weights buffer at (21030875136, 132120576) [2026-04-08 07:44:14.089570 INFO buffer_manager] Allocated weights buffer at (21162995712, 57344) [2026-04-08 07:44:14.089572 INFO buffer_manager] Allocated weights buffer at (21163053056, 0) [2026-04-08 07:44:14.089574 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=55, cache_slot=55) planned desc only [2026-04-08 07:44:14.125907 INFO buffer_manager] Allocated weights buffer at (21163053056, 0) [2026-04-08 07:44:14.125924 INFO buffer_manager] Allocated weights buffer at (21163053056, 132120576) [2026-04-08 07:44:14.125926 INFO buffer_manager] Allocated weights buffer at (21295173632, 57344) [2026-04-08 07:44:14.125928 INFO buffer_manager] Allocated weights buffer at (21295230976, 132120576) [2026-04-08 07:44:14.125929 INFO buffer_manager] Allocated weights buffer at (21427351552, 57344) [2026-04-08 07:44:14.125932 INFO buffer_manager] Allocated weights buffer at (21427408896, 132120576) [2026-04-08 07:44:14.125933 INFO buffer_manager] Allocated weights buffer at (21559529472, 57344) [2026-04-08 07:44:14.125936 INFO buffer_manager] Allocated weights buffer at (21559586816, 0) [2026-04-08 07:44:14.125938 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=56, cache_slot=56) planned desc only [2026-04-08 07:44:14.162432 INFO buffer_manager] Allocated weights buffer at (21559586816, 0) [2026-04-08 07:44:14.162444 INFO buffer_manager] Allocated weights buffer at (21559586816, 132120576) [2026-04-08 07:44:14.162447 INFO buffer_manager] Allocated weights buffer at (21691707392, 57344) [2026-04-08 07:44:14.162448 INFO buffer_manager] Allocated weights buffer at (21691764736, 132120576) [2026-04-08 07:44:14.162450 INFO buffer_manager] Allocated weights buffer at (21823885312, 57344) [2026-04-08 07:44:14.162451 INFO 
buffer_manager] Allocated weights buffer at (21823942656, 132120576) [2026-04-08 07:44:14.162453 INFO buffer_manager] Allocated weights buffer at (21956063232, 57344) [2026-04-08 07:44:14.162454 INFO buffer_manager] Allocated weights buffer at (21956120576, 0) [2026-04-08 07:44:14.162455 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=57, cache_slot=57) planned desc only [2026-04-08 07:44:14.198742 INFO buffer_manager] Allocated weights buffer at (21956120576, 0) [2026-04-08 07:44:14.198755 INFO buffer_manager] Allocated weights buffer at (21956120576, 132120576) [2026-04-08 07:44:14.198757 INFO buffer_manager] Allocated weights buffer at (22088241152, 57344) [2026-04-08 07:44:14.198758 INFO buffer_manager] Allocated weights buffer at (22088298496, 132120576) [2026-04-08 07:44:14.198760 INFO buffer_manager] Allocated weights buffer at (22220419072, 57344) [2026-04-08 07:44:14.198762 INFO buffer_manager] Allocated weights buffer at (22220476416, 132120576) [2026-04-08 07:44:14.198763 INFO buffer_manager] Allocated weights buffer at (22352596992, 57344) [2026-04-08 07:44:14.198765 INFO buffer_manager] Allocated weights buffer at (22352654336, 0) [2026-04-08 07:44:14.198767 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=58, cache_slot=58) planned desc only [2026-04-08 07:44:14.235081 INFO buffer_manager] Allocated weights buffer at (22352654336, 0) [2026-04-08 07:44:14.235094 INFO buffer_manager] Allocated weights buffer at (22352654336, 132120576) [2026-04-08 07:44:14.235096 INFO buffer_manager] Allocated weights buffer at (22484774912, 57344) [2026-04-08 07:44:14.235097 INFO buffer_manager] Allocated weights buffer at (22484832256, 132120576) [2026-04-08 07:44:14.235099 INFO buffer_manager] Allocated weights buffer at (22616952832, 57344) [2026-04-08 07:44:14.235100 INFO buffer_manager] Allocated weights buffer at (22617010176, 132120576) [2026-04-08 07:44:14.235102 INFO buffer_manager] Allocated weights buffer at (22749130752, 57344) 
[2026-04-08 07:44:14.235103 INFO buffer_manager] Allocated weights buffer at (22749188096, 0) [2026-04-08 07:44:14.235105 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=59, cache_slot=59) planned desc only [2026-04-08 07:44:14.271728 INFO buffer_manager] Allocated weights buffer at (22749188096, 0) [2026-04-08 07:44:14.271741 INFO buffer_manager] Allocated weights buffer at (22749188096, 132120576) [2026-04-08 07:44:14.271743 INFO buffer_manager] Allocated weights buffer at (22881308672, 57344) [2026-04-08 07:44:14.271744 INFO buffer_manager] Allocated weights buffer at (22881366016, 132120576) [2026-04-08 07:44:14.271746 INFO buffer_manager] Allocated weights buffer at (23013486592, 57344) [2026-04-08 07:44:14.271748 INFO buffer_manager] Allocated weights buffer at (23013543936, 132120576) [2026-04-08 07:44:14.271753 INFO buffer_manager] Allocated weights buffer at (23145664512, 57344) [2026-04-08 07:44:14.271755 INFO buffer_manager] Allocated weights buffer at (23145721856, 0) [2026-04-08 07:44:14.271756 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=60, cache_slot=60) planned desc only [2026-04-08 07:44:14.634793 INFO buffer_manager] Allocated weights buffer at (23145721856, 0) [2026-04-08 07:44:14.634815 INFO buffer_manager] Allocated weights buffer at (23145721856, 132120576) [2026-04-08 07:44:14.634817 INFO buffer_manager] Allocated weights buffer at (23277842432, 57344) [2026-04-08 07:44:14.634819 INFO buffer_manager] Allocated weights buffer at (23277899776, 132120576) [2026-04-08 07:44:14.634820 INFO buffer_manager] Allocated weights buffer at (23410020352, 57344) [2026-04-08 07:44:14.634822 INFO buffer_manager] Allocated weights buffer at (23410077696, 132120576) [2026-04-08 07:44:14.634824 INFO buffer_manager] Allocated weights buffer at (23542198272, 57344) [2026-04-08 07:44:14.634825 INFO buffer_manager] Allocated weights buffer at (23542255616, 0) [2026-04-08 07:44:14.634827 INFO fp8_moe_dpdk] fp8_moe_dpdk: 
init_layer_cached(layer_idx=61, cache_slot=61) planned desc only [2026-04-08 07:44:23.325809 INFO fp8_dpdk_common] fp9 fast path forced on by default in the current kernel build [2026-04-08 07:44:23.342421 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=70, expert_tiles=70, avg_tile_batch=1.26, prepare=180.435µs, send=3.490491ms, judge_wait=11.263989ms, fetch=945.465µs, reduce=107ns; duck time-ns stats: p50=11.048734ms, p90=11.067102ms, max=11.079534ms; kernel_model: matmul=0.242221 GFLOP (21.862 GFLOP/s @ duck_max), param_stream=0.096338G (8.695 Gparam/s @ duck_max), weight_stream=103.404 MiB (9.786 GB/s @ duck_max) [2026-04-08 07:44:23.358360 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=66, expert_tiles=66, avg_tile_batch=1.33, prepare=142.059µs, send=613.376µs, judge_wait=10.580287ms, fetch=675.084µs, reduce=20ns; duck time-ns stats: p50=10.402533ms, p90=10.430037ms, max=10.458705ms; kernel_model: matmul=0.242221 GFLOP (23.160 GFLOP/s @ duck_max), param_stream=0.090833G (8.685 Gparam/s @ duck_max), weight_stream=97.495 MiB (9.775 GB/s @ duck_max) [2026-04-08 07:44:23.373760 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=70, expert_tiles=70, avg_tile_batch=1.26, prepare=37.511µs, send=612.944µs, judge_wait=11.882228ms, fetch=702.228µs, reduce=20ns; duck time-ns stats: p50=11.725403ms, p90=11.758197ms, max=11.766354ms; kernel_model: matmul=0.242221 GFLOP (20.586 GFLOP/s @ duck_max), param_stream=0.096338G (8.188 Gparam/s @ duck_max), weight_stream=103.404 MiB (9.215 GB/s @ duck_max) [2026-04-08 07:44:23.387852 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=69, expert_tiles=69, avg_tile_batch=1.28, prepare=26.681µs, send=612.157µs, judge_wait=11.109548ms, fetch=639.813µs, reduce=22ns; duck time-ns stats: p50=10.95144ms, p90=10.96406ms, max=10.985116ms; kernel_model: 
matmul=0.242221 GFLOP (22.050 GFLOP/s @ duck_max), param_stream=0.094962G (8.645 Gparam/s @ duck_max), weight_stream=101.927 MiB (9.729 GB/s @ duck_max) [2026-04-08 07:44:23.401445 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=68, expert_tiles=68, avg_tile_batch=1.29, prepare=31.05µs, send=611.528µs, judge_wait=10.540467ms, fetch=638.462µs, reduce=20ns; duck time-ns stats: p50=10.354015ms, p90=10.388189ms, max=10.432984ms; kernel_model: matmul=0.242221 GFLOP (23.217 GFLOP/s @ duck_max), param_stream=0.093585G (8.970 Gparam/s @ duck_max), weight_stream=100.450 MiB (10.096 GB/s @ duck_max) [2026-04-08 07:44:23.415817 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=66, expert_tiles=66, avg_tile_batch=1.33, prepare=63.322µs, send=613.264µs, judge_wait=11.317764ms, fetch=637.426µs, reduce=21ns; duck time-ns stats: p50=11.157006ms, p90=11.179627ms, max=11.190749ms; kernel_model: matmul=0.242221 GFLOP (21.645 GFLOP/s @ duck_max), param_stream=0.090833G (8.117 Gparam/s @ duck_max), weight_stream=97.495 MiB (9.135 GB/s @ duck_max) [2026-04-08 07:44:23.429392 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=64, expert_tiles=64, avg_tile_batch=1.38, prepare=16.139µs, send=612.262µs, judge_wait=10.584665ms, fetch=637.661µs, reduce=23ns; duck time-ns stats: p50=10.435172ms, p90=10.449276ms, max=10.478732ms; kernel_model: matmul=0.242221 GFLOP (23.115 GFLOP/s @ duck_max), param_stream=0.088080G (8.406 Gparam/s @ duck_max), weight_stream=94.541 MiB (9.460 GB/s @ duck_max) [2026-04-08 07:44:23.442826 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=60, expert_tiles=62, avg_tile_batch=1.42, prepare=16.61µs, send=611.944µs, judge_wait=10.418674ms, fetch=639.799µs, reduce=19ns; duck time-ns stats: p50=10.264932ms, p90=10.283051ms, max=10.293531ms; kernel_model: matmul=0.242221 GFLOP (23.531 
GFLOP/s @ duck_max), param_stream=0.085328G (8.289 Gparam/s @ duck_max), weight_stream=91.587 MiB (9.330 GB/s @ duck_max) [2026-04-08 07:44:23.456501 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=65, expert_tiles=65, avg_tile_batch=1.35, prepare=18.215µs, send=612.455µs, judge_wait=10.639096ms, fetch=636.326µs, reduce=20ns; duck time-ns stats: p50=10.476894ms, p90=10.504292ms, max=10.517979ms; kernel_model: matmul=0.242221 GFLOP (23.029 GFLOP/s @ duck_max), param_stream=0.089457G (8.505 Gparam/s @ duck_max), weight_stream=96.018 MiB (9.572 GB/s @ duck_max) [2026-04-08 07:44:23.470371 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=62, expert_tiles=63, avg_tile_batch=1.40, prepare=15.81µs, send=612.219µs, judge_wait=10.887966ms, fetch=637.691µs, reduce=19ns; duck time-ns stats: p50=10.734302ms, p90=10.753886ms, max=10.775335ms; kernel_model: matmul=0.242221 GFLOP (22.479 GFLOP/s @ duck_max), param_stream=0.086704G (8.047 Gparam/s @ duck_max), weight_stream=93.064 MiB (9.056 GB/s @ duck_max) [2026-04-08 07:44:23.483363 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=49, expert_tiles=52, avg_tile_batch=1.69, prepare=17.258µs, send=612.465µs, judge_wait=10.034557ms, fetch=635.01µs, reduce=20ns; duck time-ns stats: p50=9.893147ms, p90=9.927718ms, max=9.930436ms; kernel_model: matmul=0.242221 GFLOP (24.392 GFLOP/s @ duck_max), param_stream=0.071565G (7.207 Gparam/s @ duck_max), weight_stream=76.815 MiB (8.111 GB/s @ duck_max) [2026-04-08 07:44:23.496604 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=50, expert_tiles=52, avg_tile_batch=1.69, prepare=15.463µs, send=612.888µs, judge_wait=10.25419ms, fetch=638.35µs, reduce=19ns; duck time-ns stats: p50=10.099388ms, p90=10.13132ms, max=10.146524ms; kernel_model: matmul=0.242221 GFLOP (23.872 GFLOP/s @ duck_max), param_stream=0.071565G 
(7.053 Gparam/s @ duck_max), weight_stream=76.815 MiB (7.938 GB/s @ duck_max) [2026-04-08 07:44:23.508746 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=48, expert_tiles=51, avg_tile_batch=1.73, prepare=15.688µs, send=612.863µs, judge_wait=9.18678ms, fetch=635.377µs, reduce=20ns; duck time-ns stats: p50=9.065895ms, p90=9.084166ms, max=9.086981ms; kernel_model: matmul=0.242221 GFLOP (26.656 GFLOP/s @ duck_max), param_stream=0.070189G (7.724 Gparam/s @ duck_max), weight_stream=75.337 MiB (8.693 GB/s @ duck_max) [2026-04-08 07:44:23.521322 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=49, expert_tiles=50, avg_tile_batch=1.76, prepare=15.573µs, send=610.392µs, judge_wait=9.612609ms, fetch=634.042µs, reduce=19ns; duck time-ns stats: p50=9.482855ms, p90=9.50185ms, max=9.512374ms; kernel_model: matmul=0.242221 GFLOP (25.464 GFLOP/s @ duck_max), param_stream=0.068813G (7.234 Gparam/s @ duck_max), weight_stream=73.860 MiB (8.142 GB/s @ duck_max) [2026-04-08 07:44:23.533723 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=50, expert_tiles=51, avg_tile_batch=1.73, prepare=15.726µs, send=610.874µs, judge_wait=9.423227ms, fetch=643.793µs, reduce=20ns; duck time-ns stats: p50=9.245059ms, p90=9.271587ms, max=9.311395ms; kernel_model: matmul=0.242221 GFLOP (26.013 GFLOP/s @ duck_max), param_stream=0.070189G (7.538 Gparam/s @ duck_max), weight_stream=75.337 MiB (8.484 GB/s @ duck_max) [2026-04-08 07:44:23.546565 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=54, expert_tiles=55, avg_tile_batch=1.60, prepare=15.628µs, send=612.452µs, judge_wait=9.887146ms, fetch=636.867µs, reduce=20ns; duck time-ns stats: p50=9.743471ms, p90=9.754918ms, max=9.769571ms; kernel_model: matmul=0.242221 GFLOP (24.793 GFLOP/s @ duck_max), param_stream=0.075694G (7.748 Gparam/s @ duck_max), weight_stream=81.246 MiB 
(8.720 GB/s @ duck_max) [2026-04-08 07:44:23.559717 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=56, expert_tiles=57, avg_tile_batch=1.54, prepare=15.754µs, send=609.955µs, judge_wait=10.158648ms, fetch=639.232µs, reduce=13ns; duck time-ns stats: p50=9.986458ms, p90=10.024045ms, max=10.047964ms; kernel_model: matmul=0.242221 GFLOP (24.106 GFLOP/s @ duck_max), param_stream=0.078447G (7.807 Gparam/s @ duck_max), weight_stream=84.201 MiB (8.787 GB/s @ duck_max) [2026-04-08 07:44:23.572549 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=55, expert_tiles=56, avg_tile_batch=1.57, prepare=15.962µs, send=613.197µs, judge_wait=9.861991ms, fetch=635.959µs, reduce=23ns; duck time-ns stats: p50=9.730446ms, p90=9.743661ms, max=9.748127ms; kernel_model: matmul=0.242221 GFLOP (24.848 GFLOP/s @ duck_max), param_stream=0.077070G (7.906 Gparam/s @ duck_max), weight_stream=82.723 MiB (8.898 GB/s @ duck_max) [2026-04-08 07:44:23.586074 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=47, expert_tiles=50, avg_tile_batch=1.76, prepare=15.512µs, send=611.675µs, judge_wait=10.53299ms, fetch=638.909µs, reduce=19ns; duck time-ns stats: p50=10.384153ms, p90=10.403925ms, max=10.415667ms; kernel_model: matmul=0.242221 GFLOP (23.255 GFLOP/s @ duck_max), param_stream=0.068813G (6.607 Gparam/s @ duck_max), weight_stream=73.860 MiB (7.436 GB/s @ duck_max) [2026-04-08 07:44:23.599818 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=50, expert_tiles=53, avg_tile_batch=1.66, prepare=16.033µs, send=612.103µs, judge_wait=10.761915ms, fetch=637.762µs, reduce=20ns; duck time-ns stats: p50=10.58969ms, p90=10.62114ms, max=10.636958ms; kernel_model: matmul=0.242221 GFLOP (22.772 GFLOP/s @ duck_max), param_stream=0.072942G (6.857 Gparam/s @ duck_max), weight_stream=78.292 MiB (7.718 GB/s @ duck_max) [2026-04-08 
07:44:23.612081 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=49, expert_tiles=51, avg_tile_batch=1.73, prepare=15.79µs, send=612.86µs, judge_wait=9.290817ms, fetch=638.015µs, reduce=21ns; duck time-ns stats: p50=9.167891ms, p90=9.177765ms, max=9.190391ms; kernel_model: matmul=0.242221 GFLOP (26.356 GFLOP/s @ duck_max), param_stream=0.070189G (7.637 Gparam/s @ duck_max), weight_stream=75.337 MiB (8.596 GB/s @ duck_max) [2026-04-08 07:44:23.626173 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=41, expert_tiles=46, avg_tile_batch=1.91, prepare=16.029µs, send=613.129µs, judge_wait=11.102627ms, fetch=639.62µs, reduce=20ns; duck time-ns stats: p50=10.963265ms, p90=10.97447ms, max=10.987878ms; kernel_model: matmul=0.242221 GFLOP (22.044 GFLOP/s @ duck_max), param_stream=0.063308G (5.762 Gparam/s @ duck_max), weight_stream=67.951 MiB (6.485 GB/s @ duck_max) [2026-04-08 07:44:23.638261 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=44, expert_tiles=47, avg_tile_batch=1.87, prepare=18.889µs, send=612.898µs, judge_wait=9.090592ms, fetch=636.084µs, reduce=20ns; duck time-ns stats: p50=8.947347ms, p90=8.96385ms, max=8.970291ms; kernel_model: matmul=0.242221 GFLOP (27.003 GFLOP/s @ duck_max), param_stream=0.064684G (7.211 Gparam/s @ duck_max), weight_stream=69.429 MiB (8.116 GB/s @ duck_max) [2026-04-08 07:44:23.651609 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=58, expert_tiles=60, avg_tile_batch=1.47, prepare=61.279µs, send=612.582µs, judge_wait=10.227742ms, fetch=637.59µs, reduce=19ns; duck time-ns stats: p50=10.085922ms, p90=10.104567ms, max=10.106227ms; kernel_model: matmul=0.242221 GFLOP (23.968 GFLOP/s @ duck_max), param_stream=0.082575G (8.171 Gparam/s @ duck_max), weight_stream=88.632 MiB (9.196 GB/s @ duck_max) [2026-04-08 07:44:23.663878 INFO fp8_moe_dpdk] MoE prefill 
forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=43, expert_tiles=45, avg_tile_batch=1.96, prepare=15.73µs, send=612.33µs, judge_wait=9.302803ms, fetch=635.216µs, reduce=21ns; duck time-ns stats: p50=9.138402ms, p90=9.163367ms, max=9.187304ms; kernel_model: matmul=0.242221 GFLOP (26.365 GFLOP/s @ duck_max), param_stream=0.061932G (6.741 Gparam/s @ duck_max), weight_stream=66.474 MiB (7.587 GB/s @ duck_max) [2026-04-08 07:44:23.675480 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=43, expert_tiles=45, avg_tile_batch=1.96, prepare=15.662µs, send=616.289µs, judge_wait=8.64557ms, fetch=634.957µs, reduce=20ns; duck time-ns stats: p50=8.496192ms, p90=8.509186ms, max=8.528261ms; kernel_model: matmul=0.242221 GFLOP (28.402 GFLOP/s @ duck_max), param_stream=0.061932G (7.262 Gparam/s @ duck_max), weight_stream=66.474 MiB (8.173 GB/s @ duck_max) [2026-04-08 07:44:23.688169 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=52, expert_tiles=56, avg_tile_batch=1.57, prepare=15.379µs, send=612.16µs, judge_wait=9.664755ms, fetch=635.683µs, reduce=20ns; duck time-ns stats: p50=9.525016ms, p90=9.54293ms, max=9.548824ms; kernel_model: matmul=0.242221 GFLOP (25.367 GFLOP/s @ duck_max), param_stream=0.077070G (8.071 Gparam/s @ duck_max), weight_stream=82.723 MiB (9.084 GB/s @ duck_max) [2026-04-08 07:44:23.701820 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=51, expert_tiles=53, avg_tile_batch=1.66, prepare=15.58µs, send=613.307µs, judge_wait=10.670018ms, fetch=639.738µs, reduce=20ns; duck time-ns stats: p50=10.511894ms, p90=10.544415ms, max=10.555102ms; kernel_model: matmul=0.242221 GFLOP (22.948 GFLOP/s @ duck_max), param_stream=0.072942G (6.911 Gparam/s @ duck_max), weight_stream=78.292 MiB (7.778 GB/s @ duck_max) [2026-04-08 07:44:23.713927 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, 
unique_experts=49, expert_tiles=49, avg_tile_batch=1.80, prepare=16.046µs, send=612.265µs, judge_wait=9.087439ms, fetch=636.431µs, reduce=20ns; duck time-ns stats: p50=8.932416ms, p90=8.957174ms, max=8.972151ms; kernel_model: matmul=0.242221 GFLOP (26.997 GFLOP/s @ duck_max), param_stream=0.067437G (7.516 Gparam/s @ duck_max), weight_stream=72.383 MiB (8.459 GB/s @ duck_max) [2026-04-08 07:44:23.726296 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=47, expert_tiles=51, avg_tile_batch=1.73, prepare=16.121µs, send=612.758µs, judge_wait=9.388854ms, fetch=639.193µs, reduce=20ns; duck time-ns stats: p50=9.201976ms, p90=9.22519ms, max=9.269712ms; kernel_model: matmul=0.242221 GFLOP (26.130 GFLOP/s @ duck_max), param_stream=0.070189G (7.572 Gparam/s @ duck_max), weight_stream=75.337 MiB (8.522 GB/s @ duck_max) [2026-04-08 07:44:23.738759 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=50, expert_tiles=52, avg_tile_batch=1.69, prepare=16.32µs, send=613.202µs, judge_wait=9.48117ms, fetch=637.457µs, reduce=19ns; duck time-ns stats: p50=9.334368ms, p90=9.357186ms, max=9.36341ms; kernel_model: matmul=0.242221 GFLOP (25.869 GFLOP/s @ duck_max), param_stream=0.071565G (7.643 Gparam/s @ duck_max), weight_stream=76.815 MiB (8.602 GB/s @ duck_max) [2026-04-08 07:44:23.751295 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=46, expert_tiles=48, avg_tile_batch=1.83, prepare=16.097µs, send=612.421µs, judge_wait=9.523344ms, fetch=634.611µs, reduce=28ns; duck time-ns stats: p50=9.374733ms, p90=9.394265ms, max=9.421053ms; kernel_model: matmul=0.242221 GFLOP (25.711 GFLOP/s @ duck_max), param_stream=0.066060G (7.012 Gparam/s @ duck_max), weight_stream=70.906 MiB (7.892 GB/s @ duck_max) [2026-04-08 07:44:23.764138 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=56, expert_tiles=58, 
avg_tile_batch=1.52, prepare=17.244µs, send=611.703µs, judge_wait=9.877121ms, fetch=636.802µs, reduce=19ns; duck time-ns stats: p50=9.742185ms, p90=9.765333ms, max=9.773752ms; kernel_model: matmul=0.242221 GFLOP (24.783 GFLOP/s @ duck_max), param_stream=0.079823G (8.167 Gparam/s @ duck_max), weight_stream=85.678 MiB (9.192 GB/s @ duck_max) [2026-04-08 07:44:23.777467 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=56, expert_tiles=59, avg_tile_batch=1.49, prepare=15.881µs, send=611.78µs, judge_wait=10.328293ms, fetch=638.31µs, reduce=20ns; duck time-ns stats: p50=10.171162ms, p90=10.199428ms, max=10.218562ms; kernel_model: matmul=0.242221 GFLOP (23.704 GFLOP/s @ duck_max), param_stream=0.081199G (7.946 Gparam/s @ duck_max), weight_stream=87.155 MiB (8.943 GB/s @ duck_max) [2026-04-08 07:44:23.790632 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=50, expert_tiles=54, avg_tile_batch=1.63, prepare=15.749µs, send=612.946µs, judge_wait=10.15278ms, fetch=641.092µs, reduce=19ns; duck time-ns stats: p50=10.02095ms, p90=10.03348ms, max=10.042169ms; kernel_model: matmul=0.242221 GFLOP (24.120 GFLOP/s @ duck_max), param_stream=0.074318G (7.401 Gparam/s @ duck_max), weight_stream=79.769 MiB (8.329 GB/s @ duck_max) [2026-04-08 07:44:23.803992 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=46, expert_tiles=50, avg_tile_batch=1.76, prepare=15.321µs, send=611.558µs, judge_wait=10.391442ms, fetch=638.143µs, reduce=20ns; duck time-ns stats: p50=10.235902ms, p90=10.259849ms, max=10.288044ms; kernel_model: matmul=0.242221 GFLOP (23.544 GFLOP/s @ duck_max), param_stream=0.068813G (6.689 Gparam/s @ duck_max), weight_stream=73.860 MiB (7.528 GB/s @ duck_max) [2026-04-08 07:44:23.818090 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=57, expert_tiles=58, avg_tile_batch=1.52, prepare=16.207µs, 
send=614.835µs, judge_wait=10.892517ms, fetch=639.492µs, reduce=20ns; duck time-ns stats: p50=10.738298ms, p90=10.759676ms, max=10.771124ms; kernel_model: matmul=0.242221 GFLOP (22.488 GFLOP/s @ duck_max), param_stream=0.079823G (7.411 Gparam/s @ duck_max), weight_stream=85.678 MiB (8.341 GB/s @ duck_max) [2026-04-08 07:44:23.833171 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=52, expert_tiles=55, avg_tile_batch=1.60, prepare=39.339µs, send=613.298µs, judge_wait=10.037271ms, fetch=639.84µs, reduce=20ns; duck time-ns stats: p50=9.879252ms, p90=9.892578ms, max=9.91059ms; kernel_model: matmul=0.242221 GFLOP (24.441 GFLOP/s @ duck_max), param_stream=0.075694G (7.638 Gparam/s @ duck_max), weight_stream=81.246 MiB (8.596 GB/s @ duck_max) [2026-04-08 07:44:23.847701 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=49, expert_tiles=49, avg_tile_batch=1.80, prepare=28.108µs, send=613.115µs, judge_wait=9.649797ms, fetch=637.51µs, reduce=20ns; duck time-ns stats: p50=9.484442ms, p90=9.501322ms, max=9.515821ms; kernel_model: matmul=0.242221 GFLOP (25.455 GFLOP/s @ duck_max), param_stream=0.067437G (7.087 Gparam/s @ duck_max), weight_stream=72.383 MiB (7.976 GB/s @ duck_max) [2026-04-08 07:44:23.860914 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=52, expert_tiles=56, avg_tile_batch=1.57, prepare=45.646µs, send=611.778µs, judge_wait=9.92121ms, fetch=766.202µs, reduce=26ns; duck time-ns stats: p50=9.787683ms, p90=9.803497ms, max=9.815423ms; kernel_model: matmul=0.242221 GFLOP (24.678 GFLOP/s @ duck_max), param_stream=0.077070G (7.852 Gparam/s @ duck_max), weight_stream=82.723 MiB (8.837 GB/s @ duck_max) [2026-04-08 07:44:23.874154 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=55, expert_tiles=58, avg_tile_batch=1.52, prepare=29.571µs, send=612.28µs, judge_wait=10.21515ms, 
fetch=639.445µs, reduce=19ns; duck time-ns stats: p50=10.054066ms, p90=10.079153ms, max=10.092178ms; kernel_model: matmul=0.242221 GFLOP (24.001 GFLOP/s @ duck_max), param_stream=0.079823G (7.909 Gparam/s @ duck_max), weight_stream=85.678 MiB (8.902 GB/s @ duck_max) [2026-04-08 07:44:23.887619 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=55, expert_tiles=57, avg_tile_batch=1.54, prepare=16.555µs, send=611.469µs, judge_wait=10.484315ms, fetch=638.422µs, reduce=19ns; duck time-ns stats: p50=10.335555ms, p90=10.353185ms, max=10.362051ms; kernel_model: matmul=0.242221 GFLOP (23.376 GFLOP/s @ duck_max), param_stream=0.078447G (7.571 Gparam/s @ duck_max), weight_stream=84.201 MiB (8.521 GB/s @ duck_max) [2026-04-08 07:44:23.900649 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=52, expert_tiles=52, avg_tile_batch=1.69, prepare=15.48µs, send=612.92µs, judge_wait=10.032629ms, fetch=637.499µs, reduce=20ns; duck time-ns stats: p50=9.865661ms, p90=9.901675ms, max=9.922571ms; kernel_model: matmul=0.242221 GFLOP (24.411 GFLOP/s @ duck_max), param_stream=0.071565G (7.212 Gparam/s @ duck_max), weight_stream=76.815 MiB (8.117 GB/s @ duck_max) [2026-04-08 07:44:23.914097 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=56, expert_tiles=58, avg_tile_batch=1.52, prepare=16.289µs, send=612.832µs, judge_wait=10.470071ms, fetch=636.977µs, reduce=19ns; duck time-ns stats: p50=10.344161ms, p90=10.36185ms, max=10.36336ms; kernel_model: matmul=0.242221 GFLOP (23.373 GFLOP/s @ duck_max), param_stream=0.079823G (7.702 Gparam/s @ duck_max), weight_stream=85.678 MiB (8.669 GB/s @ duck_max) [2026-04-08 07:44:23.928144 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=56, expert_tiles=58, avg_tile_batch=1.52, prepare=16.994µs, send=612.056µs, judge_wait=10.960568ms, fetch=640.464µs, reduce=22ns; duck time-ns 
stats: p50=10.802893ms, p90=10.824739ms, max=10.848588ms; kernel_model: matmul=0.242221 GFLOP (22.327 GFLOP/s @ duck_max), param_stream=0.079823G (7.358 Gparam/s @ duck_max), weight_stream=85.678 MiB (8.281 GB/s @ duck_max) [2026-04-08 07:44:23.941356 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=54, expert_tiles=57, avg_tile_batch=1.54, prepare=16.651µs, send=612.129µs, judge_wait=10.147898ms, fetch=635.815µs, reduce=26ns; duck time-ns stats: p50=10.009605ms, p90=10.024319ms, max=10.039426ms; kernel_model: matmul=0.242221 GFLOP (24.127 GFLOP/s @ duck_max), param_stream=0.078447G (7.814 Gparam/s @ duck_max), weight_stream=84.201 MiB (8.794 GB/s @ duck_max) [2026-04-08 07:44:23.955268 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=63, expert_tiles=63, avg_tile_batch=1.40, prepare=16.479µs, send=616.58µs, judge_wait=10.625195ms, fetch=639.659µs, reduce=21ns; duck time-ns stats: p50=10.47398ms, p90=10.490132ms, max=10.501085ms; kernel_model: matmul=0.242221 GFLOP (23.066 GFLOP/s @ duck_max), param_stream=0.086704G (8.257 Gparam/s @ duck_max), weight_stream=93.064 MiB (9.293 GB/s @ duck_max) [2026-04-08 07:44:23.970458 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=64, expert_tiles=66, avg_tile_batch=1.33, prepare=31.337µs, send=612.068µs, judge_wait=10.546646ms, fetch=647.176µs, reduce=21ns; duck time-ns stats: p50=10.399585ms, p90=10.424263ms, max=10.433607ms; kernel_model: matmul=0.242221 GFLOP (23.215 GFLOP/s @ duck_max), param_stream=0.090833G (8.706 Gparam/s @ duck_max), weight_stream=97.495 MiB (9.798 GB/s @ duck_max) [2026-04-08 07:44:23.984435 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=65, expert_tiles=67, avg_tile_batch=1.31, prepare=20.184µs, send=611.139µs, judge_wait=10.928813ms, fetch=636.834µs, reduce=20ns; duck time-ns stats: p50=10.766872ms, 
p90=10.793538ms, max=10.804163ms; kernel_model: matmul=0.242221 GFLOP (22.419 GFLOP/s @ duck_max), param_stream=0.092209G (8.535 Gparam/s @ duck_max), weight_stream=98.973 MiB (9.606 GB/s @ duck_max) [2026-04-08 07:44:23.998486 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=58, expert_tiles=59, avg_tile_batch=1.49, prepare=16.084µs, send=612.328µs, judge_wait=11.070463ms, fetch=639.504µs, reduce=20ns; duck time-ns stats: p50=10.928339ms, p90=10.944022ms, max=10.947208ms; kernel_model: matmul=0.242221 GFLOP (22.126 GFLOP/s @ duck_max), param_stream=0.081199G (7.417 Gparam/s @ duck_max), weight_stream=87.155 MiB (8.348 GB/s @ duck_max) [2026-04-08 07:44:24.011484 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=51, expert_tiles=53, avg_tile_batch=1.66, prepare=16.072µs, send=611.311µs, judge_wait=10.037615ms, fetch=637.419µs, reduce=20ns; duck time-ns stats: p50=9.888607ms, p90=9.916257ms, max=9.920284ms; kernel_model: matmul=0.242221 GFLOP (24.417 GFLOP/s @ duck_max), param_stream=0.072942G (7.353 Gparam/s @ duck_max), weight_stream=78.292 MiB (8.275 GB/s @ duck_max) [2026-04-08 07:44:24.025455 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=55, expert_tiles=58, avg_tile_batch=1.52, prepare=16.218µs, send=611.504µs, judge_wait=11.000712ms, fetch=637.514µs, reduce=20ns; duck time-ns stats: p50=10.8559ms, p90=10.874499ms, max=10.888768ms; kernel_model: matmul=0.242221 GFLOP (22.245 GFLOP/s @ duck_max), param_stream=0.079823G (7.331 Gparam/s @ duck_max), weight_stream=85.678 MiB (8.251 GB/s @ duck_max) [2026-04-08 07:44:24.039469 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=54, expert_tiles=55, avg_tile_batch=1.60, prepare=20.465µs, send=612.359µs, judge_wait=11.038934ms, fetch=640.574µs, reduce=19ns; duck time-ns stats: p50=10.892394ms, p90=10.915446ms, max=10.927724ms; 
kernel_model: matmul=0.242221 GFLOP (22.166 GFLOP/s @ duck_max), param_stream=0.075694G (6.927 Gparam/s @ duck_max), weight_stream=81.246 MiB (7.796 GB/s @ duck_max) [2026-04-08 07:44:24.052293 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=53, expert_tiles=54, avg_tile_batch=1.63, prepare=18.404µs, send=612.192µs, judge_wait=9.826904ms, fetch=639.011µs, reduce=17ns; duck time-ns stats: p50=9.68248ms, p90=9.701283ms, max=9.722572ms; kernel_model: matmul=0.242221 GFLOP (24.913 GFLOP/s @ duck_max), param_stream=0.074318G (7.644 Gparam/s @ duck_max), weight_stream=79.769 MiB (8.603 GB/s @ duck_max) [2026-04-08 07:44:24.065883 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=54, expert_tiles=55, avg_tile_batch=1.60, prepare=16.19µs, send=611.93µs, judge_wait=10.627867ms, fetch=637.584µs, reduce=15ns; duck time-ns stats: p50=10.483444ms, p90=10.505273ms, max=10.518652ms; kernel_model: matmul=0.242221 GFLOP (23.028 GFLOP/s @ duck_max), param_stream=0.075694G (7.196 Gparam/s @ duck_max), weight_stream=81.246 MiB (8.099 GB/s @ duck_max) [2026-04-08 07:44:24.079519 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=56, expert_tiles=57, avg_tile_batch=1.54, prepare=16.264µs, send=612.056µs, judge_wait=10.69113ms, fetch=638.878µs, reduce=19ns; duck time-ns stats: p50=10.540089ms, p90=10.559748ms, max=10.568199ms; kernel_model: matmul=0.242221 GFLOP (22.920 GFLOP/s @ duck_max), param_stream=0.078447G (7.423 Gparam/s @ duck_max), weight_stream=84.201 MiB (8.354 GB/s @ duck_max) [2026-04-08 07:44:24.092636 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=56, expert_tiles=58, avg_tile_batch=1.52, prepare=16.31µs, send=612.69µs, judge_wait=10.138926ms, fetch=637.118µs, reduce=18ns; duck time-ns stats: p50=9.984407ms, p90=9.999896ms, max=10.015742ms; kernel_model: matmul=0.242221 GFLOP (24.184 
GFLOP/s @ duck_max), param_stream=0.079823G (7.970 Gparam/s @ duck_max), weight_stream=85.678 MiB (8.970 GB/s @ duck_max) [2026-04-08 07:44:24.106478 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=11, top_k=8, tasks=88, unique_experts=57, expert_tiles=58, avg_tile_batch=1.52, prepare=15.802µs, send=611.977µs, judge_wait=10.882758ms, fetch=639.705µs, reduce=20ns; duck time-ns stats: p50=10.724307ms, p90=10.744127ms, max=10.759269ms; kernel_model: matmul=0.242221 GFLOP (22.513 GFLOP/s @ duck_max), param_stream=0.079823G (7.419 Gparam/s @ duck_max), weight_stream=85.678 MiB (8.350 GB/s @ duck_max) [2026-04-08 07:44:24.146305 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=10, top_k=8, tasks=80, unique_experts=58, expert_tiles=58, avg_tile_batch=1.38, prepare=130.557µs, send=829.803µs, judge_wait=9.65883ms, fetch=627.492µs, reduce=20ns; duck time-ns stats: p50=9.517466ms, p90=9.544486ms, max=9.553125ms; kernel_model: matmul=0.220201 GFLOP (23.050 GFLOP/s @ duck_max), param_stream=0.079823G (8.356 Gparam/s @ duck_max), weight_stream=85.678 MiB (9.404 GB/s @ duck_max) [2026-04-08 07:44:24.154530 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.008358ms; phases: prepare=4.681µs, send=62.007µs, judge_wait=812.865µs, fetch=91.456µs, reduce=20ns, writeback=424ns; duck time-ns stats: p50=726.713µs, p90=733.17µs, max=736.608µs; effective_read: activated_experts=8, params=0.011010G (14.947 Gparam/s @ duck_max), memory=11.818 MiB (16.823 GB/s @ duck_max), judge_gap=76.257µs, judge_ratio=1.104x [2026-04-08 07:44:24.894461 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.554202ms; phases: prepare=5.626µs, send=632.612µs, judge_wait=781.338µs, fetch=96.954µs, reduce=20ns, writeback=488ns; duck time-ns stats: p50=695.441µs, p90=700.207µs, max=703.93µs; effective_read: activated_experts=8, params=0.011010G (15.641 Gparam/s @ duck_max), memory=11.818 MiB (17.604 GB/s @ duck_max), judge_gap=77.408µs, judge_ratio=1.110x Token # 1: 763.898ms; value: 
next_token_ids=tensor([1415], device='cuda:0') mtp accept=1 prop=1415 top1=1415 accp=1.000 next=draft=112036 prop=112036 olap pair=709.7ms serial=1311.2ms gain=601.5ms ratio=0.46 s0=624.5ms s1=686.7ms wait=0.2/43.1ms pred gate=device [2026-04-08 07:44:24.898530 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.03683ms; phases: prepare=3.327µs, send=62.748µs, judge_wait=841.165µs, fetch=92.704µs, reduce=20ns, writeback=554ns; duck time-ns stats: p50=756.38µs, p90=762.133µs, max=766.191µs; effective_read: activated_experts=8, params=0.011010G (14.370 Gparam/s @ duck_max), memory=11.818 MiB (16.173 GB/s @ duck_max), judge_gap=74.974µs, judge_ratio=1.098x Token # 2: 3.852ms; value: next_token_ids=tensor([112036], device='cuda:0') mtp accept=1 prop=112036 top1=112036 accp=1.000 next=pair draft=49672 prop=49672 pred gate=device [2026-04-08 07:44:25.012497 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 973.252µs; phases: prepare=3.686µs, send=62.059µs, judge_wait=777.866µs, fetch=91.492µs, reduce=20ns, writeback=622ns; duck time-ns stats: p50=695.45µs, p90=702.642µs, max=704.307µs; effective_read: activated_experts=8, params=0.011010G (15.632 Gparam/s @ duck_max), memory=11.818 MiB (17.594 GB/s @ duck_max), judge_gap=73.559µs, judge_ratio=1.104x Token # 3: 114.139ms; value: next_token_ids=tensor([2284], device='cuda:0') mtp accept=0 prop=49672 top1=49672 accp=0.797 next=draft=7163 prop=7163 olap pair=108.8ms serial=192.9ms gain=84.1ms ratio=0.44 s0=4.3ms s1=188.6ms wait=0.1/50.6ms pred gate=device [2026-04-08 07:44:25.127319 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 980.529µs; phases: prepare=3.669µs, send=61.816µs, judge_wait=784.188µs, fetch=92.423µs, reduce=24ns, writeback=723ns; duck time-ns stats: p50=702.256µs, p90=709.193µs, max=711.459µs; effective_read: activated_experts=8, params=0.011010G (15.475 Gparam/s @ duck_max), memory=11.818 MiB (17.417 GB/s @ duck_max), judge_gap=72.729µs, judge_ratio=1.102x Token # 4: 114.855ms; value: 
next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/46.0ms pred gate=device [2026-04-08 07:44:25.131243 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 970.994µs; phases: prepare=3.085µs, send=60.738µs, judge_wait=779.486µs, fetch=90.537µs, reduce=19ns, writeback=460ns; duck time-ns stats: p50=692.251µs, p90=698.833µs, max=706.791µs; effective_read: activated_experts=8, params=0.011010G (15.578 Gparam/s @ duck_max), memory=11.818 MiB (17.532 GB/s @ duck_max), judge_gap=72.695µs, judge_ratio=1.103x Token # 5: 3.786ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device [2026-04-08 07:44:25.245615 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 977.875µs; phases: prepare=3.758µs, send=62.243µs, judge_wait=783.654µs, fetch=91.128µs, reduce=19ns, writeback=539ns; duck time-ns stats: p50=696.598µs, p90=704.621µs, max=708.431µs; effective_read: activated_experts=8, params=0.011010G (15.541 Gparam/s @ duck_max), memory=11.818 MiB (17.492 GB/s @ duck_max), judge_gap=75.223µs, judge_ratio=1.106x Token # 6: 114.425ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=draft=389 prop=389 olap pair=109.2ms serial=193.8ms gain=84.7ms ratio=0.44 s0=5.7ms s1=188.2ms wait=0.2/43.5ms pred gate=device [2026-04-08 07:44:25.249427 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 975.044µs; phases: prepare=3.378µs, send=61.781µs, judge_wait=781.721µs, fetch=91.179µs, reduce=20ns, writeback=488ns; duck time-ns stats: p50=698.495µs, p90=702.4µs, max=708.912µs; effective_read: activated_experts=8, params=0.011010G (15.531 Gparam/s @ duck_max), memory=11.818 MiB (17.480 GB/s @ duck_max), judge_gap=72.809µs, judge_ratio=1.103x Token # 7: 3.698ms; value: next_token_ids=tensor([389], 
device='cuda:0') mtp accept=1 prop=389 top1=389 accp=0.982 next=pair draft=1703 prop=1703 pred gate=device [2026-04-08 07:44:25.364298 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.035736ms; phases: prepare=3.257µs, send=61.045µs, judge_wait=834.74µs, fetch=98.822µs, reduce=25ns, writeback=603ns; duck time-ns stats: p50=755.45µs, p90=760.02µs, max=761.222µs; effective_read: activated_experts=8, params=0.011010G (14.464 Gparam/s @ duck_max), memory=11.818 MiB (16.279 GB/s @ duck_max), judge_gap=73.518µs, judge_ratio=1.097x Token # 8: 115.051ms; value: next_token_ids=tensor([1703], device='cuda:0') mtp accept=1 prop=1703 top1=1703 accp=1.000 next=draft=996 prop=996 olap pair=109.6ms serial=193.6ms gain=84.0ms ratio=0.43 s0=7.2ms s1=186.4ms wait=0.2/42.1ms pred gate=device [2026-04-08 07:44:25.368225 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 970.177µs; phases: prepare=3.301µs, send=60.604µs, judge_wait=778.381µs, fetch=91.114µs, reduce=24ns, writeback=570ns; duck time-ns stats: p50=695.559µs, p90=702.8µs, max=705.201µs; effective_read: activated_experts=8, params=0.011010G (15.613 Gparam/s @ duck_max), memory=11.818 MiB (17.572 GB/s @ duck_max), judge_gap=73.18µs, judge_ratio=1.104x Token # 9: 3.822ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=pair draft=3467 prop=3467 pred gate=device Token # 10: 114.547ms; value: next_token_ids=tensor([3467], device='cuda:0') mtp accept=1 prop=3467 top1=3467 accp=1.000 next=draft=5409 prop=5409 olap pair=109.3ms serial=192.3ms gain=82.9ms ratio=0.43 s0=5.2ms s1=187.1ms wait=0.1/44.5ms pred gate=device Token # 11: 3.768ms; value: next_token_ids=tensor([5409], device='cuda:0') mtp accept=1 prop=5409 top1=5409 accp=0.994 next=pair draft=223 prop=223 pred gate=device Token # 12: 113.969ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=39932 accp=0.455 next=draft=39932 prop=39932 olap pair=108.0ms serial=189.6ms gain=81.6ms ratio=0.43 
s0=5.5ms s1=184.1ms wait=0.2/44.3ms pred gate=device Token # 13: 4.527ms; value: next_token_ids=tensor([39932], device='cuda:0') mtp accept=1 prop=39932 top1=39932 accp=0.998 next=pair draft=5640 prop=5640 pred gate=device Token # 14: 114.572ms; value: next_token_ids=tensor([5640], device='cuda:0') mtp accept=1 prop=5640 top1=5640 accp=0.672 next=draft=1959 prop=1959 olap pair=108.5ms serial=191.3ms gain=82.8ms ratio=0.43 s0=6.8ms s1=184.5ms wait=0.2/42.5ms pred gate=device Token # 15: 4.530ms; value: next_token_ids=tensor([1959], device='cuda:0') mtp accept=1 prop=1959 top1=1959 accp=0.673 next=pair draft=8283 prop=8283 pred gate=device Token # 16: 114.301ms; value: next_token_ids=tensor([8283], device='cuda:0') mtp accept=1 prop=8283 top1=8283 accp=0.983 next=draft=98938 prop=98938 olap pair=108.5ms serial=191.2ms gain=82.8ms ratio=0.43 s0=8.7ms s1=182.6ms wait=0.2/40.4ms pred gate=device Token # 17: 3.750ms; value: next_token_ids=tensor([98938], device='cuda:0') mtp accept=1 prop=98938 top1=98938 accp=0.970 next=pair draft=1703 prop=1703 pred gate=device Token # 18: 114.411ms; value: next_token_ids=tensor([1703], device='cuda:0') mtp accept=1 prop=1703 top1=1703 accp=0.596 next=draft=996 prop=996 olap pair=109.1ms serial=192.9ms gain=83.8ms ratio=0.43 s0=5.3ms s1=187.6ms wait=0.1/44.3ms pred gate=device Token # 19: 3.690ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 20: 112.836ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.741 next=draft=4754 prop=4754 olap pair=107.7ms serial=191.0ms gain=83.4ms ratio=0.44 s0=4.2ms s1=186.8ms wait=0.1/45.1ms pred gate=device Token # 21: 3.772ms; value: next_token_ids=tensor([4754], device='cuda:0') mtp accept=1 prop=4754 top1=7346 accp=0.351 next=pair draft=768 prop=768 pred gate=device Token # 22: 113.480ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 
prop=768 top1=768 accp=0.740 next=draft=7163 prop=7163 olap pair=108.3ms serial=191.1ms gain=82.7ms ratio=0.43 s0=4.3ms s1=186.8ms wait=0.1/45.5ms pred gate=device Token # 23: 3.735ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.999 next=pair draft=27521 prop=27521 pred gate=device Token # 24: 114.725ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=109.6ms serial=194.3ms gain=84.7ms ratio=0.44 s0=6.1ms s1=188.2ms wait=0.2/43.4ms pred gate=device Token # 25: 3.784ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=0 prop=19698 top1=19698 accp=0.534 next=pair draft=438 prop=438 pred gate=device Token # 26: 115.472ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.830 next=draft=223 prop=223 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.7ms s1=189.6ms wait=0.1/44.9ms pred gate=device Token # 27: 4.640ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.989 next=pair draft=20 prop=20 pred gate=device Token # 28: 115.983ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=64 prop=64 olap pair=109.7ms serial=193.7ms gain=84.1ms ratio=0.43 s0=8.7ms s1=185.0ms wait=0.2/40.4ms pred gate=device Token # 29: 4.596ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=0.999 next=pair draft=397 prop=397 pred gate=device Token # 30: 114.943ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=1 prop=397 top1=397 accp=1.000 next=draft=303 prop=303 olap pair=109.8ms serial=194.5ms gain=84.7ms ratio=0.44 s0=4.6ms s1=189.9ms wait=0.1/45.0ms pred gate=device Token # 31: 3.722ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=0 prop=303 top1=303 accp=0.873 next=pair draft=2636 prop=2636 pred gate=device Token # 
32: 113.555ms; value: next_token_ids=tensor([14149], device='cuda:0') mtp accept=0 prop=2636 top1=14149 accp=0.381 next=draft=303 prop=303 olap pair=108.3ms serial=192.2ms gain=83.9ms ratio=0.44 s0=4.3ms s1=187.9ms wait=0.1/44.4ms pred gate=device Token # 33: 113.779ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.976 next=draft=20 prop=20 olap pair=108.5ms serial=191.8ms gain=83.3ms ratio=0.43 s0=4.2ms s1=187.6ms wait=0.1/45.1ms pred gate=device Token # 34: 3.704ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=64 prop=64 pred gate=device Token # 35: 113.940ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=draft=397 prop=553 olap pair=108.7ms serial=192.1ms gain=83.4ms ratio=0.43 s0=4.8ms s1=187.3ms wait=0.1/44.9ms pred gate=device Token # 36: 3.743ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=0 prop=553 top1=397 accp=0.828 next=pair draft=438 prop=438 pred gate=device Token # 37: 113.894ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=108.6ms serial=192.6ms gain=84.0ms ratio=0.44 s0=4.4ms s1=188.2ms wait=0.1/44.5ms pred gate=device Token # 38: 3.702ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 39: 114.518ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.999 next=draft=27521 prop=27521 olap pair=109.2ms serial=193.6ms gain=84.4ms ratio=0.44 s0=4.4ms s1=189.3ms wait=0.1/44.8ms pred gate=device Token # 40: 3.778ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=24 prop=24 pred gate=device Token # 41: 113.409ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 
top1=24 accp=1.000 next=draft=320 prop=320 olap pair=108.2ms serial=192.1ms gain=83.8ms ratio=0.44 s0=4.0ms s1=188.0ms wait=0.1/45.6ms pred gate=device Token # 42: 3.783ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.958 next=pair draft=2636 prop=2636 pred gate=device Token # 43: 113.441ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.977 next=draft=7163 prop=223 olap pair=108.0ms serial=190.5ms gain=82.4ms ratio=0.43 s0=6.1ms s1=184.4ms wait=0.2/43.4ms pred gate=device Token # 44: 3.801ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.663 next=pair draft=7163 prop=7163 pred gate=device Token # 45: 114.173ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=108.9ms serial=193.4ms gain=84.5ms ratio=0.44 s0=4.5ms s1=188.9ms wait=0.1/44.8ms pred gate=device Token # 46: 3.713ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 47: 112.886ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=0.991 next=draft=438 prop=438 olap pair=107.7ms serial=191.1ms gain=83.4ms ratio=0.44 s0=4.3ms s1=186.8ms wait=0.1/44.8ms pred gate=device Token # 48: 3.798ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.693 next=pair draft=223 prop=223 pred gate=device Token # 49: 114.229ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=7163 prop=7163 olap pair=109.1ms serial=193.8ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.6ms wait=0.1/45.2ms pred gate=device Token # 50: 3.691ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.994 next=pair draft=27521 prop=27521 pred 
gate=device Token # 51: 112.891ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=24 prop=24 olap pair=107.7ms serial=191.1ms gain=83.5ms ratio=0.44 s0=4.2ms s1=186.9ms wait=0.1/45.0ms pred gate=device Token # 52: 3.798ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=0.998 next=pair draft=940 prop=940 pred gate=device Token # 53: 113.729ms; value: next_token_ids=tensor([982], device='cuda:0') mtp accept=0 prop=940 top1=982 accp=0.000 next=draft=223 prop=223 olap pair=108.5ms serial=192.5ms gain=84.0ms ratio=0.44 s0=4.3ms s1=188.2ms wait=0.1/44.4ms pred gate=device Token # 54: 113.504ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=1457 prop=1457 olap pair=108.1ms serial=191.7ms gain=83.6ms ratio=0.44 s0=5.0ms s1=186.7ms wait=0.1/44.1ms pred gate=device Token # 55: 3.701ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=pair draft=940 prop=940 pred gate=device Token # 56: 113.230ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=0.995 next=draft=223 prop=223 olap pair=108.0ms serial=191.8ms gain=83.8ms ratio=0.44 s0=4.4ms s1=187.4ms wait=0.1/44.7ms pred gate=device Token # 57: 3.761ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 58: 113.622ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=1148 prop=1148 olap pair=108.4ms serial=192.7ms gain=84.2ms ratio=0.44 s0=4.4ms s1=188.3ms wait=0.1/44.6ms pred gate=device Token # 59: 3.726ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=0.997 next=pair draft=422 prop=422 pred gate=device Token # 60: 113.149ms; value: next_token_ids=tensor([422], device='cuda:0') 
mtp accept=1 prop=422 top1=422 accp=0.966 next=draft=57573 prop=303 olap pair=107.9ms serial=191.6ms gain=83.6ms ratio=0.44 s0=4.4ms s1=187.1ms wait=0.1/44.9ms pred gate=device Token # 61: 3.776ms; value: next_token_ids=tensor([57573], device='cuda:0') mtp accept=0 prop=303 top1=57573 accp=0.783 next=pair draft=768 prop=768 pred gate=device Token # 62: 112.883ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.981 next=draft=7163 prop=7163 olap pair=107.7ms serial=191.2ms gain=83.5ms ratio=0.44 s0=4.0ms s1=187.2ms wait=0.1/45.8ms pred gate=device Token # 63: 3.693ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.999 next=pair draft=27521 prop=27521 pred gate=device Token # 64: 112.771ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=107.5ms serial=190.9ms gain=83.3ms ratio=0.44 s0=4.1ms s1=186.7ms wait=0.1/45.4ms pred gate=device Token # 65: 3.690ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=0.860 next=pair draft=223 prop=223 pred gate=device Token # 66: 112.223ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.847 next=draft=17839 prop=17839 olap pair=107.1ms serial=190.1ms gain=83.0ms ratio=0.44 s0=4.4ms s1=185.7ms wait=0.1/44.7ms pred gate=device Token # 67: 3.699ms; value: next_token_ids=tensor([2431], device='cuda:0') mtp accept=0 prop=17839 top1=2431 accp=0.405 next=pair draft=15120 prop=15120 pred gate=device Token # 68: 113.075ms; value: next_token_ids=tensor([15120], device='cuda:0') mtp accept=1 prop=15120 top1=17839 accp=0.308 next=draft=223 prop=223 olap pair=107.9ms serial=191.6ms gain=83.7ms ratio=0.44 s0=4.4ms s1=187.2ms wait=0.1/44.9ms pred gate=device Token # 69: 3.719ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 
next=pair draft=20 prop=20 pred gate=device Token # 70: 112.380ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.771 next=draft=64 prop=64 olap pair=107.2ms serial=190.2ms gain=83.0ms ratio=0.44 s0=4.4ms s1=185.8ms wait=0.1/44.8ms pred gate=device Token # 71: 3.670ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=397 prop=397 pred gate=device Token # 72: 112.918ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=1 prop=397 top1=397 accp=1.000 next=draft=940 prop=940 olap pair=107.7ms serial=191.3ms gain=83.6ms ratio=0.44 s0=4.4ms s1=186.9ms wait=0.1/44.7ms pred gate=device Token # 73: 3.767ms; value: next_token_ids=tensor([982], device='cuda:0') mtp accept=0 prop=940 top1=982 accp=0.044 next=pair draft=223 prop=223 pred gate=device Token # 74: 112.915ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.996 next=draft=1457 prop=1457 olap pair=107.7ms serial=191.4ms gain=83.7ms ratio=0.44 s0=4.3ms s1=187.1ms wait=0.1/45.0ms pred gate=device Token # 75: 3.693ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=pair draft=940 prop=940 pred gate=device Token # 76: 112.977ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=0.965 next=draft=223 prop=223 olap pair=107.7ms serial=191.3ms gain=83.6ms ratio=0.44 s0=4.4ms s1=186.9ms wait=0.1/44.8ms pred gate=device Token # 77: 3.749ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 78: 113.306ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=1148 prop=1148 olap pair=108.1ms serial=191.6ms gain=83.5ms ratio=0.44 s0=4.7ms s1=187.0ms wait=0.1/44.4ms pred gate=device Token # 79: 3.708ms; value: next_token_ids=tensor([1148], 
device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=1.000 next=pair draft=14785 prop=14785 pred gate=device Token # 80: 112.059ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=0 prop=14785 top1=14785 accp=0.758 next=draft=64 prop=64 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.9ms s1=184.6ms wait=0.1/43.9ms pred gate=device Token # 81: 112.547ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=draft=397 prop=397 olap pair=107.3ms serial=190.4ms gain=83.1ms ratio=0.44 s0=4.9ms s1=185.5ms wait=0.1/44.7ms pred gate=device Token # 82: 3.746ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=1 prop=397 top1=397 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 83: 112.568ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.990 next=draft=223 prop=223 olap pair=107.4ms serial=190.4ms gain=83.0ms ratio=0.44 s0=4.9ms s1=185.5ms wait=0.1/44.2ms pred gate=device Token # 84: 3.707ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 85: 113.313ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=0.963 next=draft=14 prop=14 olap pair=108.0ms serial=191.5ms gain=83.6ms ratio=0.44 s0=4.3ms s1=187.3ms wait=0.1/45.1ms pred gate=device Token # 86: 3.808ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=1.000 next=pair draft=30633 prop=30633 pred gate=device Token # 87: 113.139ms; value: next_token_ids=tensor([30633], device='cuda:0') mtp accept=1 prop=30633 top1=30633 accp=1.000 next=draft=14 prop=14 olap pair=108.0ms serial=191.5ms gain=83.5ms ratio=0.44 s0=4.3ms s1=187.2ms wait=0.1/44.9ms pred gate=device Token # 88: 3.796ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=1.000 next=pair draft=25174 prop=25174 pred 
gate=device Token # 89: 113.298ms; value: next_token_ids=tensor([25174], device='cuda:0') mtp accept=1 prop=25174 top1=25174 accp=1.000 next=draft=320 prop=303 olap pair=108.1ms serial=192.0ms gain=83.9ms ratio=0.44 s0=4.0ms s1=187.9ms wait=0.1/45.8ms pred gate=device Token # 90: 3.777ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=0 prop=303 top1=320 accp=0.727 next=pair draft=63760 prop=63760 pred gate=device Token # 91: 113.574ms; value: next_token_ids=tensor([63760], device='cuda:0') mtp accept=1 prop=63760 top1=63760 accp=0.878 next=draft=223 prop=223 olap pair=108.3ms serial=191.6ms gain=83.2ms ratio=0.43 s0=4.0ms s1=187.5ms wait=0.1/45.9ms pred gate=device Token # 92: 3.755ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=0 prop=223 top1=1457 accp=0.166 next=pair draft=5289 prop=5289 pred gate=device Token # 93: 113.019ms; value: next_token_ids=tensor([5289], device='cuda:0') mtp accept=1 prop=5289 top1=389 accp=0.233 next=draft=223 prop=19 olap pair=107.8ms serial=191.2ms gain=83.4ms ratio=0.44 s0=4.0ms s1=187.2ms wait=0.1/45.9ms pred gate=device Token # 94: 3.792ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=0 prop=19 top1=223 accp=0.741 next=pair draft=14 prop=14 pred gate=device Token # 95: 113.106ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=1.000 next=draft=27521 prop=27521 olap pair=107.9ms serial=191.1ms gain=83.3ms ratio=0.44 s0=3.9ms s1=187.3ms wait=0.1/46.1ms pred gate=device Token # 96: 3.757ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=14 prop=14 pred gate=device Token # 97: 112.576ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=1.000 next=draft=6391 prop=6391 olap pair=107.3ms serial=190.5ms gain=83.2ms ratio=0.44 s0=3.9ms s1=186.6ms wait=0.1/46.2ms pred gate=device Token # 98: 3.711ms; value: next_token_ids=tensor([6391], 
device='cuda:0') mtp accept=1 prop=6391 top1=6391 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 99: 112.284ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.791 next=draft=4272 prop=4272 olap pair=107.0ms serial=190.0ms gain=83.0ms ratio=0.44 s0=3.9ms s1=186.2ms wait=0.1/46.3ms pred gate=device Token # 100: 3.736ms; value: next_token_ids=tensor([4272], device='cuda:0') mtp accept=1 prop=4272 top1=4272 accp=0.710 next=pair draft=1107 prop=1107 pred gate=device Token # 101: 113.957ms; value: next_token_ids=tensor([1107], device='cuda:0') mtp accept=1 prop=1107 top1=1107 accp=0.999 next=draft=19 prop=19 olap pair=108.0ms serial=191.1ms gain=83.1ms ratio=0.44 s0=5.7ms s1=185.4ms wait=0.1/44.1ms pred gate=device Token # 102: 4.374ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=5289 prop=5289 pred gate=device Token # 103: 113.357ms; value: next_token_ids=tensor([5289], device='cuda:0') mtp accept=1 prop=5289 top1=5289 accp=1.000 next=draft=7163 prop=7163 olap pair=108.2ms serial=192.1ms gain=83.9ms ratio=0.44 s0=4.2ms s1=187.9ms wait=0.1/45.4ms pred gate=device Token # 104: 3.747ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.986 next=pair draft=14 prop=14 pred gate=device Token # 105: 113.399ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=1.000 next=draft=27521 prop=27521 olap pair=108.2ms serial=190.9ms gain=82.7ms ratio=0.43 s0=4.2ms s1=186.7ms wait=0.1/45.8ms pred gate=device Token # 106: 3.726ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=14 prop=14 pred gate=device Token # 107: 113.753ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=1.000 next=draft=19698 prop=19698 olap pair=108.6ms serial=191.9ms gain=83.3ms ratio=0.43 s0=4.1ms 
s1=187.8ms wait=0.1/45.8ms pred gate=device Token # 108: 3.702ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 109: 112.632ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.983 next=draft=26865 prop=26865 olap pair=107.4ms serial=190.6ms gain=83.1ms ratio=0.44 s0=4.2ms s1=186.3ms wait=0.1/45.0ms pred gate=device Token # 110: 3.680ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=0 prop=26865 top1=2636 accp=0.480 next=pair draft=26865 prop=223 pred gate=device Token # 111: 112.776ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.543 next=draft=7163 prop=7163 olap pair=107.5ms serial=190.3ms gain=82.8ms ratio=0.44 s0=6.4ms s1=183.9ms wait=0.2/43.3ms pred gate=device Token # 112: 3.716ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 113: 112.902ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=107.6ms serial=190.1ms gain=82.5ms ratio=0.43 s0=4.0ms s1=186.1ms wait=0.1/46.1ms pred gate=device Token # 114: 3.728ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 115: 112.592ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=107.3ms serial=190.3ms gain=83.0ms ratio=0.44 s0=3.9ms s1=186.5ms wait=0.1/46.2ms pred gate=device Token # 116: 3.756ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=1457 prop=1457 pred gate=device Token # 117: 112.857ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp 
accept=1 prop=1457 top1=1457 accp=1.000 next=draft=982 prop=982 olap pair=107.6ms serial=191.2ms gain=83.5ms ratio=0.44 s0=3.8ms s1=187.4ms wait=0.1/46.4ms pred gate=device Token # 118: 3.829ms; value: next_token_ids=tensor([982], device='cuda:0') mtp accept=1 prop=982 top1=982 accp=0.983 next=pair draft=223 prop=223 pred gate=device Token # 119: 113.486ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=20 prop=20 olap pair=108.2ms serial=192.3ms gain=84.0ms ratio=0.44 s0=4.0ms s1=188.2ms wait=0.1/45.9ms pred gate=device Token # 120: 3.816ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=64 prop=64 pred gate=device Token # 121: 113.568ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=draft=397 prop=397 olap pair=108.3ms serial=191.4ms gain=83.1ms ratio=0.43 s0=6.6ms s1=184.8ms wait=0.2/43.1ms pred gate=device Token # 122: 3.740ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=1 prop=397 top1=397 accp=1.000 next=pair draft=940 prop=940 pred gate=device Token # 123: 112.946ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=1.000 next=draft=223 prop=223 olap pair=107.7ms serial=191.4ms gain=83.7ms ratio=0.44 s0=3.7ms s1=187.6ms wait=0.1/46.5ms pred gate=device Token # 124: 3.855ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 125: 112.693ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=320 prop=320 olap pair=107.6ms serial=191.0ms gain=83.4ms ratio=0.44 s0=3.7ms s1=187.3ms wait=0.1/46.3ms pred gate=device Token # 126: 3.702ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.675 next=pair draft=1207 prop=1207 pred gate=device Token # 127: 
112.869ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=1 prop=1207 top1=1207 accp=0.919 next=draft=13103 prop=13103 olap pair=107.7ms serial=189.9ms gain=82.2ms ratio=0.43 s0=4.1ms s1=185.8ms wait=0.1/45.9ms pred gate=device Token # 128: 3.725ms; value: next_token_ids=tensor([31826], device='cuda:0') mtp accept=0 prop=13103 top1=13103 accp=0.909 next=pair draft=1057 prop=1057 pred gate=device Token # 129: 112.620ms; value: next_token_ids=tensor([1057], device='cuda:0') mtp accept=1 prop=1057 top1=1057 accp=0.945 next=draft=4618 prop=4618 olap pair=107.4ms serial=190.2ms gain=82.8ms ratio=0.44 s0=4.2ms s1=186.0ms wait=0.1/45.6ms pred gate=device Token # 130: 3.748ms; value: next_token_ids=tensor([4618], device='cuda:0') mtp accept=1 prop=4618 top1=4618 accp=0.852 next=pair draft=5870 prop=5870 pred gate=device Token # 131: 113.579ms; value: next_token_ids=tensor([5870], device='cuda:0') mtp accept=1 prop=5870 top1=5870 accp=0.975 next=draft=320 prop=320 olap pair=108.3ms serial=190.2ms gain=81.9ms ratio=0.43 s0=4.3ms s1=185.9ms wait=0.1/45.8ms pred gate=device Token # 132: 3.741ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.972 next=pair draft=13103 prop=13103 pred gate=device Token # 133: 113.807ms; value: next_token_ids=tensor([13103], device='cuda:0') mtp accept=1 prop=13103 top1=13103 accp=0.872 next=draft=19585 prop=55717 olap pair=107.8ms serial=189.6ms gain=81.8ms ratio=0.43 s0=6.7ms s1=182.9ms wait=0.2/43.1ms pred gate=device Token # 134: 4.599ms; value: next_token_ids=tensor([65048], device='cuda:0') mtp accept=0 prop=55717 top1=1877 accp=0.628 next=pair draft=223 prop=223 pred gate=device Token # 135: 112.996ms; value: next_token_ids=tensor([2456], device='cuda:0') mtp accept=0 prop=223 top1=2456 accp=0.000 next=draft=2609 prop=2609 olap pair=107.5ms serial=190.2ms gain=82.6ms ratio=0.43 s0=5.9ms s1=184.2ms wait=0.2/43.9ms pred gate=device Token # 136: 114.280ms; value: 
next_token_ids=tensor([2609], device='cuda:0') mtp accept=1 prop=2609 top1=2609 accp=1.000 next=draft=996 prop=996 olap pair=107.9ms serial=190.5ms gain=82.6ms ratio=0.43 s0=8.3ms s1=182.2ms wait=0.2/41.1ms pred gate=device Token # 137: 4.676ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=pair draft=1316 prop=1316 pred gate=device Token # 138: 113.071ms; value: next_token_ids=tensor([1316], device='cuda:0') mtp accept=1 prop=1316 top1=5262 accp=0.452 next=draft=14034 prop=14034 olap pair=107.7ms serial=190.7ms gain=83.0ms ratio=0.44 s0=5.7ms s1=185.0ms wait=0.1/44.1ms pred gate=device Token # 139: 4.861ms; value: next_token_ids=tensor([8831], device='cuda:0') mtp accept=0 prop=14034 top1=8831 accp=0.115 next=pair draft=8458 prop=8458 pred gate=device Token # 140: 113.712ms; value: next_token_ids=tensor([8458], device='cuda:0') mtp accept=1 prop=8458 top1=8458 accp=1.000 next=draft=116427 prop=996 olap pair=108.3ms serial=191.2ms gain=83.0ms ratio=0.43 s0=7.7ms s1=183.6ms wait=0.2/41.7ms pred gate=device Token # 141: 3.817ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=0.317 next=pair draft=5262 prop=5262 pred gate=device Token # 142: 113.621ms; value: next_token_ids=tensor([5262], device='cuda:0') mtp accept=1 prop=5262 top1=5262 accp=1.000 next=draft=1148 prop=1148 olap pair=107.6ms serial=189.3ms gain=81.6ms ratio=0.43 s0=6.0ms s1=183.2ms wait=0.2/43.7ms pred gate=device Token # 143: 4.555ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=1.000 next=pair draft=14785 prop=14785 pred gate=device Token # 144: 113.281ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=0 prop=14785 top1=20 accp=0.242 next=draft=64 prop=64 olap pair=107.9ms serial=189.2ms gain=81.3ms ratio=0.43 s0=7.3ms s1=181.9ms wait=0.2/41.8ms pred gate=device Token # 145: 113.111ms; value: next_token_ids=tensor([64], device='cuda:0') mtp 
accept=1 prop=64 top1=64 accp=1.000 next=draft=397 prop=397 olap pair=107.7ms serial=189.4ms gain=81.6ms ratio=0.43 s0=4.4ms s1=185.0ms wait=0.1/45.5ms pred gate=device Token # 146: 3.717ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=1 prop=397 top1=397 accp=1.000 next=pair draft=940 prop=940 pred gate=device Token # 147: 112.652ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=0.869 next=draft=223 prop=223 olap pair=107.3ms serial=190.4ms gain=83.0ms ratio=0.44 s0=4.3ms s1=186.0ms wait=0.1/45.0ms pred gate=device Token # 148: 3.784ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.996 next=pair draft=19 prop=19 pred gate=device Token # 149: 112.969ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=1148 prop=1148 olap pair=107.7ms serial=191.4ms gain=83.6ms ratio=0.44 s0=4.3ms s1=187.1ms wait=0.1/44.9ms pred gate=device Token # 150: 3.706ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=0.992 next=pair draft=14149 prop=14149 pred gate=device Token # 151: 112.186ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=0 prop=14149 top1=20 accp=0.077 next=draft=64 prop=64 olap pair=106.9ms serial=189.6ms gain=82.6ms ratio=0.44 s0=4.1ms s1=185.5ms wait=0.1/45.8ms pred gate=device Token # 152: 112.855ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=draft=397 prop=397 olap pair=107.6ms serial=191.0ms gain=83.5ms ratio=0.44 s0=4.0ms s1=187.1ms wait=0.1/46.1ms pred gate=device Token # 153: 3.763ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=1 prop=397 top1=397 accp=1.000 next=pair draft=940 prop=940 pred gate=device Token # 154: 112.354ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=0.849 next=draft=223 prop=223 olap pair=107.1ms 
serial=190.2ms gain=83.1ms ratio=0.44 s0=3.8ms s1=186.4ms wait=0.1/46.3ms pred gate=device Token # 155: 3.730ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 156: 112.867ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=438 prop=438 olap pair=107.6ms serial=189.9ms gain=82.3ms ratio=0.43 s0=4.2ms s1=185.7ms wait=0.1/45.7ms pred gate=device Token # 157: 3.753ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.962 next=pair draft=223 prop=223 pred gate=device Token # 158: 112.373ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=7163 prop=7163 olap pair=107.1ms serial=189.8ms gain=82.7ms ratio=0.44 s0=4.3ms s1=185.5ms wait=0.1/45.2ms pred gate=device Token # 159: 3.818ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.997 next=pair draft=27521 prop=27521 pred gate=device Token # 160: 112.747ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=25 prop=25 olap pair=107.5ms serial=190.1ms gain=82.6ms ratio=0.43 s0=5.8ms s1=184.4ms wait=0.2/43.7ms pred gate=device Token # 161: 3.747ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=0.996 next=pair draft=303 prop=303 pred gate=device Token # 162: 112.320ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.980 next=draft=2099 prop=1207 olap pair=107.1ms serial=190.1ms gain=83.0ms ratio=0.44 s0=3.9ms s1=186.2ms wait=0.1/46.2ms pred gate=device Token # 163: 3.733ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=1 prop=1207 top1=1207 accp=0.276 next=pair draft=4407 prop=4407 pred gate=device Token # 164: 113.845ms; value: next_token_ids=tensor([7145], 
device='cuda:0') mtp accept=0 prop=4407 top1=55474 accp=0.304 next=draft=8283 prop=50746 olap pair=108.7ms serial=191.5ms gain=82.8ms ratio=0.43 s0=4.8ms s1=186.7ms wait=0.1/45.0ms pred gate=device Token # 165: 113.060ms; value: next_token_ids=tensor([8283], device='cuda:0') mtp accept=0 prop=50746 top1=8283 accp=0.723 next=draft=389 prop=389 olap pair=107.7ms serial=189.8ms gain=82.0ms ratio=0.43 s0=4.2ms s1=185.6ms wait=0.1/46.0ms pred gate=device Token # 166: 112.401ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 accp=0.966 next=draft=7163 prop=7163 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.2ms wait=0.1/46.3ms pred gate=device Token # 167: 3.724ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.977 next=pair draft=27521 prop=27521 pred gate=device Token # 168: 112.360ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=0.996 next=draft=19698 prop=19698 olap pair=107.1ms serial=190.1ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.3ms wait=0.1/46.3ms pred gate=device Token # 169: 3.773ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 170: 112.063ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.982 next=draft=29521 prop=2636 olap pair=106.9ms serial=189.7ms gain=82.8ms ratio=0.44 s0=3.8ms s1=185.9ms wait=0.1/46.3ms pred gate=device Token # 171: 3.704ms; value: next_token_ids=tensor([1877], device='cuda:0') mtp accept=0 prop=2636 top1=1877 accp=0.124 next=pair draft=29521 prop=29521 pred gate=device Token # 172: 112.479ms; value: next_token_ids=tensor([29521], device='cuda:0') mtp accept=1 prop=29521 top1=29521 accp=0.856 next=draft=320 prop=320 olap pair=107.2ms serial=190.2ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.4ms wait=0.1/46.4ms pred gate=device Token # 
173: 3.854ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.946 next=pair draft=4029 prop=4029 pred gate=device Token # 174: 112.457ms; value: next_token_ids=tensor([13103], device='cuda:0') mtp accept=0 prop=4029 top1=4029 accp=0.523 next=draft=19585 prop=19585 olap pair=107.3ms serial=190.4ms gain=83.1ms ratio=0.44 s0=4.4ms s1=186.0ms wait=0.1/44.9ms pred gate=device Token # 175: 112.216ms; value: next_token_ids=tensor([19585], device='cuda:0') mtp accept=1 prop=19585 top1=19585 accp=0.864 next=draft=223 prop=223 olap pair=106.9ms serial=189.7ms gain=82.8ms ratio=0.44 s0=4.1ms s1=185.6ms wait=0.1/45.8ms pred gate=device Token # 176: 3.728ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.961 next=pair draft=20 prop=20 pred gate=device Token # 177: 112.221ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.905 next=draft=64 prop=64 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=5.0ms s1=184.5ms wait=0.1/44.8ms pred gate=device Token # 178: 3.767ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=48776 prop=48776 pred gate=device Token # 179: 112.316ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=0 prop=48776 top1=397 accp=0.063 next=draft=982 prop=982 olap pair=107.1ms serial=190.1ms gain=83.0ms ratio=0.44 s0=3.9ms s1=186.2ms wait=0.1/46.2ms pred gate=device Token # 180: 112.387ms; value: next_token_ids=tensor([982], device='cuda:0') mtp accept=1 prop=982 top1=982 accp=0.762 next=draft=223 prop=223 olap pair=107.1ms serial=190.1ms gain=83.0ms ratio=0.44 s0=4.2ms s1=185.9ms wait=0.1/45.5ms pred gate=device Token # 181: 3.767ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=1457 prop=1457 pred gate=device Token # 182: 112.891ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp 
accept=1 prop=1457 top1=1457 accp=1.000 next=draft=940 prop=940 olap pair=107.7ms serial=191.3ms gain=83.6ms ratio=0.44 s0=4.5ms s1=186.8ms wait=0.1/44.8ms pred gate=device Token # 183: 3.870ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=0.991 next=pair draft=223 prop=223 pred gate=device Token # 184: 112.705ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=19 prop=19 olap pair=107.4ms serial=190.5ms gain=83.1ms ratio=0.44 s0=3.9ms s1=186.6ms wait=0.1/46.1ms pred gate=device Token # 185: 3.724ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 186: 112.269ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.904 next=draft=223 prop=223 olap pair=107.1ms serial=190.1ms gain=83.1ms ratio=0.44 s0=3.8ms s1=186.3ms wait=0.1/46.3ms pred gate=device Token # 187: 3.741ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 188: 112.264ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.858 next=draft=64 prop=64 olap pair=107.0ms serial=189.8ms gain=82.7ms ratio=0.44 s0=4.1ms s1=185.7ms wait=0.1/45.8ms pred gate=device Token # 189: 3.724ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=397 prop=397 pred gate=device Token # 190: 112.756ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=1 prop=397 top1=397 accp=1.000 next=draft=982 prop=982 olap pair=107.5ms serial=190.8ms gain=83.4ms ratio=0.44 s0=3.9ms s1=186.9ms wait=0.1/46.1ms pred gate=device Token # 191: 3.803ms; value: next_token_ids=tensor([982], device='cuda:0') mtp accept=1 prop=982 top1=982 accp=1.000 next=pair draft=343 prop=343 pred gate=device Token # 192: 
112.342ms; value: next_token_ids=tensor([343], device='cuda:0') mtp accept=1 prop=343 top1=223 accp=0.448 next=draft=23 prop=23 olap pair=107.1ms serial=190.1ms gain=83.0ms ratio=0.44 s0=3.9ms s1=186.1ms wait=0.1/46.1ms pred gate=device Token # 193: 3.736ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=0 prop=23 top1=1457 accp=0.044 next=pair draft=11 prop=11 pred gate=device Token # 194: 112.779ms; value: next_token_ids=tensor([11], device='cuda:0') mtp accept=1 prop=11 top1=11 accp=0.965 next=draft=940 prop=940 olap pair=107.5ms serial=190.9ms gain=83.4ms ratio=0.44 s0=3.9ms s1=187.0ms wait=0.1/46.2ms pred gate=device Token # 195: 3.754ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 196: 112.702ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.915 next=draft=19 prop=19 olap pair=107.4ms serial=190.7ms gain=83.3ms ratio=0.44 s0=3.9ms s1=186.8ms wait=0.1/46.1ms pred gate=device Token # 197: 3.759ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 198: 112.336ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.856 next=draft=4029 prop=4029 olap pair=107.1ms serial=190.1ms gain=83.0ms ratio=0.44 s0=3.9ms s1=186.2ms wait=0.1/46.0ms pred gate=device Token # 199: 3.750ms; value: next_token_ids=tensor([4029], device='cuda:0') mtp accept=1 prop=4029 top1=4029 accp=0.857 next=pair draft=15206 prop=15206 pred gate=device Token # 200: 111.984ms; value: next_token_ids=tensor([15206], device='cuda:0') mtp accept=1 prop=15206 top1=15206 accp=0.666 next=draft=223 prop=223 olap pair=106.8ms serial=189.6ms gain=82.8ms ratio=0.44 s0=4.0ms s1=185.6ms wait=0.1/45.9ms pred gate=device Token # 201: 3.780ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 
top1=223 accp=0.990 next=pair draft=20 prop=20 pred gate=device Token # 202: 112.994ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.773 next=draft=64 prop=64 olap pair=106.9ms serial=188.4ms gain=81.5ms ratio=0.43 s0=7.8ms s1=180.7ms wait=0.2/41.5ms pred gate=device Token # 203: 4.655ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=48776 prop=48776 pred gate=device Token # 204: 113.823ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=0 prop=48776 top1=397 accp=0.496 next=draft=982 prop=982 olap pair=107.7ms serial=189.8ms gain=82.0ms ratio=0.43 s0=8.6ms s1=181.2ms wait=0.2/40.5ms pred gate=device Token # 205: 113.270ms; value: next_token_ids=tensor([982], device='cuda:0') mtp accept=1 prop=982 top1=982 accp=0.969 next=draft=223 prop=223 olap pair=107.0ms serial=188.4ms gain=81.4ms ratio=0.43 s0=8.5ms s1=179.9ms wait=0.2/40.5ms pred gate=device Token # 206: 4.596ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=1457 prop=1457 pred gate=device Token # 207: 113.609ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=0.981 next=draft=940 prop=940 olap pair=107.5ms serial=189.3ms gain=81.8ms ratio=0.43 s0=8.6ms s1=180.8ms wait=0.2/40.6ms pred gate=device Token # 208: 4.644ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=0.831 next=pair draft=223 prop=223 pred gate=device Token # 209: 113.691ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=19 prop=19 olap pair=107.5ms serial=189.3ms gain=81.8ms ratio=0.43 s0=8.6ms s1=180.7ms wait=0.2/40.5ms pred gate=device Token # 210: 4.598ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=438 prop=303 pred gate=device Token # 211: 113.475ms; 
value: next_token_ids=tensor([320], device='cuda:0') mtp accept=0 prop=303 top1=438 accp=0.821 next=draft=1207 prop=1207 olap pair=107.3ms serial=189.1ms gain=81.8ms ratio=0.43 s0=8.5ms s1=180.5ms wait=0.2/40.6ms pred gate=device Token # 212: 112.371ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=1 prop=1207 top1=1207 accp=0.887 next=draft=13103 prop=13103 olap pair=106.9ms serial=188.4ms gain=81.5ms ratio=0.43 s0=8.2ms s1=180.2ms wait=0.2/40.9ms pred gate=device Token # 213: 3.712ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=0 prop=13103 top1=1457 accp=0.202 next=pair draft=389 prop=389 pred gate=device Token # 214: 112.496ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=438 accp=0.666 next=draft=20 prop=20 olap pair=107.3ms serial=190.4ms gain=83.1ms ratio=0.44 s0=3.9ms s1=186.5ms wait=0.1/46.2ms pred gate=device Token # 215: 3.743ms; value: next_token_ids=tensor([553], device='cuda:0') mtp accept=0 prop=20 top1=553 accp=0.192 next=pair draft=64 prop=64 pred gate=device Token # 216: 112.267ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=draft=20 prop=20 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=3.9ms s1=185.9ms wait=0.1/46.0ms pred gate=device Token # 217: 3.727ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 218: 112.223ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=0 prop=303 top1=320 accp=0.521 next=draft=13103 prop=2636 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.2ms s1=185.5ms wait=0.1/45.2ms pred gate=device Token # 219: 112.229ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.402 next=draft=223 prop=19585 olap pair=106.9ms serial=189.6ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.7ms wait=0.1/46.2ms pred gate=device Token # 220: 
3.747ms; value: next_token_ids=tensor([19585], device='cuda:0') mtp accept=1 prop=19585 top1=19585 accp=0.390 next=pair draft=223 prop=223 pred gate=device Token # 221: 112.322ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.518 next=draft=553 prop=553 olap pair=107.0ms serial=189.8ms gain=82.7ms ratio=0.44 s0=4.0ms s1=185.7ms wait=0.1/45.6ms pred gate=device Token # 222: 3.797ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=0 prop=553 top1=553 accp=0.885 next=pair draft=64 prop=64 pred gate=device Token # 223: 112.698ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=draft=397 prop=397 olap pair=107.4ms serial=190.6ms gain=83.2ms ratio=0.44 s0=3.9ms s1=186.7ms wait=0.1/46.1ms pred gate=device Token # 224: 3.721ms; value: next_token_ids=tensor([397], device='cuda:0') mtp accept=1 prop=397 top1=397 accp=1.000 next=pair draft=982 prop=982 pred gate=device Token # 225: 113.138ms; value: next_token_ids=tensor([982], device='cuda:0') mtp accept=1 prop=982 top1=982 accp=1.000 next=draft=223 prop=223 olap pair=107.0ms serial=189.5ms gain=82.5ms ratio=0.44 s0=4.8ms s1=184.7ms wait=0.1/45.2ms pred gate=device Token # 226: 4.408ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=553 prop=553 pred gate=device Token # 227: 113.286ms; value: next_token_ids=tensor([553], device='cuda:0') mtp accept=1 prop=553 top1=553 accp=1.000 next=draft=64 prop=64 olap pair=108.0ms serial=189.8ms gain=81.8ms ratio=0.43 s0=4.3ms s1=185.5ms wait=0.1/45.8ms pred gate=device Token # 228: 3.813ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 229: 112.726ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=940 prop=940 olap pair=107.5ms serial=190.1ms gain=82.6ms ratio=0.43 
s0=4.9ms s1=185.2ms wait=0.1/44.8ms pred gate=device Token # 230: 3.763ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 231: 113.243ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.994 next=draft=19 prop=19 olap pair=107.9ms serial=190.6ms gain=82.7ms ratio=0.43 s0=4.3ms s1=186.3ms wait=0.1/45.6ms pred gate=device Token # 232: 3.783ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=320 prop=438 pred gate=device Token # 233: 112.617ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.287 next=draft=343 prop=343 olap pair=107.4ms serial=189.6ms gain=82.2ms ratio=0.43 s0=4.2ms s1=185.4ms wait=0.1/45.7ms pred gate=device Token # 234: 3.790ms; value: next_token_ids=tensor([343], device='cuda:0') mtp accept=1 prop=343 top1=343 accp=0.999 next=pair draft=20 prop=20 pred gate=device Token # 235: 112.494ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.997 next=draft=64 prop=64 olap pair=107.2ms serial=190.4ms gain=83.2ms ratio=0.44 s0=3.9ms s1=186.5ms wait=0.1/46.2ms pred gate=device Token # 236: 3.839ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=553 prop=553 pred gate=device Token # 237: 112.376ms; value: next_token_ids=tensor([553], device='cuda:0') mtp accept=1 prop=553 top1=553 accp=1.000 next=draft=982 prop=982 olap pair=107.2ms serial=190.3ms gain=83.1ms ratio=0.44 s0=3.9ms s1=186.3ms wait=0.1/46.0ms pred gate=device Token # 238: 3.851ms; value: next_token_ids=tensor([982], device='cuda:0') mtp accept=1 prop=982 top1=982 accp=0.971 next=pair draft=223 prop=223 pred gate=device Token # 239: 113.243ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=553 
prop=553 olap pair=107.9ms serial=191.3ms gain=83.4ms ratio=0.44 s0=4.5ms s1=186.8ms wait=0.1/44.8ms pred gate=device Token # 240: 3.785ms; value: next_token_ids=tensor([553], device='cuda:0') mtp accept=1 prop=553 top1=553 accp=1.000 next=pair draft=21590 prop=21590 pred gate=device Token # 241: 112.561ms; value: next_token_ids=tensor([21590], device='cuda:0') mtp accept=1 prop=21590 top1=21590 accp=1.000 next=draft=20 prop=20 olap pair=107.3ms serial=189.2ms gain=81.9ms ratio=0.43 s0=4.3ms s1=185.0ms wait=0.1/45.5ms pred gate=device Token # 242: 3.776ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.950 next=pair draft=1148 prop=1148 pred gate=device Token # 243: 112.554ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=0 prop=1148 top1=940 accp=0.395 next=draft=223 prop=223 olap pair=107.3ms serial=190.4ms gain=83.1ms ratio=0.44 s0=4.1ms s1=186.4ms wait=0.1/45.8ms pred gate=device Token # 244: 112.322ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.999 next=draft=19 prop=19 olap pair=107.0ms serial=190.0ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.2ms wait=0.1/46.2ms pred gate=device Token # 245: 3.881ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=1148 prop=1148 pred gate=device Token # 246: 112.365ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=1.000 next=draft=14149 prop=14149 olap pair=107.1ms serial=190.3ms gain=83.1ms ratio=0.44 s0=3.9ms s1=186.3ms wait=0.1/46.1ms pred gate=device Token # 247: 3.778ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=0 prop=14149 top1=20 accp=0.004 next=pair draft=64 prop=64 pred gate=device Token # 248: 112.287ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=draft=553 prop=553 olap pair=107.0ms serial=189.9ms gain=82.9ms ratio=0.44 s0=3.8ms 
s1=186.1ms wait=0.1/46.3ms pred gate=device Token # 249: 3.730ms; value: next_token_ids=tensor([553], device='cuda:0') mtp accept=1 prop=553 top1=553 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 250: 112.481ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=0.925 next=draft=5769 prop=5769 olap pair=107.2ms serial=190.1ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.3ms wait=0.1/46.3ms pred gate=device Token # 251: 3.809ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=pair draft=22 prop=22 pred gate=device Token # 252: 112.131ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=draft=303 prop=303 olap pair=106.9ms serial=189.7ms gain=82.8ms ratio=0.44 s0=3.8ms s1=185.9ms wait=0.1/46.2ms pred gate=device Token # 253: 3.785ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=63760 prop=63760 pred gate=device Token # 254: 112.267ms; value: next_token_ids=tensor([63760], device='cuda:0') mtp accept=1 prop=63760 top1=63760 accp=0.971 next=draft=5769 prop=5769 olap pair=107.0ms serial=190.0ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.2ms wait=0.1/46.4ms pred gate=device Token # 255: 3.767ms; value: next_token_ids=tensor([553], device='cuda:0') mtp accept=0 prop=5769 top1=553 accp=0.160 next=pair draft=859 prop=389 pred gate=device Token # 256: 112.186ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=0 prop=389 top1=31 accp=0.031 next=draft=5769 prop=5769 olap pair=106.9ms serial=189.7ms gain=82.8ms ratio=0.44 s0=3.8ms s1=185.9ms wait=0.1/46.4ms pred gate=device Token # 257: 112.832ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=draft=1484 prop=1484 olap pair=107.4ms serial=190.8ms gain=83.4ms ratio=0.44 s0=3.8ms s1=187.0ms wait=0.1/46.3ms pred gate=device Token # 258: 3.750ms; value: 
next_token_ids=tensor([1484], device='cuda:0') mtp accept=1 prop=1484 top1=1484 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 259: 112.377ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=14260 prop=14260 olap pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.3ms wait=0.1/46.3ms pred gate=device Token # 260: 3.736ms; value: next_token_ids=tensor([14260], device='cuda:0') mtp accept=1 prop=14260 top1=14260 accp=1.000 next=pair draft=389 prop=31 pred gate=device Token # 261: 112.330ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=0.419 next=draft=7163 prop=7163 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=4.1ms s1=185.9ms wait=0.1/45.5ms pred gate=device Token # 262: 3.774ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.782 next=pair draft=27521 prop=27521 pred gate=device Token # 263: 111.945ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=6391 prop=6391 olap pair=106.7ms serial=189.2ms gain=82.5ms ratio=0.44 s0=4.6ms s1=184.6ms wait=0.1/44.7ms pred gate=device Token # 264: 3.771ms; value: next_token_ids=tensor([6391], device='cuda:0') mtp accept=1 prop=6391 top1=6391 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 265: 112.115ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.985 next=draft=1107 prop=1107 olap pair=106.8ms serial=189.3ms gain=82.5ms ratio=0.44 s0=4.9ms s1=184.5ms wait=0.1/44.3ms pred gate=device Token # 266: 3.762ms; value: next_token_ids=tensor([4272], device='cuda:0') mtp accept=0 prop=1107 top1=1107 accp=0.799 next=pair draft=1107 prop=1107 pred gate=device Token # 267: 112.302ms; value: next_token_ids=tensor([1107], device='cuda:0') mtp accept=1 prop=1107 top1=1107 accp=0.999 next=draft=19 prop=19 olap pair=107.0ms 
serial=189.9ms gain=82.9ms ratio=0.44 s0=4.8ms s1=185.1ms wait=0.1/44.3ms pred gate=device Token # 268: 3.758ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=5289 prop=31 pred gate=device Token # 269: 112.395ms; value: next_token_ids=tensor([5289], device='cuda:0') mtp accept=0 prop=31 top1=5289 accp=0.611 next=draft=7163 prop=7163 olap pair=107.2ms serial=189.8ms gain=82.6ms ratio=0.44 s0=4.5ms s1=185.3ms wait=0.1/44.8ms pred gate=device Token # 270: 112.563ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.1ms serial=190.2ms gain=83.1ms ratio=0.44 s0=4.4ms s1=185.8ms wait=0.1/45.0ms pred gate=device Token # 271: 3.767ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 272: 112.553ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=draft=320 prop=320 olap pair=107.3ms serial=190.7ms gain=83.4ms ratio=0.44 s0=4.4ms s1=186.3ms wait=0.1/45.0ms pred gate=device Token # 273: 3.802ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.971 next=pair draft=2636 prop=2636 pred gate=device Token # 274: 112.034ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.999 next=draft=223 prop=223 olap pair=106.8ms serial=189.6ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.2ms wait=0.1/44.9ms pred gate=device Token # 275: 3.917ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.993 next=pair draft=7163 prop=7163 pred gate=device Token # 276: 112.613ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.3ms serial=190.6ms gain=83.3ms ratio=0.44 s0=4.2ms 
s1=186.3ms wait=0.1/45.2ms pred gate=device Token # 277: 3.768ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 278: 112.373ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=draft=438 prop=438 olap pair=107.1ms serial=189.0ms gain=81.9ms ratio=0.43 s0=4.2ms s1=184.7ms wait=0.1/45.6ms pred gate=device Token # 279: 3.805ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=pair draft=223 prop=343 pred gate=device Token # 280: 112.280ms; value: next_token_ids=tensor([343], device='cuda:0') mtp accept=1 prop=343 top1=343 accp=0.607 next=draft=5769 prop=5769 olap pair=107.0ms serial=190.0ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.1ms wait=0.1/46.4ms pred gate=device Token # 281: 3.810ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=pair draft=22 prop=22 pred gate=device Token # 282: 112.023ms; value: next_token_ids=tensor([1484], device='cuda:0') mtp accept=0 prop=22 top1=1484 accp=0.182 next=draft=21590 prop=21590 olap pair=106.7ms serial=189.5ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.7ms wait=0.1/46.3ms pred gate=device Token # 283: 112.697ms; value: next_token_ids=tensor([21590], device='cuda:0') mtp accept=1 prop=21590 top1=21590 accp=0.999 next=draft=20 prop=20 olap pair=107.3ms serial=189.6ms gain=82.3ms ratio=0.43 s0=4.0ms s1=185.6ms wait=0.1/46.2ms pred gate=device Token # 284: 3.748ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=940 prop=940 pred gate=device Token # 285: 112.561ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=1.000 next=draft=223 prop=223 olap pair=107.3ms serial=189.6ms gain=82.4ms ratio=0.43 s0=4.3ms s1=185.4ms wait=0.1/45.4ms pred gate=device Token # 286: 
3.838ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 287: 112.492ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=320 prop=320 olap pair=107.2ms serial=190.3ms gain=83.0ms ratio=0.44 s0=4.2ms s1=186.1ms wait=0.1/45.3ms pred gate=device Token # 288: 3.748ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.985 next=pair draft=2636 prop=2636 pred gate=device Token # 289: 112.173ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.622 next=draft=19585 prop=19585 olap pair=106.9ms serial=189.4ms gain=82.4ms ratio=0.44 s0=4.2ms s1=185.2ms wait=0.1/45.7ms pred gate=device Token # 290: 3.809ms; value: next_token_ids=tensor([19585], device='cuda:0') mtp accept=1 prop=19585 top1=19585 accp=0.984 next=pair draft=313 prop=313 pred gate=device Token # 291: 112.735ms; value: next_token_ids=tensor([5870], device='cuda:0') mtp accept=0 prop=313 top1=5870 accp=0.055 next=draft=545 prop=545 olap pair=107.5ms serial=190.6ms gain=83.1ms ratio=0.44 s0=4.3ms s1=186.3ms wait=0.1/44.9ms pred gate=device Token # 292: 112.328ms; value: next_token_ids=tensor([545], device='cuda:0') mtp accept=1 prop=545 top1=313 accp=0.502 next=draft=313 prop=313 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.3ms wait=0.1/45.1ms pred gate=device Token # 293: 3.905ms; value: next_token_ids=tensor([313], device='cuda:0') mtp accept=1 prop=313 top1=313 accp=0.998 next=pair draft=64 prop=64 pred gate=device Token # 294: 112.580ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=draft=20 prop=20 olap pair=107.3ms serial=190.4ms gain=83.1ms ratio=0.44 s0=4.4ms s1=185.9ms wait=0.1/44.9ms pred gate=device Token # 295: 3.754ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 
top1=20 accp=1.000 next=pair draft=940 prop=13 pred gate=device Token # 296: 112.406ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=940 accp=0.953 next=draft=19 prop=19 olap pair=107.1ms serial=190.1ms gain=83.0ms ratio=0.44 s0=4.4ms s1=185.7ms wait=0.1/45.1ms pred gate=device Token # 297: 3.731ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=223 prop=303 pred gate=device Token # 298: 112.228ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=223 accp=0.786 next=draft=4569 prop=4569 olap pair=106.9ms serial=189.8ms gain=82.8ms ratio=0.44 s0=3.9ms s1=185.9ms wait=0.1/46.1ms pred gate=device Token # 299: 3.742ms; value: next_token_ids=tensor([4569], device='cuda:0') mtp accept=1 prop=4569 top1=4569 accp=1.000 next=pair draft=313 prop=313 pred gate=device Token # 300: 112.615ms; value: next_token_ids=tensor([313], device='cuda:0') mtp accept=1 prop=313 top1=313 accp=1.000 next=draft=31 prop=31 olap pair=107.3ms serial=190.4ms gain=83.1ms ratio=0.44 s0=3.9ms s1=186.5ms wait=0.1/46.3ms pred gate=device Token # 301: 3.854ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=5769 prop=5769 pred gate=device Token # 302: 112.588ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=draft=1484 prop=1484 olap pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=4.0ms s1=186.1ms wait=0.1/46.0ms pred gate=device Token # 303: 3.815ms; value: next_token_ids=tensor([1484], device='cuda:0') mtp accept=1 prop=1484 top1=1484 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 304: 112.682ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.987 next=draft=556 prop=556 olap pair=107.4ms serial=189.9ms gain=82.5ms ratio=0.43 s0=4.3ms s1=185.7ms wait=0.1/45.4ms pred gate=device Token # 305: 
3.749ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=0 prop=556 top1=2636 accp=0.164 next=pair draft=313 prop=313 pred gate=device Token # 306: 112.208ms; value: next_token_ids=tensor([47725], device='cuda:0') mtp accept=0 prop=313 top1=47725 accp=0.007 next=draft=768 prop=768 olap pair=107.0ms serial=189.9ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.0ms wait=0.1/46.3ms pred gate=device Token # 307: 112.025ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.983 next=draft=7163 prop=7163 olap pair=106.8ms serial=189.4ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.6ms wait=0.1/46.4ms pred gate=device Token # 308: 3.805ms; value: next_token_ids=tensor([80], device='cuda:0') mtp accept=0 prop=7163 top1=80 accp=0.393 next=pair draft=64 prop=64 pred gate=device Token # 309: 112.394ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=draft=20 prop=20 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.2ms wait=0.1/46.3ms pred gate=device Token # 310: 3.763ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=13 prop=13 pred gate=device Token # 311: 112.346ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=19 prop=19 olap pair=107.0ms serial=189.9ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.2ms wait=0.1/46.4ms pred gate=device Token # 312: 3.717ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 313: 112.110ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.996 next=draft=389 prop=389 olap pair=106.9ms serial=189.9ms gain=82.9ms ratio=0.44 s0=3.7ms s1=186.1ms wait=0.1/46.4ms pred gate=device Token # 314: 3.805ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 
accp=0.569 next=pair draft=1703 prop=1703 pred gate=device Token # 315: 112.197ms; value: next_token_ids=tensor([1703], device='cuda:0') mtp accept=1 prop=1703 top1=1703 accp=0.989 next=draft=996 prop=996 olap pair=106.9ms serial=189.8ms gain=82.9ms ratio=0.44 s0=3.7ms s1=186.1ms wait=0.1/46.4ms pred gate=device Token # 316: 3.782ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=pair draft=3467 prop=3467 pred gate=device Token # 317: 112.098ms; value: next_token_ids=tensor([3467], device='cuda:0') mtp accept=1 prop=3467 top1=3467 accp=1.000 next=draft=1148 prop=1148 olap pair=106.8ms serial=189.1ms gain=82.3ms ratio=0.44 s0=6.7ms s1=182.4ms wait=0.2/42.7ms pred gate=device Token # 318: 3.762ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=0.999 next=pair draft=3660 prop=3660 pred gate=device Token # 319: 112.457ms; value: next_token_ids=tensor([3660], device='cuda:0') mtp accept=1 prop=3660 top1=3660 accp=0.986 next=draft=313 prop=313 olap pair=107.2ms serial=190.3ms gain=83.1ms ratio=0.44 s0=4.0ms s1=186.3ms wait=0.1/46.0ms pred gate=device Token # 320: 3.835ms; value: next_token_ids=tensor([313], device='cuda:0') mtp accept=1 prop=313 top1=313 accp=0.935 next=pair draft=31 prop=31 pred gate=device Token # 321: 112.354ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=5769 prop=5769 olap pair=107.1ms serial=190.1ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.3ms wait=0.1/46.4ms pred gate=device Token # 322: 3.745ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=pair draft=1484 prop=1484 pred gate=device Token # 323: 112.098ms; value: next_token_ids=tensor([1484], device='cuda:0') mtp accept=1 prop=1484 top1=1484 accp=1.000 next=draft=303 prop=303 olap pair=106.8ms serial=189.6ms gain=82.8ms ratio=0.44 s0=3.8ms s1=185.8ms wait=0.1/46.3ms pred gate=device Token 
# 324: 3.811ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.958 next=pair draft=39932 prop=39932 pred gate=device Token # 325: 112.247ms; value: next_token_ids=tensor([39932], device='cuda:0') mtp accept=1 prop=39932 top1=39932 accp=0.783 next=draft=5640 prop=5640 olap pair=107.0ms serial=190.0ms gain=83.0ms ratio=0.44 s0=4.3ms s1=185.7ms wait=0.1/45.0ms pred gate=device Token # 326: 3.763ms; value: next_token_ids=tensor([5640], device='cuda:0') mtp accept=1 prop=5640 top1=5640 accp=0.999 next=pair draft=223 prop=223 pred gate=device Token # 327: 112.346ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.564 next=draft=7163 prop=7163 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=4.5ms s1=185.5ms wait=0.1/44.9ms pred gate=device Token # 328: 3.800ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=0 prop=7163 top1=5769 accp=0.361 next=pair draft=1484 prop=1484 pred gate=device Token # 329: 111.996ms; value: next_token_ids=tensor([1484], device='cuda:0') mtp accept=1 prop=1484 top1=1484 accp=1.000 next=draft=64 prop=64 olap pair=106.8ms serial=189.2ms gain=82.5ms ratio=0.44 s0=4.9ms s1=184.3ms wait=0.1/44.5ms pred gate=device Token # 330: 3.788ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 331: 115.046ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=13 prop=13 olap pair=109.8ms serial=194.9ms gain=85.1ms ratio=0.44 s0=4.9ms s1=190.0ms wait=0.1/44.5ms pred gate=device Token # 332: 3.928ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 333: 112.225ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=223 prop=223 olap pair=106.9ms serial=189.5ms 
gain=82.6ms ratio=0.44 s0=4.0ms s1=185.5ms wait=0.1/46.2ms pred gate=device Token # 334: 3.769ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.968 next=pair draft=98938 prop=98938 pred gate=device Token # 335: 112.084ms; value: next_token_ids=tensor([98938], device='cuda:0') mtp accept=1 prop=98938 top1=98938 accp=0.998 next=draft=1703 prop=1703 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=4.1ms s1=185.3ms wait=0.1/45.7ms pred gate=device Token # 336: 3.865ms; value: next_token_ids=tensor([1703], device='cuda:0') mtp accept=1 prop=1703 top1=1703 accp=1.000 next=pair draft=996 prop=996 pred gate=device Token # 337: 112.195ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=draft=478 prop=478 olap pair=107.0ms serial=189.8ms gain=82.9ms ratio=0.44 s0=4.1ms s1=185.7ms wait=0.1/45.9ms pred gate=device Token # 338: 3.776ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.710 next=pair draft=4754 prop=4754 pred gate=device Token # 339: 111.981ms; value: next_token_ids=tensor([4754], device='cuda:0') mtp accept=1 prop=4754 top1=4754 accp=0.826 next=draft=768 prop=768 olap pair=106.8ms serial=189.6ms gain=82.8ms ratio=0.44 s0=4.0ms s1=185.6ms wait=0.1/46.0ms pred gate=device Token # 340: 3.689ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.988 next=pair draft=80 prop=80 pred gate=device Token # 341: 112.297ms; value: next_token_ids=tensor([80], device='cuda:0') mtp accept=1 prop=80 top1=80 accp=0.992 next=draft=64 prop=64 olap pair=107.1ms serial=190.1ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.3ms wait=0.1/46.4ms pred gate=device Token # 342: 3.755ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 343: 112.189ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 
prop=20 top1=20 accp=1.000 next=draft=13 prop=13 olap pair=107.1ms serial=189.9ms gain=82.8ms ratio=0.44 s0=3.9ms s1=186.0ms wait=0.1/46.3ms pred gate=device Token # 344: 3.768ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 345: 112.001ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=223 prop=223 olap pair=106.8ms serial=189.1ms gain=82.3ms ratio=0.44 s0=4.3ms s1=184.8ms wait=0.1/45.2ms pred gate=device Token # 346: 3.720ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=37343 prop=37343 pred gate=device Token # 347: 112.216ms; value: next_token_ids=tensor([37343], device='cuda:0') mtp accept=1 prop=37343 top1=37343 accp=0.907 next=draft=996 prop=996 olap pair=107.1ms serial=189.9ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.1ms wait=0.1/46.3ms pred gate=device Token # 348: 3.697ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=0.645 next=pair draft=15206 prop=15206 pred gate=device Token # 349: 112.328ms; value: next_token_ids=tensor([44284], device='cuda:0') mtp accept=0 prop=15206 top1=44284 accp=0.450 next=draft=1703 prop=1703 olap pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=4.0ms s1=186.0ms wait=0.1/46.0ms pred gate=device Token # 350: 112.412ms; value: next_token_ids=tensor([1703], device='cuda:0') mtp accept=1 prop=1703 top1=1703 accp=0.921 next=draft=996 prop=996 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.2ms wait=0.1/46.4ms pred gate=device Token # 351: 3.675ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 352: 112.413ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.997 next=draft=1207 prop=1207 olap pair=107.2ms 
serial=190.1ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.3ms wait=0.1/46.3ms pred gate=device Token # 353: 3.695ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=1 prop=1207 top1=1207 accp=0.759 next=pair draft=18317 prop=18317 pred gate=device Token # 354: 112.303ms; value: next_token_ids=tensor([18317], device='cuda:0') mtp accept=1 prop=18317 top1=18317 accp=0.936 next=draft=11097 prop=11097 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.2ms wait=0.1/46.6ms pred gate=device Token # 355: 3.736ms; value: next_token_ids=tensor([11097], device='cuda:0') mtp accept=1 prop=11097 top1=11097 accp=0.989 next=pair draft=320 prop=320 pred gate=device Token # 356: 113.165ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.941 next=draft=3660 prop=3660 olap pair=107.2ms serial=188.8ms gain=81.6ms ratio=0.43 s0=8.5ms s1=180.2ms wait=0.2/41.0ms pred gate=device Token # 357: 4.560ms; value: next_token_ids=tensor([3660], device='cuda:0') mtp accept=1 prop=3660 top1=8040 accp=0.327 next=pair draft=313 prop=10542 pred gate=device Token # 358: 113.174ms; value: next_token_ids=tensor([313], device='cuda:0') mtp accept=0 prop=10542 top1=313 accp=0.420 next=draft=31 prop=31 olap pair=107.0ms serial=188.4ms gain=81.4ms ratio=0.43 s0=8.6ms s1=179.8ms wait=0.2/40.9ms pred gate=device Token # 359: 113.087ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=19 prop=19 olap pair=107.1ms serial=188.6ms gain=81.5ms ratio=0.43 s0=8.7ms s1=179.9ms wait=0.2/40.7ms pred gate=device Token # 360: 3.867ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=0.999 next=pair draft=14 prop=14 pred gate=device Token # 361: 112.313ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=0.959 next=draft=223 prop=223 olap pair=107.1ms serial=190.2ms gain=83.0ms ratio=0.44 s0=4.4ms s1=185.8ms 
wait=0.1/45.2ms pred gate=device Token # 362: 3.706ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.807 next=pair draft=20 prop=20 pred gate=device Token # 363: 112.153ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=0 prop=20 top1=19 accp=0.244 next=draft=64 prop=64 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=4.3ms s1=185.5ms wait=0.1/45.5ms pred gate=device Token # 364: 112.180ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=0.999 next=draft=20 prop=20 olap pair=106.9ms serial=189.6ms gain=82.6ms ratio=0.44 s0=4.7ms s1=184.9ms wait=0.1/44.6ms pred gate=device Token # 365: 3.736ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.999 next=pair draft=13 prop=13 pred gate=device Token # 366: 111.907ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=19 prop=19 olap pair=106.7ms serial=189.1ms gain=82.4ms ratio=0.44 s0=4.9ms s1=184.2ms wait=0.1/44.5ms pred gate=device Token # 367: 3.725ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 368: 112.140ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=20 prop=20 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=4.8ms s1=184.8ms wait=0.1/44.5ms pred gate=device Token # 369: 3.732ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 370: 112.255ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.892 next=draft=389 prop=389 olap pair=107.0ms serial=189.9ms gain=82.9ms ratio=0.44 s0=4.3ms s1=185.6ms wait=0.1/45.1ms pred gate=device Token # 371: 3.803ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 
prop=389 top1=389 accp=0.993 next=pair draft=1703 prop=1703 pred gate=device Token # 372: 112.441ms; value: next_token_ids=tensor([1703], device='cuda:0') mtp accept=1 prop=1703 top1=1703 accp=1.000 next=draft=996 prop=996 olap pair=107.2ms serial=190.2ms gain=83.0ms ratio=0.44 s0=3.9ms s1=186.3ms wait=0.1/46.1ms pred gate=device Token # 373: 3.717ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=pair draft=1847 prop=1847 pred gate=device Token # 374: 112.007ms; value: next_token_ids=tensor([1847], device='cuda:0') mtp accept=1 prop=1847 top1=1847 accp=1.000 next=draft=80 prop=80 olap pair=106.8ms serial=189.6ms gain=82.8ms ratio=0.44 s0=3.8ms s1=185.7ms wait=0.1/46.3ms pred gate=device Token # 375: 3.815ms; value: next_token_ids=tensor([80], device='cuda:0') mtp accept=1 prop=80 top1=80 accp=0.997 next=pair draft=31 prop=31 pred gate=device Token # 376: 111.966ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=20 prop=20 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=4.3ms s1=185.0ms wait=0.1/45.3ms pred gate=device Token # 377: 3.720ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=14 prop=14 pred gate=device Token # 378: 112.229ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=0.763 next=draft=223 prop=223 olap pair=107.0ms serial=190.1ms gain=83.0ms ratio=0.44 s0=4.4ms s1=185.7ms wait=0.1/44.9ms pred gate=device Token # 379: 3.717ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.996 next=pair draft=22 prop=22 pred gate=device Token # 380: 112.232ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=0.980 next=draft=13 prop=13 olap pair=107.1ms serial=190.1ms gain=83.0ms ratio=0.44 s0=4.4ms s1=185.7ms wait=0.1/45.0ms pred gate=device Token # 381: 3.731ms; value: 
next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 382: 112.051ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=31 prop=31 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=4.2ms s1=185.4ms wait=0.1/45.4ms pred gate=device Token # 383: 3.791ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=23 prop=23 pred gate=device Token # 384: 112.070ms; value: next_token_ids=tensor([23], device='cuda:0') mtp accept=1 prop=23 top1=23 accp=1.000 next=draft=223 prop=223 olap pair=106.8ms serial=189.5ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.6ms wait=0.1/46.2ms pred gate=device Token # 385: 3.775ms; value: next_token_ids=tensor([1847], device='cuda:0') mtp accept=0 prop=223 top1=1847 accp=0.307 next=pair draft=80 prop=80 pred gate=device Token # 386: 112.794ms; value: next_token_ids=tensor([80], device='cuda:0') mtp accept=1 prop=80 top1=80 accp=1.000 next=draft=31 prop=31 olap pair=107.5ms serial=190.6ms gain=83.2ms ratio=0.44 s0=3.9ms s1=186.7ms wait=0.1/46.1ms pred gate=device Token # 387: 3.759ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=22 prop=22 pred gate=device Token # 388: 112.487ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=draft=14 prop=14 olap pair=107.3ms serial=190.7ms gain=83.3ms ratio=0.44 s0=3.8ms s1=186.9ms wait=0.1/46.3ms pred gate=device Token # 389: 3.749ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=0.991 next=pair draft=223 prop=223 pred gate=device Token # 390: 112.334ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=926 prop=926 olap pair=107.1ms serial=190.2ms gain=83.1ms ratio=0.44 s0=3.7ms s1=186.5ms wait=0.1/46.4ms pred 
gate=device Token # 391: 3.725ms; value: next_token_ids=tensor([926], device='cuda:0') mtp accept=1 prop=926 top1=926 accp=1.000 next=pair draft=13 prop=13 pred gate=device Token # 392: 112.185ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=19 prop=19 olap pair=107.0ms serial=190.0ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.2ms wait=0.1/46.4ms pred gate=device Token # 393: 3.731ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 394: 112.486ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=1002 prop=1002 olap pair=107.3ms serial=190.3ms gain=83.1ms ratio=0.44 s0=4.2ms s1=186.2ms wait=0.1/45.8ms pred gate=device Token # 395: 3.693ms; value: next_token_ids=tensor([1002], device='cuda:0') mtp accept=1 prop=1002 top1=1002 accp=1.000 next=pair draft=1847 prop=1847 pred gate=device Token # 396: 112.119ms; value: next_token_ids=tensor([1847], device='cuda:0') mtp accept=1 prop=1847 top1=1847 accp=0.887 next=draft=80 prop=80 olap pair=106.9ms serial=189.6ms gain=82.6ms ratio=0.44 s0=4.9ms s1=184.7ms wait=0.1/44.4ms pred gate=device Token # 397: 3.789ms; value: next_token_ids=tensor([80], device='cuda:0') mtp accept=1 prop=80 top1=80 accp=0.938 next=pair draft=31 prop=31 pred gate=device Token # 398: 112.211ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=24 prop=24 olap pair=107.0ms serial=189.8ms gain=82.7ms ratio=0.44 s0=4.8ms s1=184.9ms wait=0.1/44.6ms pred gate=device Token # 399: 3.699ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=pair draft=14 prop=14 pred gate=device Token # 400: 112.202ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=0.999 next=draft=223 prop=223 olap pair=107.0ms serial=189.7ms gain=82.7ms 
ratio=0.44 s0=3.9ms s1=185.8ms wait=0.1/46.2ms pred gate=device Token # 401: 3.928ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.990 next=pair draft=1872 prop=1872 pred gate=device Token # 402: 115.108ms; value: next_token_ids=tensor([1872], device='cuda:0') mtp accept=1 prop=1872 top1=1872 accp=1.000 next=draft=13 prop=13 olap pair=107.4ms serial=190.6ms gain=83.2ms ratio=0.44 s0=4.2ms s1=186.3ms wait=0.1/45.4ms pred gate=device Token # 403: 3.789ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 404: 112.651ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=31 prop=31 olap pair=107.4ms serial=190.6ms gain=83.1ms ratio=0.44 s0=4.0ms s1=186.5ms wait=0.1/46.0ms pred gate=device Token # 405: 3.767ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=1942 prop=1942 pred gate=device Token # 406: 112.349ms; value: next_token_ids=tensor([1942], device='cuda:0') mtp accept=1 prop=1942 top1=1942 accp=1.000 next=draft=1847 prop=1847 olap pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=4.0ms s1=186.1ms wait=0.1/46.2ms pred gate=device Token # 407: 3.790ms; value: next_token_ids=tensor([1847], device='cuda:0') mtp accept=1 prop=1847 top1=1847 accp=0.965 next=pair draft=10877 prop=10877 pred gate=device Token # 408: 112.236ms; value: next_token_ids=tensor([80], device='cuda:0') mtp accept=0 prop=10877 top1=80 accp=0.186 next=draft=31 prop=31 olap pair=107.0ms serial=190.0ms gain=82.9ms ratio=0.44 s0=4.0ms s1=186.0ms wait=0.1/46.0ms pred gate=device Token # 409: 112.414ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=553 prop=553 olap pair=107.1ms serial=190.1ms gain=82.9ms ratio=0.44 s0=4.1ms s1=186.0ms wait=0.1/45.8ms pred gate=device Token # 410: 3.678ms; 
value: next_token_ids=tensor([553], device='cuda:0') mtp accept=1 prop=553 top1=553 accp=1.000 next=pair draft=14 prop=14 pred gate=device Token # 411: 112.115ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=0.905 next=draft=223 prop=223 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=4.5ms s1=185.1ms wait=0.1/44.8ms pred gate=device Token # 412: 3.774ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.995 next=pair draft=1457 prop=1457 pred gate=device Token # 413: 112.618ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=draft=13 prop=13 olap pair=107.4ms serial=190.5ms gain=83.1ms ratio=0.44 s0=4.2ms s1=186.3ms wait=0.1/45.8ms pred gate=device Token # 414: 3.762ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 415: 112.242ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=31 prop=31 olap pair=107.1ms serial=189.9ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.1ms wait=0.1/46.2ms pred gate=device Token # 416: 3.857ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=4460 prop=4460 pred gate=device Token # 417: 112.137ms; value: next_token_ids=tensor([4460], device='cuda:0') mtp accept=1 prop=4460 top1=4460 accp=1.000 next=draft=1847 prop=1847 olap pair=107.0ms serial=189.9ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.0ms wait=0.1/46.4ms pred gate=device Token # 418: 3.676ms; value: next_token_ids=tensor([1847], device='cuda:0') mtp accept=1 prop=1847 top1=1847 accp=0.935 next=pair draft=10877 prop=10877 pred gate=device Token # 419: 112.115ms; value: next_token_ids=tensor([10877], device='cuda:0') mtp accept=1 prop=10877 top1=10877 accp=0.996 next=draft=320 prop=320 olap pair=106.9ms serial=189.6ms gain=82.7ms 
ratio=0.44 s0=3.9ms s1=185.8ms wait=0.1/46.2ms pred gate=device Token # 420: 3.725ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=1207 prop=1207 pred gate=device Token # 421: 111.701ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=1 prop=1207 top1=1207 accp=0.778 next=draft=87805 prop=87805 olap pair=106.5ms serial=189.1ms gain=82.5ms ratio=0.44 s0=3.8ms s1=185.3ms wait=0.1/46.4ms pred gate=device Token # 422: 3.725ms; value: next_token_ids=tensor([313], device='cuda:0') mtp accept=0 prop=87805 top1=313 accp=0.142 next=pair draft=31 prop=31 pred gate=device Token # 423: 112.055ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=0.950 next=draft=5769 prop=5769 olap pair=106.8ms serial=189.6ms gain=82.8ms ratio=0.44 s0=3.8ms s1=185.9ms wait=0.1/46.4ms pred gate=device Token # 424: 3.722ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=0.997 next=pair draft=1484 prop=1484 pred gate=device Token # 425: 112.012ms; value: next_token_ids=tensor([1484], device='cuda:0') mtp accept=1 prop=1484 top1=1484 accp=1.000 next=draft=223 prop=223 olap pair=106.8ms serial=189.4ms gain=82.7ms ratio=0.44 s0=4.1ms s1=185.3ms wait=0.1/45.7ms pred gate=device Token # 426: 3.709ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.998 next=pair draft=2431 prop=6715 pred gate=device Token # 427: 111.955ms; value: next_token_ids=tensor([2431], device='cuda:0') mtp accept=0 prop=6715 top1=2431 accp=0.738 next=draft=12701 prop=12701 olap pair=106.8ms serial=189.4ms gain=82.7ms ratio=0.44 s0=4.3ms s1=185.1ms wait=0.1/45.1ms pred gate=device Token # 428: 112.056ms; value: next_token_ids=tensor([12701], device='cuda:0') mtp accept=1 prop=12701 top1=1334 accp=0.115 next=draft=303 prop=303 olap pair=106.8ms serial=189.5ms gain=82.8ms ratio=0.44 s0=3.8ms s1=185.7ms wait=0.1/46.2ms pred 
gate=device Token # 429: 3.769ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=0 prop=303 top1=320 accp=0.159 next=pair draft=39932 prop=39932 pred gate=device Token # 430: 111.979ms; value: next_token_ids=tensor([39932], device='cuda:0') mtp accept=1 prop=39932 top1=39932 accp=0.995 next=draft=5640 prop=5640 olap pair=106.8ms serial=189.3ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.4ms wait=0.1/46.3ms pred gate=device Token # 431: 3.727ms; value: next_token_ids=tensor([10251], device='cuda:0') mtp accept=0 prop=5640 top1=10251 accp=0.217 next=pair draft=1959 prop=1959 pred gate=device Token # 432: 112.468ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=1959 top1=223 accp=0.136 next=draft=7163 prop=7163 olap pair=107.3ms serial=189.4ms gain=82.1ms ratio=0.43 s0=4.1ms s1=185.3ms wait=0.1/45.8ms pred gate=device Token # 433: 112.271ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.999 next=draft=27521 prop=27521 olap pair=107.0ms serial=189.6ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.2ms wait=0.1/45.1ms pred gate=device Token # 434: 3.735ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 435: 112.022ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=draft=223 prop=223 olap pair=106.8ms serial=188.5ms gain=81.7ms ratio=0.43 s0=5.7ms s1=182.9ms wait=0.1/44.1ms pred gate=device Token # 436: 3.723ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=98938 prop=98938 pred gate=device Token # 437: 113.437ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=0 prop=98938 top1=301 accp=0.351 next=draft=2393 prop=2393 olap pair=107.8ms serial=189.0ms gain=81.3ms ratio=0.43 s0=4.5ms s1=184.5ms wait=0.1/45.7ms pred gate=device Token # 438: 
112.415ms; value: next_token_ids=tensor([2393], device='cuda:0') mtp accept=1 prop=2393 top1=2393 accp=0.939 next=draft=946 prop=946 olap pair=107.1ms serial=189.3ms gain=82.2ms ratio=0.43 s0=5.3ms s1=184.1ms wait=0.1/44.6ms pred gate=device Token # 439: 3.778ms; value: next_token_ids=tensor([946], device='cuda:0') mtp accept=1 prop=946 top1=946 accp=1.000 next=pair draft=478 prop=478 pred gate=device Token # 440: 112.398ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.991 next=draft=4389 prop=4389 olap pair=107.2ms serial=189.6ms gain=82.4ms ratio=0.43 s0=4.8ms s1=184.8ms wait=0.1/45.1ms pred gate=device Token # 441: 3.659ms; value: next_token_ids=tensor([18467], device='cuda:0') mtp accept=0 prop=4389 top1=18467 accp=0.574 next=pair draft=16992 prop=16992 pred gate=device Token # 442: 111.903ms; value: next_token_ids=tensor([16992], device='cuda:0') mtp accept=1 prop=16992 top1=16992 accp=0.830 next=draft=1824 prop=3745 olap pair=106.7ms serial=189.4ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.4ms wait=0.1/46.1ms pred gate=device Token # 443: 3.662ms; value: next_token_ids=tensor([642], device='cuda:0') mtp accept=0 prop=3745 top1=642 accp=0.159 next=pair draft=2401 prop=71688 pred gate=device Token # 444: 112.151ms; value: next_token_ids=tensor([3745], device='cuda:0') mtp accept=0 prop=71688 top1=3745 accp=0.241 next=draft=10542 prop=10542 olap pair=106.9ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.2ms s1=185.5ms wait=0.1/45.4ms pred gate=device Token # 445: 112.354ms; value: next_token_ids=tensor([10542], device='cuda:0') mtp accept=1 prop=10542 top1=10542 accp=0.876 next=draft=1703 prop=1703 olap pair=107.1ms serial=188.7ms gain=81.6ms ratio=0.43 s0=4.1ms s1=184.6ms wait=0.1/45.8ms pred gate=device Token # 446: 3.721ms; value: next_token_ids=tensor([1703], device='cuda:0') mtp accept=1 prop=1703 top1=1703 accp=1.000 next=pair draft=996 prop=996 pred gate=device Token # 447: 112.736ms; value: 
next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=0.802 next=draft=1824 prop=1824 olap pair=107.5ms serial=189.2ms gain=81.7ms ratio=0.43 s0=4.2ms s1=185.0ms wait=0.1/45.9ms pred gate=device Token # 448: 3.702ms; value: next_token_ids=tensor([1824], device='cuda:0') mtp accept=1 prop=1824 top1=1824 accp=0.998 next=pair draft=2401 prop=2401 pred gate=device Token # 449: 112.262ms; value: next_token_ids=tensor([2401], device='cuda:0') mtp accept=1 prop=2401 top1=2401 accp=0.986 next=draft=2575 prop=2575 olap pair=106.9ms serial=189.7ms gain=82.8ms ratio=0.44 s0=3.9ms s1=185.8ms wait=0.1/46.2ms pred gate=device Token # 450: 3.685ms; value: next_token_ids=tensor([2575], device='cuda:0') mtp accept=1 prop=2575 top1=2575 accp=0.932 next=pair draft=320 prop=320 pred gate=device Token # 451: 112.707ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.933 next=draft=7346 prop=7346 olap pair=107.5ms serial=189.0ms gain=81.5ms ratio=0.43 s0=4.2ms s1=184.8ms wait=0.1/45.8ms pred gate=device Token # 452: 3.747ms; value: next_token_ids=tensor([7346], device='cuda:0') mtp accept=1 prop=7346 top1=7346 accp=0.934 next=pair draft=303 prop=303 pred gate=device Token # 453: 112.625ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.968 next=draft=14785 prop=4339 olap pair=107.5ms serial=188.6ms gain=81.1ms ratio=0.43 s0=4.2ms s1=184.4ms wait=0.1/45.8ms pred gate=device Token # 454: 3.694ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=5640 accp=0.465 next=pair draft=223 prop=8283 pred gate=device Token # 455: 113.198ms; value: next_token_ids=tensor([14260], device='cuda:0') mtp accept=0 prop=8283 top1=14260 accp=0.110 next=draft=2103 prop=2103 olap pair=108.0ms serial=189.7ms gain=81.7ms ratio=0.43 s0=4.3ms s1=185.4ms wait=0.1/45.7ms pred gate=device Token # 456: 114.060ms; value: next_token_ids=tensor([2103], device='cuda:0') mtp 
accept=1 prop=2103 top1=2103 accp=1.000 next=draft=768 prop=768 olap pair=108.0ms serial=189.8ms gain=81.8ms ratio=0.43 s0=4.4ms s1=185.3ms wait=0.1/45.8ms pred gate=device Token # 457: 4.535ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=320 accp=0.506 next=pair draft=7163 prop=7163 pred gate=device Token # 458: 113.500ms; value: next_token_ids=tensor([9884], device='cuda:0') mtp accept=0 prop=7163 top1=9884 accp=0.072 next=draft=10 prop=10 olap pair=107.4ms serial=190.0ms gain=82.7ms ratio=0.44 s0=5.4ms s1=184.7ms wait=0.1/44.6ms pred gate=device Token # 459: 113.481ms; value: next_token_ids=tensor([10], device='cuda:0') mtp accept=1 prop=10 top1=10 accp=1.000 next=draft=7163 prop=7163 olap pair=107.2ms serial=188.7ms gain=81.5ms ratio=0.43 s0=8.7ms s1=180.1ms wait=0.2/40.7ms pred gate=device Token # 460: 4.410ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 461: 112.230ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=106.9ms serial=189.7ms gain=82.8ms ratio=0.44 s0=3.8ms s1=185.9ms wait=0.1/46.3ms pred gate=device Token # 462: 3.707ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=11 prop=11 pred gate=device Token # 463: 111.861ms; value: next_token_ids=tensor([11], device='cuda:0') mtp accept=1 prop=11 top1=11 accp=1.000 next=draft=35015 prop=35015 olap pair=106.6ms serial=189.1ms gain=82.5ms ratio=0.44 s0=3.8ms s1=185.3ms wait=0.1/46.2ms pred gate=device Token # 464: 3.797ms; value: next_token_ids=tensor([35015], device='cuda:0') mtp accept=1 prop=35015 top1=35015 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 465: 112.594ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.702 next=draft=5769 
prop=5769 olap pair=107.4ms serial=189.3ms gain=82.0ms ratio=0.43 s0=4.5ms s1=184.8ms wait=0.1/45.2ms pred gate=device Token # 466: 3.781ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=pair draft=1484 prop=1484 pred gate=device Token # 467: 112.041ms; value: next_token_ids=tensor([1484], device='cuda:0') mtp accept=1 prop=1484 top1=1484 accp=0.689 next=draft=1148 prop=33 olap pair=106.8ms serial=189.6ms gain=82.8ms ratio=0.44 s0=3.9ms s1=185.7ms wait=0.1/46.4ms pred gate=device Token # 468: 3.782ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=0 prop=33 top1=16 accp=0.217 next=pair draft=18 prop=18 pred gate=device Token # 469: 112.182ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=0.614 next=draft=1148 prop=1148 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=3.9ms s1=186.0ms wait=0.1/46.3ms pred gate=device Token # 470: 3.697ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=0.963 next=pair draft=14149 prop=14149 pred gate=device Token # 471: 112.349ms; value: next_token_ids=tensor([14149], device='cuda:0') mtp accept=1 prop=14149 top1=14149 accp=0.994 next=draft=303 prop=303 olap pair=107.1ms serial=189.5ms gain=82.4ms ratio=0.43 s0=4.0ms s1=185.5ms wait=0.1/46.1ms pred gate=device Token # 472: 3.717ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.975 next=pair draft=5769 prop=5769 pred gate=device Token # 473: 112.339ms; value: next_token_ids=tensor([2524], device='cuda:0') mtp accept=0 prop=5769 top1=5769 accp=0.618 next=draft=313 prop=313 olap pair=107.1ms serial=189.7ms gain=82.5ms ratio=0.44 s0=5.2ms s1=184.4ms wait=0.1/44.7ms pred gate=device Token # 474: 112.137ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=313 top1=223 accp=0.183 next=draft=5769 prop=5769 olap pair=106.9ms serial=189.3ms gain=82.5ms ratio=0.44 
s0=5.2ms s1=184.1ms wait=0.1/44.8ms pred gate=device Token # 475: 112.664ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=0.966 next=draft=1484 prop=1484 olap pair=107.3ms serial=189.1ms gain=81.8ms ratio=0.43 s0=4.3ms s1=184.8ms wait=0.1/45.6ms pred gate=device Token # 476: 3.735ms; value: next_token_ids=tensor([1484], device='cuda:0') mtp accept=1 prop=1484 top1=1484 accp=1.000 next=pair draft=64 prop=64 pred gate=device Token # 477: 112.272ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=draft=20 prop=20 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.4ms wait=0.1/45.3ms pred gate=device Token # 478: 3.714ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=438 prop=31 pred gate=device Token # 479: 112.340ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=0 prop=31 top1=438 accp=0.736 next=draft=223 prop=223 olap pair=107.1ms serial=189.6ms gain=82.6ms ratio=0.44 s0=6.2ms s1=183.4ms wait=0.2/43.2ms pred gate=device Token # 480: 112.502ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=7163 prop=7163 olap pair=107.2ms serial=190.0ms gain=82.8ms ratio=0.44 s0=4.5ms s1=185.6ms wait=0.1/45.0ms pred gate=device Token # 481: 3.725ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 482: 112.113ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=6391 prop=6391 olap pair=106.8ms serial=189.4ms gain=82.5ms ratio=0.44 s0=4.3ms s1=185.1ms wait=0.1/45.3ms pred gate=device Token # 483: 3.711ms; value: next_token_ids=tensor([6391], device='cuda:0') mtp accept=1 prop=6391 top1=6391 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 484: 
112.890ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=2636 prop=2636 olap pair=106.8ms serial=188.8ms gain=82.0ms ratio=0.43 s0=5.7ms s1=183.1ms wait=0.2/44.1ms pred gate=device Token # 485: 4.636ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.987 next=pair draft=29764 prop=29764 pred gate=device Token # 486: 112.648ms; value: next_token_ids=tensor([29764], device='cuda:0') mtp accept=1 prop=29764 top1=29764 accp=1.000 next=draft=10 prop=10 olap pair=107.3ms serial=189.4ms gain=82.1ms ratio=0.43 s0=7.7ms s1=181.7ms wait=0.2/41.9ms pred gate=device Token # 487: 3.763ms; value: next_token_ids=tensor([10], device='cuda:0') mtp accept=1 prop=10 top1=10 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 488: 112.201ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.0ms s1=185.6ms wait=0.1/45.9ms pred gate=device Token # 489: 3.700ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 490: 112.023ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=draft=11 prop=11 olap pair=106.8ms serial=189.4ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.6ms wait=0.1/46.3ms pred gate=device Token # 491: 3.704ms; value: next_token_ids=tensor([11], device='cuda:0') mtp accept=1 prop=11 top1=11 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 492: 112.065ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.808 next=draft=29764 prop=29764 olap pair=106.9ms serial=189.4ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.5ms wait=0.1/46.0ms pred gate=device Token # 493: 3.946ms; value: next_token_ids=tensor([29764], 
device='cuda:0') mtp accept=1 prop=29764 top1=29764 accp=0.600 next=pair draft=10 prop=10 pred gate=device Token # 494: 112.418ms; value: next_token_ids=tensor([10], device='cuda:0') mtp accept=1 prop=10 top1=10 accp=1.000 next=draft=5769 prop=5769 olap pair=107.2ms serial=190.2ms gain=83.0ms ratio=0.44 s0=4.1ms s1=186.1ms wait=0.1/45.8ms pred gate=device Token # 495: 3.714ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=pair draft=1484 prop=1484 pred gate=device Token # 496: 112.070ms; value: next_token_ids=tensor([1484], device='cuda:0') mtp accept=1 prop=1484 top1=1484 accp=0.998 next=draft=64 prop=64 olap pair=106.8ms serial=189.5ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.6ms wait=0.1/46.3ms pred gate=device Token # 497: 3.753ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 498: 111.660ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=13 prop=13 olap pair=106.5ms serial=189.0ms gain=82.5ms ratio=0.44 s0=3.8ms s1=185.2ms wait=0.1/46.3ms pred gate=device Token # 499: 3.805ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 500: 111.947ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=11 prop=11 olap pair=106.8ms serial=189.5ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.5ms wait=0.1/46.1ms pred gate=device Token # 501: 3.691ms; value: next_token_ids=tensor([11], device='cuda:0') mtp accept=1 prop=11 top1=11 accp=1.000 next=pair draft=35015 prop=35015 pred gate=device Token # 502: 112.204ms; value: next_token_ids=tensor([35015], device='cuda:0') mtp accept=1 prop=35015 top1=35015 accp=0.968 next=draft=223 prop=223 olap pair=107.0ms serial=189.8ms gain=82.9ms ratio=0.44 s0=4.0ms s1=185.8ms wait=0.1/46.0ms pred 
gate=device Token # 503: 3.760ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=5769 prop=5769 pred gate=device Token # 504: 112.350ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=draft=1484 prop=1484 olap pair=107.1ms serial=189.9ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.1ms wait=0.1/46.3ms pred gate=device Token # 505: 3.720ms; value: next_token_ids=tensor([1484], device='cuda:0') mtp accept=1 prop=1484 top1=1484 accp=1.000 next=pair draft=16 prop=16 pred gate=device Token # 506: 111.842ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=0.984 next=draft=1320 prop=1320 olap pair=106.6ms serial=189.3ms gain=82.6ms ratio=0.44 s0=3.8ms s1=185.5ms wait=0.1/46.3ms pred gate=device Token # 507: 3.699ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=pair draft=30633 prop=30633 pred gate=device Token # 508: 111.905ms; value: next_token_ids=tensor([30633], device='cuda:0') mtp accept=1 prop=30633 top1=30633 accp=0.916 next=draft=26 prop=26 olap pair=106.6ms serial=189.3ms gain=82.6ms ratio=0.44 s0=3.8ms s1=185.5ms wait=0.1/46.3ms pred gate=device Token # 509: 3.720ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 510: 112.118ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.952 next=draft=2636 prop=2636 olap pair=106.9ms serial=189.8ms gain=82.8ms ratio=0.44 s0=3.7ms s1=186.0ms wait=0.1/46.4ms pred gate=device Token # 511: 3.783ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.984 next=pair draft=39932 prop=39932 pred gate=device Token # 512: 112.219ms; value: next_token_ids=tensor([39932], device='cuda:0') mtp accept=1 prop=39932 top1=39932 accp=1.000 next=draft=5640 
prop=5640 olap pair=107.0ms serial=189.8ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.1ms wait=0.1/46.3ms pred gate=device Token # 513: 3.745ms; value: next_token_ids=tensor([10251], device='cuda:0') mtp accept=0 prop=5640 top1=10251 accp=0.401 next=pair draft=621 prop=621 pred gate=device Token # 514: 112.100ms; value: next_token_ids=tensor([3599], device='cuda:0') mtp accept=0 prop=621 top1=3599 accp=0.183 next=draft=20973 prop=20973 olap pair=106.9ms serial=189.8ms gain=82.9ms ratio=0.44 s0=3.7ms s1=186.0ms wait=0.1/46.3ms pred gate=device Token # 515: 112.162ms; value: next_token_ids=tensor([20973], device='cuda:0') mtp accept=1 prop=20973 top1=20973 accp=0.762 next=draft=15120 prop=15120 olap pair=106.9ms serial=189.6ms gain=82.8ms ratio=0.44 s0=3.8ms s1=185.8ms wait=0.1/46.3ms pred gate=device Token # 516: 3.735ms; value: next_token_ids=tensor([15120], device='cuda:0') mtp accept=1 prop=15120 top1=15120 accp=0.999 next=pair draft=223 prop=223 pred gate=device Token # 517: 112.884ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.984 next=draft=5769 prop=5769 olap pair=106.8ms serial=188.3ms gain=81.4ms ratio=0.43 s0=8.0ms s1=180.3ms wait=0.2/41.5ms pred gate=device Token # 518: 4.572ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=pair draft=1484 prop=1484 pred gate=device Token # 519: 111.917ms; value: next_token_ids=tensor([3286], device='cuda:0') mtp accept=0 prop=1484 top1=3286 accp=0.057 next=draft=223 prop=223 olap pair=106.7ms serial=189.3ms gain=82.6ms ratio=0.44 s0=3.8ms s1=185.5ms wait=0.1/46.3ms pred gate=device Token # 520: 111.907ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=301 prop=301 olap pair=106.6ms serial=189.1ms gain=82.5ms ratio=0.44 s0=3.8ms s1=185.4ms wait=0.1/46.4ms pred gate=device Token # 521: 3.741ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 
prop=301 top1=301 accp=1.000 next=pair draft=1703 prop=1703 pred gate=device Token # 522: 112.215ms; value: next_token_ids=tensor([1703], device='cuda:0') mtp accept=1 prop=1703 top1=1703 accp=1.000 next=draft=996 prop=996 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=3.7ms s1=186.1ms wait=0.1/46.5ms pred gate=device Token # 523: 3.726ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=0.974 next=pair draft=320 prop=320 pred gate=device Token # 524: 112.047ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.959 next=draft=1207 prop=1207 olap pair=106.9ms serial=189.7ms gain=82.9ms ratio=0.44 s0=3.7ms s1=186.0ms wait=0.1/46.3ms pred gate=device Token # 525: 3.708ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=1 prop=1207 top1=1207 accp=0.924 next=pair draft=223 prop=223 pred gate=device Token # 526: 111.861ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.613 next=draft=5769 prop=5769 olap pair=106.7ms serial=189.4ms gain=82.6ms ratio=0.44 s0=3.8ms s1=185.6ms wait=0.1/46.3ms pred gate=device Token # 527: 3.701ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=pair draft=3286 prop=3286 pred gate=device Token # 528: 111.995ms; value: next_token_ids=tensor([3286], device='cuda:0') mtp accept=1 prop=3286 top1=3286 accp=1.000 next=draft=223 prop=223 olap pair=106.7ms serial=189.3ms gain=82.6ms ratio=0.44 s0=3.8ms s1=185.5ms wait=0.1/46.2ms pred gate=device Token # 529: 3.768ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=125236 prop=125236 pred gate=device Token # 530: 111.996ms; value: next_token_ids=tensor([125236], device='cuda:0') mtp accept=1 prop=125236 top1=125236 accp=0.710 next=draft=223 prop=223 olap pair=106.8ms serial=189.6ms gain=82.8ms ratio=0.44 s0=3.8ms s1=185.9ms 
wait=0.1/46.4ms pred gate=device Token # 531: 3.754ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.854 next=pair draft=1457 prop=1457 pred gate=device Token # 532: 111.965ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=0.963 next=draft=504 prop=504 olap pair=106.8ms serial=189.6ms gain=82.8ms ratio=0.44 s0=3.8ms s1=185.8ms wait=0.1/46.3ms pred gate=device Token # 533: 3.776ms; value: next_token_ids=tensor([504], device='cuda:0') mtp accept=1 prop=504 top1=504 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 534: 112.583ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.987 next=draft=2636 prop=2636 olap pair=107.4ms serial=190.8ms gain=83.4ms ratio=0.44 s0=3.8ms s1=187.0ms wait=0.1/46.4ms pred gate=device Token # 535: 3.685ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.980 next=pair draft=39932 prop=39932 pred gate=device Token # 536: 111.988ms; value: next_token_ids=tensor([39932], device='cuda:0') mtp accept=1 prop=39932 top1=39932 accp=0.860 next=draft=10251 prop=10251 olap pair=106.8ms serial=189.6ms gain=82.8ms ratio=0.44 s0=4.0ms s1=185.6ms wait=0.1/46.1ms pred gate=device Token # 537: 3.733ms; value: next_token_ids=tensor([10251], device='cuda:0') mtp accept=1 prop=10251 top1=10251 accp=0.870 next=pair draft=3723 prop=21676 pred gate=device Token # 538: 111.953ms; value: next_token_ids=tensor([21676], device='cuda:0') mtp accept=1 prop=21676 top1=6365 accp=0.664 next=draft=3021 prop=223 olap pair=106.8ms serial=189.3ms gain=82.5ms ratio=0.44 s0=4.3ms s1=185.0ms wait=0.1/45.6ms pred gate=device Token # 539: 3.767ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.702 next=pair draft=1457 prop=1457 pred gate=device Token # 540: 112.043ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 
top1=5769 accp=0.542 next=draft=504 prop=504 olap pair=106.8ms serial=189.3ms gain=82.5ms ratio=0.44 s0=4.3ms s1=185.0ms wait=0.1/45.3ms pred gate=device Token # 541: 3.785ms; value: next_token_ids=tensor([504], device='cuda:0') mtp accept=1 prop=504 top1=504 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 542: 112.554ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=53196 prop=53196 olap pair=107.4ms serial=189.8ms gain=82.4ms ratio=0.43 s0=4.1ms s1=185.7ms wait=0.1/45.8ms pred gate=device Token # 543: 3.750ms; value: next_token_ids=tensor([53196], device='cuda:0') mtp accept=1 prop=53196 top1=53196 accp=0.974 next=pair draft=1703 prop=1703 pred gate=device Token # 544: 112.871ms; value: next_token_ids=tensor([1703], device='cuda:0') mtp accept=1 prop=1703 top1=1703 accp=1.000 next=draft=996 prop=996 olap pair=107.5ms serial=189.4ms gain=81.9ms ratio=0.43 s0=4.7ms s1=184.7ms wait=0.1/45.4ms pred gate=device Token # 545: 3.699ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 546: 112.168ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.997 next=draft=556 prop=556 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.6ms s1=184.9ms wait=0.1/45.0ms pred gate=device Token # 547: 3.783ms; value: next_token_ids=tensor([556], device='cuda:0') mtp accept=1 prop=556 top1=556 accp=0.892 next=pair draft=48079 prop=48079 pred gate=device Token # 548: 112.194ms; value: next_token_ids=tensor([35007], device='cuda:0') mtp accept=0 prop=48079 top1=3723 accp=0.096 next=draft=303 prop=303 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.1ms wait=0.1/45.2ms pred gate=device Token # 549: 112.159ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.970 next=draft=1207 prop=1207 olap 
pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.2ms wait=0.1/45.1ms pred gate=device Token # 550: 3.711ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=1 prop=1207 top1=1207 accp=1.000 next=pair draft=49365 prop=40101 pred gate=device Token # 551: 111.884ms; value: next_token_ids=tensor([40101], device='cuda:0') mtp accept=1 prop=40101 top1=40101 accp=0.783 next=draft=1460 prop=4339 olap pair=106.7ms serial=189.4ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.0ms wait=0.1/45.1ms pred gate=device Token # 552: 3.748ms; value: next_token_ids=tensor([1460], device='cuda:0') mtp accept=0 prop=4339 top1=1460 accp=0.550 next=pair draft=2431 prop=2431 pred gate=device Token # 553: 112.164ms; value: next_token_ids=tensor([1168], device='cuda:0') mtp accept=0 prop=2431 top1=1168 accp=0.042 next=draft=91338 prop=91338 olap pair=106.9ms serial=189.3ms gain=82.4ms ratio=0.44 s0=4.1ms s1=185.2ms wait=0.1/46.0ms pred gate=device Token # 554: 112.209ms; value: next_token_ids=tensor([91338], device='cuda:0') mtp accept=1 prop=91338 top1=91338 accp=0.999 next=draft=320 prop=320 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=4.0ms s1=185.6ms wait=0.1/46.0ms pred gate=device Token # 555: 3.692ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=18467 prop=13103 pred gate=device Token # 556: 112.520ms; value: next_token_ids=tensor([13103], device='cuda:0') mtp accept=1 prop=13103 top1=13103 accp=0.569 next=draft=18467 prop=18467 olap pair=107.3ms serial=189.0ms gain=81.7ms ratio=0.43 s0=4.7ms s1=184.3ms wait=0.1/45.2ms pred gate=device Token # 557: 3.716ms; value: next_token_ids=tensor([18467], device='cuda:0') mtp accept=1 prop=18467 top1=18467 accp=0.785 next=pair draft=4953 prop=9013 pred gate=device Token # 558: 111.992ms; value: next_token_ids=tensor([9013], device='cuda:0') mtp accept=1 prop=9013 top1=2541 accp=0.654 next=draft=4992 prop=4992 olap pair=106.8ms serial=189.5ms 
gain=82.7ms ratio=0.44 s0=4.4ms s1=185.1ms wait=0.1/45.1ms pred gate=device Token # 559: 3.724ms; value: next_token_ids=tensor([3745], device='cuda:0') mtp accept=0 prop=4992 top1=3745 accp=0.367 next=pair draft=70889 prop=70889 pred gate=device Token # 560: 112.173ms; value: next_token_ids=tensor([2311], device='cuda:0') mtp accept=0 prop=70889 top1=2311 accp=0.290 next=draft=20251 prop=20251 olap pair=107.0ms serial=189.7ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.3ms wait=0.1/45.3ms pred gate=device Token # 561: 112.281ms; value: next_token_ids=tensor([4674], device='cuda:0') mtp accept=0 prop=20251 top1=22858 accp=0.070 next=draft=478 prop=478 olap pair=107.0ms serial=189.6ms gain=82.6ms ratio=0.44 s0=4.5ms s1=185.1ms wait=0.1/45.0ms pred gate=device Token # 562: 112.241ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.912 next=draft=7346 prop=1326 olap pair=107.0ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.9ms s1=184.6ms wait=0.1/44.6ms pred gate=device Token # 563: 3.733ms; value: next_token_ids=tensor([4754], device='cuda:0') mtp accept=0 prop=1326 top1=4754 accp=0.307 next=pair draft=768 prop=768 pred gate=device Token # 564: 111.902ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.991 next=draft=7163 prop=7163 olap pair=106.7ms serial=189.4ms gain=82.7ms ratio=0.44 s0=4.1ms s1=185.3ms wait=0.1/45.9ms pred gate=device Token # 565: 3.735ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.820 next=pair draft=27521 prop=27521 pred gate=device Token # 566: 111.936ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=106.7ms serial=189.2ms gain=82.5ms ratio=0.44 s0=4.0ms s1=185.2ms wait=0.1/46.1ms pred gate=device Token # 567: 3.680ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 
next=pair draft=223 prop=223 pred gate=device Token # 568: 111.864ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.814 next=draft=389 prop=58000 olap pair=106.7ms serial=189.3ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.5ms wait=0.1/46.3ms pred gate=device Token # 569: 3.762ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=0 prop=58000 top1=389 accp=0.327 next=pair draft=100791 prop=100791 pred gate=device Token # 570: 112.157ms; value: next_token_ids=tensor([100791], device='cuda:0') mtp accept=1 prop=100791 top1=100791 accp=0.999 next=draft=3467 prop=303 olap pair=106.9ms serial=189.8ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.1ms wait=0.1/46.3ms pred gate=device Token # 571: 3.755ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.377 next=pair draft=2099 prop=2099 pred gate=device Token # 572: 112.084ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=0 prop=2099 top1=2636 accp=0.306 next=draft=2099 prop=2099 olap pair=106.9ms serial=189.7ms gain=82.8ms ratio=0.44 s0=3.9ms s1=185.8ms wait=0.1/46.3ms pred gate=device Token # 573: 112.058ms; value: next_token_ids=tensor([113008], device='cuda:0') mtp accept=0 prop=2099 top1=113008 accp=0.212 next=draft=223 prop=20 olap pair=106.8ms serial=189.5ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.7ms wait=0.1/46.4ms pred gate=device Token # 574: 112.344ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.330 next=draft=124637 prop=124637 olap pair=106.9ms serial=189.8ms gain=82.8ms ratio=0.44 s0=3.8ms s1=186.0ms wait=0.1/46.4ms pred gate=device Token # 575: 3.693ms; value: next_token_ids=tensor([124637], device='cuda:0') mtp accept=1 prop=124637 top1=124637 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 576: 112.082ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.999 next=draft=8283 prop=8283 olap 
pair=106.9ms serial=189.7ms gain=82.8ms ratio=0.44 s0=3.8ms s1=185.9ms wait=0.1/46.4ms pred gate=device Token # 577: 3.750ms; value: next_token_ids=tensor([8283], device='cuda:0') mtp accept=1 prop=8283 top1=8283 accp=0.824 next=pair draft=548 prop=548 pred gate=device Token # 578: 112.281ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=1 prop=548 top1=49076 accp=0.874 next=draft=768 prop=768 olap pair=107.1ms serial=189.6ms gain=82.5ms ratio=0.44 s0=5.7ms s1=183.9ms wait=0.1/44.2ms pred gate=device Token # 579: 3.780ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.930 next=pair draft=19 prop=19 pred gate=device Token # 580: 112.324ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=13 prop=13 olap pair=107.0ms serial=190.0ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.1ms wait=0.1/46.1ms pred gate=device Token # 581: 3.772ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=18 prop=18 pred gate=device Token # 582: 112.374ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=13 prop=13 olap pair=107.2ms serial=190.2ms gain=83.0ms ratio=0.44 s0=4.4ms s1=185.8ms wait=0.1/45.4ms pred gate=device Token # 583: 3.772ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=22 prop=22 pred gate=device Token # 584: 112.133ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=draft=13 prop=13 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.3ms wait=0.1/45.0ms pred gate=device Token # 585: 3.824ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=26 prop=26 pred gate=device Token # 586: 112.169ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 
top1=26 accp=1.000 next=draft=13 prop=13 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.4ms wait=0.1/45.2ms pred gate=device Token # 587: 3.730ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=23 prop=23 pred gate=device Token # 588: 112.212ms; value: next_token_ids=tensor([23], device='cuda:0') mtp accept=1 prop=23 top1=23 accp=1.000 next=draft=13 prop=13 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=4.1ms s1=185.7ms wait=0.1/45.8ms pred gate=device Token # 589: 3.796ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=25 prop=25 pred gate=device Token # 590: 111.994ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=draft=13 prop=13 olap pair=106.8ms serial=189.3ms gain=82.5ms ratio=0.44 s0=4.3ms s1=185.0ms wait=0.1/45.2ms pred gate=device Token # 591: 3.806ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=24 prop=24 pred gate=device Token # 592: 112.306ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=draft=13 prop=13 olap pair=107.1ms serial=190.0ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.2ms wait=0.1/46.3ms pred gate=device Token # 593: 3.784ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=18 prop=18 pred gate=device Token # 594: 112.206ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=13 prop=13 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=3.9ms s1=185.8ms wait=0.1/46.1ms pred gate=device Token # 595: 3.738ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 596: 112.115ms; value: next_token_ids=tensor([19], 
device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=438 prop=438 olap pair=106.9ms serial=189.7ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.8ms wait=0.1/46.4ms pred gate=device Token # 597: 3.781ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.812 next=pair draft=223 prop=223 pred gate=device Token # 598: 112.389ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=2111 prop=2111 olap pair=107.1ms serial=189.8ms gain=82.6ms ratio=0.44 s0=4.1ms s1=185.7ms wait=0.1/46.0ms pred gate=device Token # 599: 3.749ms; value: next_token_ids=tensor([2111], device='cuda:0') mtp accept=1 prop=2111 top1=2111 accp=1.000 next=pair draft=1148 prop=1148 pred gate=device Token # 600: 112.304ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=1148 top1=303 accp=0.063 next=draft=113008 prop=113008 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=4.1ms s1=185.8ms wait=0.1/45.8ms pred gate=device Token # 601: 112.355ms; value: next_token_ids=tensor([113008], device='cuda:0') mtp accept=1 prop=113008 top1=113008 accp=0.980 next=draft=21 prop=21 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=4.2ms s1=185.6ms wait=0.1/45.5ms pred gate=device Token # 602: 3.770ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=pair draft=124637 prop=124637 pred gate=device Token # 603: 112.021ms; value: next_token_ids=tensor([124637], device='cuda:0') mtp accept=1 prop=124637 top1=124637 accp=1.000 next=draft=303 prop=303 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.0ms wait=0.1/45.0ms pred gate=device Token # 604: 3.762ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.991 next=pair draft=2636 prop=2636 pred gate=device Token # 605: 111.737ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 
accp=1.000 next=draft=113008 prop=113008 olap pair=106.6ms serial=188.7ms gain=82.2ms ratio=0.44 s0=4.9ms s1=183.8ms wait=0.1/44.8ms pred gate=device Token # 606: 3.759ms; value: next_token_ids=tensor([113008], device='cuda:0') mtp accept=1 prop=113008 top1=113008 accp=0.924 next=pair draft=21 prop=21 pred gate=device Token # 607: 111.953ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=draft=124637 prop=124637 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=4.5ms s1=184.9ms wait=0.1/45.3ms pred gate=device Token # 608: 3.697ms; value: next_token_ids=tensor([124637], device='cuda:0') mtp accept=1 prop=124637 top1=124637 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 609: 111.687ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=5640 prop=58000 olap pair=106.5ms serial=188.9ms gain=82.4ms ratio=0.44 s0=3.8ms s1=185.1ms wait=0.1/46.3ms pred gate=device Token # 610: 3.721ms; value: next_token_ids=tensor([58000], device='cuda:0') mtp accept=1 prop=58000 top1=4755 accp=0.318 next=pair draft=23 prop=23 pred gate=device Token # 611: 111.992ms; value: next_token_ids=tensor([23], device='cuda:0') mtp accept=1 prop=23 top1=23 accp=0.992 next=draft=768 prop=1148 olap pair=106.8ms serial=189.3ms gain=82.6ms ratio=0.44 s0=4.6ms s1=184.7ms wait=0.1/45.1ms pred gate=device Token # 612: 3.706ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=0.581 next=pair draft=6785 prop=6785 pred gate=device Token # 613: 112.539ms; value: next_token_ids=tensor([580], device='cuda:0') mtp accept=0 prop=6785 top1=4755 accp=0.393 next=draft=18 prop=18 olap pair=107.3ms serial=189.7ms gain=82.4ms ratio=0.43 s0=4.5ms s1=185.2ms wait=0.1/44.9ms pred gate=device Token # 614: 112.082ms; value: next_token_ids=tensor([2616], device='cuda:0') mtp accept=0 prop=18 top1=2616 accp=0.003 next=draft=48912 prop=48912 olap 
pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.0ms wait=0.1/44.9ms pred gate=device Token # 615: 112.606ms; value: next_token_ids=tensor([48912], device='cuda:0') mtp accept=1 prop=48912 top1=48912 accp=1.000 next=draft=303 prop=303 olap pair=107.0ms serial=189.3ms gain=82.3ms ratio=0.43 s0=6.3ms s1=183.1ms wait=0.2/43.4ms pred gate=device Token # 616: 3.709ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=113008 prop=113008 pred gate=device Token # 617: 112.321ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=0 prop=113008 top1=2636 accp=0.161 next=draft=113008 prop=113008 olap pair=107.1ms serial=190.1ms gain=83.0ms ratio=0.44 s0=4.4ms s1=185.8ms wait=0.1/45.3ms pred gate=device Token # 618: 112.272ms; value: next_token_ids=tensor([113008], device='cuda:0') mtp accept=1 prop=113008 top1=113008 accp=0.974 next=draft=23 prop=23 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.5ms wait=0.1/45.3ms pred gate=device Token # 619: 3.736ms; value: next_token_ids=tensor([23], device='cuda:0') mtp accept=1 prop=23 top1=23 accp=1.000 next=pair draft=124637 prop=124637 pred gate=device Token # 620: 112.036ms; value: next_token_ids=tensor([124637], device='cuda:0') mtp accept=1 prop=124637 top1=124637 accp=1.000 next=draft=320 prop=320 olap pair=106.8ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.3ms s1=185.1ms wait=0.1/45.2ms pred gate=device Token # 621: 3.712ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.912 next=pair draft=58000 prop=58000 pred gate=device Token # 622: 112.254ms; value: next_token_ids=tensor([58000], device='cuda:0') mtp accept=1 prop=58000 top1=58000 accp=1.000 next=draft=25 prop=25 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.3ms wait=0.1/45.2ms pred gate=device Token # 623: 3.786ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 
top1=25 accp=1.000 next=pair draft=1148 prop=1148 pred gate=device Token # 624: 112.212ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=0.973 next=draft=14785 prop=14785 olap pair=107.0ms serial=189.6ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.2ms wait=0.1/45.5ms pred gate=device Token # 625: 3.741ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=0 prop=14785 top1=1300 accp=0.219 next=pair draft=27521 prop=27521 pred gate=device Token # 626: 112.010ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=0.999 next=draft=19698 prop=19698 olap pair=106.8ms serial=189.3ms gain=82.5ms ratio=0.44 s0=4.5ms s1=184.8ms wait=0.1/45.2ms pred gate=device Token # 627: 3.707ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=1492 prop=1492 pred gate=device Token # 628: 112.384ms; value: next_token_ids=tensor([24106], device='cuda:0') mtp accept=0 prop=1492 top1=24106 accp=0.473 next=draft=223 prop=223 olap pair=107.1ms serial=190.1ms gain=83.0ms ratio=0.44 s0=3.9ms s1=186.2ms wait=0.1/46.1ms pred gate=device Token # 629: 112.175ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=25 prop=25 olap pair=106.8ms serial=189.5ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.7ms wait=0.1/46.3ms pred gate=device Token # 630: 3.698ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=pair draft=1148 prop=768 pred gate=device Token # 631: 111.942ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=0 prop=768 top1=1148 accp=0.825 next=draft=14785 prop=67384 olap pair=106.7ms serial=189.3ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.4ms wait=0.1/46.3ms pred gate=device Token # 632: 112.161ms; value: next_token_ids=tensor([67384], device='cuda:0') mtp accept=1 prop=67384 top1=14785 accp=0.932 next=draft=4339 
prop=4339 olap pair=106.9ms serial=189.7ms gain=82.8ms ratio=0.44 s0=4.1ms s1=185.6ms wait=0.1/45.9ms pred gate=device Token # 633: 3.682ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=0.979 next=pair draft=768 prop=478 pred gate=device Token # 634: 112.010ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.691 next=draft=25 prop=18467 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.5ms wait=0.1/46.2ms pred gate=device Token # 635: 3.700ms; value: next_token_ids=tensor([18467], device='cuda:0') mtp accept=1 prop=18467 top1=18467 accp=0.063 next=pair draft=16992 prop=2541 pred gate=device Token # 636: 111.889ms; value: next_token_ids=tensor([2541], device='cuda:0') mtp accept=1 prop=2541 top1=2541 accp=0.611 next=draft=2311 prop=2311 olap pair=106.7ms serial=189.4ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.6ms wait=0.1/46.4ms pred gate=device Token # 637: 3.715ms; value: next_token_ids=tensor([2311], device='cuda:0') mtp accept=1 prop=2311 top1=2311 accp=0.871 next=pair draft=20251 prop=20251 pred gate=device Token # 638: 112.162ms; value: next_token_ids=tensor([20251], device='cuda:0') mtp accept=1 prop=20251 top1=20251 accp=1.000 next=draft=320 prop=320 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.8ms wait=0.1/46.2ms pred gate=device Token # 639: 3.735ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.988 next=pair draft=14785 prop=14785 pred gate=device Token # 640: 111.892ms; value: next_token_ids=tensor([4029], device='cuda:0') mtp accept=0 prop=14785 top1=5640 accp=0.084 next=draft=2541 prop=303 olap pair=106.7ms serial=189.4ms gain=82.7ms ratio=0.44 s0=4.0ms s1=185.4ms wait=0.1/46.0ms pred gate=device Token # 641: 111.946ms; value: next_token_ids=tensor([32257], device='cuda:0') mtp accept=0 prop=303 top1=32257 accp=0.025 next=draft=1959 prop=1959 olap pair=106.7ms 
serial=189.3ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.4ms wait=0.1/46.2ms pred gate=device Token # 642: 112.066ms; value: next_token_ids=tensor([1959], device='cuda:0') mtp accept=1 prop=1959 top1=223 accp=0.725 next=draft=8283 prop=8283 olap pair=106.8ms serial=189.5ms gain=82.7ms ratio=0.44 s0=4.3ms s1=185.3ms wait=0.1/45.3ms pred gate=device Token # 643: 3.681ms; value: next_token_ids=tensor([8283], device='cuda:0') mtp accept=1 prop=8283 top1=8283 accp=0.913 next=pair draft=2431 prop=109058 pred gate=device Token # 644: 112.000ms; value: next_token_ids=tensor([109058], device='cuda:0') mtp accept=1 prop=109058 top1=2431 accp=0.868 next=draft=223 prop=223 olap pair=106.8ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.1ms wait=0.1/45.2ms pred gate=device Token # 645: 3.703ms; value: next_token_ids=tensor([20373], device='cuda:0') mtp accept=0 prop=223 top1=20373 accp=0.020 next=pair draft=301 prop=301 pred gate=device Token # 646: 111.873ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=0.999 next=draft=1703 prop=1703 olap pair=106.6ms serial=189.0ms gain=82.4ms ratio=0.44 s0=4.1ms s1=184.9ms wait=0.1/45.7ms pred gate=device Token # 647: 3.762ms; value: next_token_ids=tensor([1703], device='cuda:0') mtp accept=1 prop=1703 top1=1703 accp=0.994 next=pair draft=996 prop=996 pred gate=device Token # 648: 112.252ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=draft=5262 prop=5262 olap pair=107.1ms serial=189.0ms gain=81.9ms ratio=0.43 s0=4.2ms s1=184.8ms wait=0.1/46.0ms pred gate=device Token # 649: 3.804ms; value: next_token_ids=tensor([5262], device='cuda:0') mtp accept=1 prop=5262 top1=5262 accp=0.999 next=pair draft=320 prop=320 pred gate=device Token # 650: 111.882ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.995 next=draft=13103 prop=13103 olap pair=106.7ms serial=189.3ms gain=82.5ms ratio=0.44 s0=4.4ms 
s1=184.9ms wait=0.1/45.1ms pred gate=device Token # 651: 3.746ms; value: next_token_ids=tensor([13103], device='cuda:0') mtp accept=1 prop=13103 top1=13103 accp=0.904 next=pair draft=19585 prop=19585 pred gate=device Token # 652: 112.127ms; value: next_token_ids=tensor([19585], device='cuda:0') mtp accept=1 prop=19585 top1=19585 accp=0.997 next=draft=223 prop=223 olap pair=107.0ms serial=189.9ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.4ms wait=0.1/45.2ms pred gate=device Token # 653: 3.685ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.915 next=pair draft=20 prop=20 pred gate=device Token # 654: 112.174ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=64 prop=64 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.1ms s1=185.4ms wait=0.1/45.9ms pred gate=device Token # 655: 3.702ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=1069 prop=1069 pred gate=device Token # 656: 112.029ms; value: next_token_ids=tensor([1069], device='cuda:0') mtp accept=1 prop=1069 top1=1069 accp=0.633 next=draft=940 prop=13 olap pair=106.9ms serial=189.6ms gain=82.8ms ratio=0.44 s0=3.8ms s1=185.9ms wait=0.1/46.3ms pred gate=device Token # 657: 3.792ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=0 prop=13 top1=940 accp=0.532 next=pair draft=223 prop=223 pred gate=device Token # 658: 113.216ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=19 prop=19 olap pair=107.9ms serial=190.7ms gain=82.7ms ratio=0.43 s0=3.8ms s1=186.9ms wait=0.1/46.2ms pred gate=device Token # 659: 3.753ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=1148 prop=1148 pred gate=device Token # 660: 112.199ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=0.964 
next=draft=20 prop=20 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=3.8ms s1=186.0ms wait=0.1/46.4ms pred gate=device Token # 661: 3.809ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.999 next=pair draft=64 prop=64 pred gate=device Token # 662: 111.939ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=draft=1069 prop=1069 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=4.0ms s1=185.4ms wait=0.1/45.9ms pred gate=device Token # 663: 3.691ms; value: next_token_ids=tensor([1069], device='cuda:0') mtp accept=1 prop=1069 top1=1069 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 664: 112.185ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.881 next=draft=223 prop=223 olap pair=106.9ms serial=189.8ms gain=82.9ms ratio=0.44 s0=3.9ms s1=185.9ms wait=0.1/46.3ms pred gate=device Token # 665: 3.704ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=19204 prop=19204 pred gate=device Token # 666: 112.249ms; value: next_token_ids=tensor([19204], device='cuda:0') mtp accept=1 prop=19204 top1=19204 accp=0.954 next=draft=24798 prop=24798 olap pair=107.1ms serial=190.0ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.2ms wait=0.1/46.3ms pred gate=device Token # 667: 3.746ms; value: next_token_ids=tensor([24798], device='cuda:0') mtp accept=1 prop=24798 top1=24798 accp=1.000 next=pair draft=2111 prop=2111 pred gate=device Token # 668: 111.963ms; value: next_token_ids=tensor([2111], device='cuda:0') mtp accept=1 prop=2111 top1=2111 accp=1.000 next=draft=303 prop=303 olap pair=106.8ms serial=189.4ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.6ms wait=0.1/46.4ms pred gate=device Token # 669: 3.703ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=63760 prop=63760 pred gate=device Token # 670: 
112.190ms; value: next_token_ids=tensor([63760], device='cuda:0') mtp accept=1 prop=63760 top1=1107 accp=0.474 next=draft=20 prop=20 olap pair=107.0ms serial=190.0ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.2ms wait=0.1/46.4ms pred gate=device Token # 671: 3.707ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=0 prop=20 top1=21 accp=0.000 next=pair draft=1148 prop=1148 pred gate=device Token # 672: 112.144ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=0.989 next=draft=422 prop=20 olap pair=106.9ms serial=189.8ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.0ms wait=0.1/46.4ms pred gate=device Token # 673: 3.703ms; value: next_token_ids=tensor([20198], device='cuda:0') mtp accept=0 prop=20 top1=422 accp=0.898 next=pair draft=478 prop=478 pred gate=device Token # 674: 112.075ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.813 next=draft=33518 prop=33518 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.7ms wait=0.1/46.3ms pred gate=device Token # 675: 3.667ms; value: next_token_ids=tensor([14785], device='cuda:0') mtp accept=0 prop=33518 top1=14785 accp=0.321 next=pair draft=4339 prop=5640 pred gate=device Token # 676: 112.044ms; value: next_token_ids=tensor([1139], device='cuda:0') mtp accept=0 prop=5640 top1=100153 accp=0.129 next=draft=1356 prop=1356 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.5ms wait=0.1/46.3ms pred gate=device Token # 677: 112.255ms; value: next_token_ids=tensor([1356], device='cuda:0') mtp accept=1 prop=1356 top1=1356 accp=1.000 next=draft=21274 prop=21274 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.8ms wait=0.1/46.3ms pred gate=device Token # 678: 3.713ms; value: next_token_ids=tensor([21274], device='cuda:0') mtp accept=1 prop=21274 top1=21274 accp=1.000 next=pair draft=320 prop=313 pred gate=device Token # 679: 112.344ms; value: next_token_ids=tensor([313], 
device='cuda:0') mtp accept=1 prop=313 top1=313 accp=0.292 next=draft=64 prop=64 olap pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.2ms wait=0.1/46.3ms pred gate=device Token # 680: 3.697ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 681: 112.115ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=13 prop=13 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.7ms wait=0.1/46.4ms pred gate=device Token # 682: 3.809ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 683: 111.814ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=320 prop=320 olap pair=106.6ms serial=189.1ms gain=82.5ms ratio=0.44 s0=3.9ms s1=185.3ms wait=0.1/46.3ms pred gate=device Token # 684: 3.731ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.990 next=pair draft=2204 prop=2204 pred gate=device Token # 685: 112.510ms; value: next_token_ids=tensor([3660], device='cuda:0') mtp accept=0 prop=2204 top1=3660 accp=0.554 next=draft=313 prop=313 olap pair=107.3ms serial=189.7ms gain=82.4ms ratio=0.43 s0=4.0ms s1=185.7ms wait=0.1/46.2ms pred gate=device Token # 686: 112.286ms; value: next_token_ids=tensor([97100], device='cuda:0') mtp accept=0 prop=313 top1=97100 accp=0.019 next=draft=313 prop=313 olap pair=107.0ms serial=189.9ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.1ms wait=0.1/46.5ms pred gate=device Token # 687: 112.256ms; value: next_token_ids=tensor([313], device='cuda:0') mtp accept=1 prop=313 top1=313 accp=1.000 next=draft=303 prop=303 olap pair=106.9ms serial=189.7ms gain=82.8ms ratio=0.44 s0=3.9ms s1=185.8ms wait=0.1/46.2ms pred gate=device Token # 688: 3.700ms; value: next_token_ids=tensor([303], 
device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=80 prop=80 pred gate=device Token # 689: 112.307ms; value: next_token_ids=tensor([80], device='cuda:0') mtp accept=1 prop=80 top1=80 accp=1.000 next=draft=64 prop=64 olap pair=107.0ms serial=189.9ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.1ms wait=0.1/46.3ms pred gate=device Token # 690: 3.736ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 691: 112.271ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=13 prop=13 olap pair=107.1ms serial=190.1ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.3ms wait=0.1/46.4ms pred gate=device Token # 692: 3.782ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 693: 111.945ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=223 prop=223 olap pair=106.7ms serial=189.4ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.5ms wait=0.1/46.2ms pred gate=device Token # 694: 3.695ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=389 prop=389 pred gate=device Token # 695: 111.927ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 accp=0.979 next=draft=100791 prop=100791 olap pair=106.7ms serial=189.3ms gain=82.5ms ratio=0.44 s0=4.3ms s1=185.0ms wait=0.1/45.5ms pred gate=device Token # 696: 3.715ms; value: next_token_ids=tensor([100791], device='cuda:0') mtp accept=1 prop=100791 top1=100791 accp=0.963 next=pair draft=303 prop=303 pred gate=device Token # 697: 111.833ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.759 next=draft=2431 prop=113008 olap pair=106.7ms serial=189.1ms gain=82.5ms ratio=0.44 s0=4.3ms s1=184.8ms wait=0.1/45.5ms pred 
gate=device Token # 698: 3.693ms; value: next_token_ids=tensor([7242], device='cuda:0') mtp accept=0 prop=113008 top1=2431 accp=0.805 next=pair draft=2431 prop=2431 pred gate=device Token # 699: 114.380ms; value: next_token_ids=tensor([2431], device='cuda:0') mtp accept=1 prop=2431 top1=113008 accp=0.445 next=draft=15120 prop=15120 olap pair=109.2ms serial=193.4ms gain=84.2ms ratio=0.44 s0=4.7ms s1=188.7ms wait=0.1/44.9ms pred gate=device Token # 700: 3.697ms; value: next_token_ids=tensor([29372], device='cuda:0') mtp accept=0 prop=15120 top1=29372 accp=0.257 next=pair draft=1703 prop=1703 pred gate=device Token # 701: 112.321ms; value: next_token_ids=tensor([70889], device='cuda:0') mtp accept=0 prop=1703 top1=70889 accp=0.001 next=draft=12114 prop=12114 olap pair=107.1ms serial=189.7ms gain=82.6ms ratio=0.44 s0=4.5ms s1=185.3ms wait=0.1/45.2ms pred gate=device Token # 702: 113.239ms; value: next_token_ids=tensor([1139], device='cuda:0') mtp accept=0 prop=12114 top1=28134 accp=0.106 next=draft=1356 prop=1356 olap pair=107.9ms serial=190.0ms gain=82.1ms ratio=0.43 s0=5.7ms s1=184.3ms wait=0.2/44.0ms pred gate=device Token # 703: 112.355ms; value: next_token_ids=tensor([1356], device='cuda:0') mtp accept=1 prop=1356 top1=1356 accp=1.000 next=draft=21274 prop=21274 olap pair=107.1ms serial=189.4ms gain=82.3ms ratio=0.43 s0=3.9ms s1=185.4ms wait=0.1/46.2ms pred gate=device Token # 704: 3.739ms; value: next_token_ids=tensor([21274], device='cuda:0') mtp accept=1 prop=21274 top1=21274 accp=1.000 next=pair draft=1148 prop=1148 pred gate=device Token # 705: 112.176ms; value: next_token_ids=tensor([3467], device='cuda:0') mtp accept=0 prop=1148 top1=3467 accp=0.445 next=draft=1148 prop=1148 olap pair=106.9ms serial=189.7ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.8ms wait=0.1/46.2ms pred gate=device Token # 706: 112.186ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=1.000 next=draft=445 prop=445 olap pair=106.8ms serial=189.5ms 
gain=82.7ms ratio=0.44 s0=3.8ms s1=185.7ms wait=0.1/46.3ms pred gate=device Token # 707: 3.865ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=1 prop=445 top1=445 accp=0.974 next=pair draft=82817 prop=82817 pred gate=device Token # 708: 112.143ms; value: next_token_ids=tensor([27342], device='cuda:0') mtp accept=0 prop=82817 top1=27342 accp=0.004 next=draft=19845 prop=525 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.0ms s1=185.5ms wait=0.1/46.1ms pred gate=device Token # 709: 112.089ms; value: next_token_ids=tensor([525], device='cuda:0') mtp accept=1 prop=525 top1=525 accp=0.475 next=draft=303 prop=303 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.5ms wait=0.1/46.1ms pred gate=device Token # 710: 3.722ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=80 prop=80 pred gate=device Token # 711: 112.195ms; value: next_token_ids=tensor([80], device='cuda:0') mtp accept=1 prop=80 top1=80 accp=1.000 next=draft=64 prop=64 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.8ms wait=0.1/46.3ms pred gate=device Token # 712: 3.714ms; value: next_token_ids=tensor([64], device='cuda:0') mtp accept=1 prop=64 top1=64 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 713: 112.240ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=13 prop=13 olap pair=107.0ms serial=189.9ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.0ms wait=0.1/46.4ms pred gate=device Token # 714: 3.834ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 715: 111.933ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=223 prop=223 olap pair=106.7ms serial=189.0ms gain=82.3ms ratio=0.44 s0=4.3ms s1=184.7ms wait=0.1/45.5ms pred gate=device Token # 716: 3.701ms; 
value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=69367 prop=69367 pred gate=device Token # 717: 112.550ms; value: next_token_ids=tensor([69367], device='cuda:0') mtp accept=1 prop=69367 top1=2916 accp=0.882 next=draft=3021 prop=3021 olap pair=107.3ms serial=190.1ms gain=82.9ms ratio=0.44 s0=4.0ms s1=186.1ms wait=0.1/46.0ms pred gate=device Token # 718: 3.775ms; value: next_token_ids=tensor([3021], device='cuda:0') mtp accept=1 prop=3021 top1=3021 accp=0.984 next=pair draft=301 prop=301 pred gate=device Token # 719: 111.957ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=1.000 next=draft=303 prop=303 olap pair=106.8ms serial=189.2ms gain=82.4ms ratio=0.44 s0=4.3ms s1=184.9ms wait=0.1/45.5ms pred gate=device Token # 720: 3.787ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=30775 prop=30775 pred gate=device Token # 721: 112.357ms; value: next_token_ids=tensor([30775], device='cuda:0') mtp accept=1 prop=30775 top1=30775 accp=0.998 next=draft=1877 prop=389 olap pair=107.1ms serial=189.7ms gain=82.6ms ratio=0.44 s0=4.3ms s1=185.4ms wait=0.1/45.4ms pred gate=device Token # 722: 3.740ms; value: next_token_ids=tensor([313], device='cuda:0') mtp accept=0 prop=389 top1=313 accp=0.016 next=pair draft=223 prop=223 pred gate=device Token # 723: 111.929ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.642 next=draft=389 prop=389 olap pair=106.6ms serial=188.9ms gain=82.2ms ratio=0.44 s0=4.4ms s1=184.5ms wait=0.1/45.4ms pred gate=device Token # 724: 3.717ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 accp=0.612 next=pair draft=21826 prop=21826 pred gate=device Token # 725: 112.006ms; value: next_token_ids=tensor([13479], device='cuda:0') mtp accept=0 prop=21826 top1=13479 accp=0.242 next=draft=9184 prop=9184 olap pair=106.9ms 
serial=189.5ms gain=82.6ms ratio=0.44 s0=4.0ms s1=185.4ms wait=0.1/46.0ms pred gate=device Token # 726: 111.948ms; value: next_token_ids=tensor([21539], device='cuda:0') mtp accept=0 prop=9184 top1=21539 accp=0.392 next=draft=2022 prop=2022 olap pair=106.7ms serial=189.0ms gain=82.4ms ratio=0.44 s0=4.3ms s1=184.7ms wait=0.1/45.3ms pred gate=device Token # 727: 112.286ms; value: next_token_ids=tensor([2022], device='cuda:0') mtp accept=1 prop=2022 top1=2022 accp=0.947 next=draft=320 prop=320 olap pair=106.9ms serial=189.5ms gain=82.5ms ratio=0.44 s0=4.9ms s1=184.5ms wait=0.1/45.0ms pred gate=device Token # 728: 3.887ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.994 next=pair draft=1207 prop=1207 pred gate=device Token # 729: 112.077ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=1 prop=1207 top1=1207 accp=0.855 next=draft=18467 prop=18467 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.8ms s1=184.7ms wait=0.1/45.0ms pred gate=device Token # 730: 3.685ms; value: next_token_ids=tensor([39932], device='cuda:0') mtp accept=0 prop=18467 top1=39932 accp=0.041 next=pair draft=5640 prop=5640 pred gate=device Token # 731: 111.955ms; value: next_token_ids=tensor([5640], device='cuda:0') mtp accept=1 prop=5640 top1=5640 accp=0.988 next=draft=1877 prop=1877 olap pair=106.7ms serial=189.3ms gain=82.6ms ratio=0.44 s0=4.4ms s1=184.9ms wait=0.1/45.1ms pred gate=device Token # 732: 3.713ms; value: next_token_ids=tensor([32712], device='cuda:0') mtp accept=0 prop=1877 top1=1877 accp=0.673 next=pair draft=1703 prop=1703 pred gate=device Token # 733: 112.092ms; value: next_token_ids=tensor([1703], device='cuda:0') mtp accept=1 prop=1703 top1=1703 accp=0.606 next=draft=66005 prop=66005 olap pair=106.8ms serial=189.5ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.2ms wait=0.1/45.2ms pred gate=device Token # 734: 3.688ms; value: next_token_ids=tensor([66005], device='cuda:0') mtp accept=1 prop=66005 top1=66005 
accp=0.957 next=pair draft=478 prop=478 pred gate=device Token # 735: 112.152ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.999 next=draft=18467 prop=18467 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.4ms wait=0.1/45.2ms pred gate=device Token # 736: 3.695ms; value: next_token_ids=tensor([18467], device='cuda:0') mtp accept=1 prop=18467 top1=18467 accp=0.971 next=pair draft=16992 prop=16992 pred gate=device Token # 737: 111.657ms; value: next_token_ids=tensor([16992], device='cuda:0') mtp accept=1 prop=16992 top1=16992 accp=0.668 next=draft=5640 prop=5640 olap pair=106.5ms serial=188.9ms gain=82.4ms ratio=0.44 s0=4.4ms s1=184.6ms wait=0.1/45.2ms pred gate=device Token # 738: 3.683ms; value: next_token_ids=tensor([5640], device='cuda:0') mtp accept=1 prop=5640 top1=5640 accp=0.634 next=pair draft=313 prop=280 pred gate=device Token # 739: 112.068ms; value: next_token_ids=tensor([844], device='cuda:0') mtp accept=0 prop=280 top1=844 accp=0.081 next=draft=1703 prop=1703 olap pair=106.9ms serial=189.7ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.3ms wait=0.1/45.2ms pred gate=device Token # 740: 111.950ms; value: next_token_ids=tensor([1703], device='cuda:0') mtp accept=1 prop=1703 top1=1703 accp=0.992 next=draft=996 prop=996 olap pair=106.7ms serial=189.2ms gain=82.6ms ratio=0.44 s0=4.4ms s1=184.9ms wait=0.1/45.3ms pred gate=device Token # 741: 3.722ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=0.991 next=pair draft=320 prop=320 pred gate=device Token # 742: 112.302ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=478 accp=0.456 next=draft=14785 prop=14785 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.1ms wait=0.1/46.2ms pred gate=device Token # 743: 3.676ms; value: next_token_ids=tensor([1255], device='cuda:0') mtp accept=0 prop=14785 top1=1255 accp=0.327 next=pair draft=280 prop=280 pred 
gate=device Token # 744: 112.153ms; value: next_token_ids=tensor([280], device='cuda:0') mtp accept=1 prop=280 top1=280 accp=0.598 next=draft=31 prop=31 olap pair=106.9ms serial=189.8ms gain=82.8ms ratio=0.44 s0=3.9ms s1=185.8ms wait=0.1/46.2ms pred gate=device Token # 745: 3.781ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=21 prop=21 pred gate=device Token # 746: 112.267ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=0 prop=21 top1=25 accp=0.001 next=draft=223 prop=223 olap pair=107.0ms serial=189.6ms gain=82.6ms ratio=0.44 s0=4.5ms s1=185.1ms wait=0.1/45.2ms pred gate=device Token # 747: 112.190ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=3257 prop=3257 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=4.3ms s1=185.1ms wait=0.1/45.5ms pred gate=device Token # 748: 3.786ms; value: next_token_ids=tensor([3257], device='cuda:0') mtp accept=1 prop=3257 top1=3257 accp=1.000 next=pair draft=478 prop=478 pred gate=device Token # 749: 111.927ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.996 next=draft=82 prop=82 olap pair=106.7ms serial=189.4ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.6ms wait=0.1/46.2ms pred gate=device Token # 750: 3.719ms; value: next_token_ids=tensor([5640], device='cuda:0') mtp accept=0 prop=82 top1=5640 accp=0.012 next=pair draft=280 prop=280 pred gate=device Token # 751: 112.273ms; value: next_token_ids=tensor([58000], device='cuda:0') mtp accept=0 prop=280 top1=223 accp=0.425 next=draft=25 prop=25 olap pair=107.0ms serial=189.9ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.0ms wait=0.1/46.3ms pred gate=device Token # 752: 112.114ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=0.910 next=draft=768 prop=768 olap pair=106.8ms serial=189.6ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.7ms wait=0.1/46.2ms pred 
gate=device Token # 753: 3.733ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.491 next=pair draft=7163 prop=7163 pred gate=device Token # 754: 112.378ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.999 next=draft=27521 prop=27521 olap pair=107.1ms serial=189.5ms gain=82.4ms ratio=0.43 s0=4.0ms s1=185.5ms wait=0.1/46.1ms pred gate=device Token # 755: 3.785ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 756: 112.946ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=draft=1267 prop=1267 olap pair=106.8ms serial=188.4ms gain=81.6ms ratio=0.43 s0=7.8ms s1=180.5ms wait=0.2/42.0ms pred gate=device Token # 757: 4.643ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=0.997 next=pair draft=223 prop=223 pred gate=device Token # 758: 112.455ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=25 prop=25 olap pair=107.1ms serial=189.8ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.9ms wait=0.1/46.2ms pred gate=device Token # 759: 3.712ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 760: 111.856ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.938 next=draft=4339 prop=4339 olap pair=106.6ms serial=189.3ms gain=82.7ms ratio=0.44 s0=3.7ms s1=185.6ms wait=0.1/46.3ms pred gate=device Token # 761: 3.719ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=0.999 next=pair draft=223 prop=223 pred gate=device Token # 762: 111.970ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=0 prop=223 top1=223 accp=0.727 next=draft=7163 
prop=7163 olap pair=106.8ms serial=189.5ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.8ms wait=0.1/46.3ms pred gate=device Token # 763: 111.993ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=0 prop=7163 top1=25 accp=0.622 next=draft=982 prop=12 olap pair=106.8ms serial=189.5ms gain=82.7ms ratio=0.44 s0=3.7ms s1=185.8ms wait=0.1/46.3ms pred gate=device Token # 764: 111.844ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=0.401 next=draft=10751 prop=10751 olap pair=106.6ms serial=189.2ms gain=82.6ms ratio=0.44 s0=3.7ms s1=185.5ms wait=0.1/46.6ms pred gate=device Token # 765: 3.697ms; value: next_token_ids=tensor([10751], device='cuda:0') mtp accept=1 prop=10751 top1=10751 accp=1.000 next=pair draft=31273 prop=31273 pred gate=device Token # 766: 111.895ms; value: next_token_ids=tensor([31774], device='cuda:0') mtp accept=0 prop=31273 top1=31774 accp=0.000 next=draft=3351 prop=3351 olap pair=106.7ms serial=189.5ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.7ms wait=0.1/46.3ms pred gate=device Token # 767: 111.949ms; value: next_token_ids=tensor([3351], device='cuda:0') mtp accept=1 prop=3351 top1=3351 accp=0.815 next=draft=438 prop=438 olap pair=106.6ms serial=189.3ms gain=82.6ms ratio=0.44 s0=3.8ms s1=185.5ms wait=0.1/46.4ms pred gate=device Token # 768: 3.820ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.994 next=pair draft=223 prop=223 pred gate=device Token # 769: 112.038ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.999 next=draft=7163 prop=7163 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=4.5ms s1=185.1ms wait=0.1/45.1ms pred gate=device Token # 770: 3.767ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 771: 112.110ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 
top1=27521 accp=1.000 next=draft=25361 prop=25361 olap pair=107.0ms serial=189.0ms gain=82.0ms ratio=0.43 s0=4.0ms s1=184.9ms wait=0.1/46.3ms pred gate=device Token # 772: 3.716ms; value: next_token_ids=tensor([25361], device='cuda:0') mtp accept=1 prop=25361 top1=25361 accp=0.988 next=pair draft=1148 prop=1148 pred gate=device Token # 773: 112.433ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=0.992 next=draft=2524 prop=2524 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.5ms wait=0.1/45.2ms pred gate=device Token # 774: 3.722ms; value: next_token_ids=tensor([14149], device='cuda:0') mtp accept=0 prop=2524 top1=14149 accp=0.134 next=pair draft=303 prop=303 pred gate=device Token # 775: 112.003ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.997 next=draft=25 prop=25 olap pair=106.7ms serial=189.0ms gain=82.3ms ratio=0.44 s0=4.8ms s1=184.2ms wait=0.1/45.0ms pred gate=device Token # 776: 3.739ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=0.999 next=pair draft=12 prop=12 pred gate=device Token # 777: 111.983ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=0.999 next=draft=10751 prop=10751 olap pair=106.7ms serial=189.3ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.3ms wait=0.1/46.2ms pred gate=device Token # 778: 3.822ms; value: next_token_ids=tensor([10751], device='cuda:0') mtp accept=1 prop=10751 top1=10751 accp=1.000 next=pair draft=31774 prop=31774 pred gate=device Token # 779: 112.064ms; value: next_token_ids=tensor([31774], device='cuda:0') mtp accept=1 prop=31774 top1=31774 accp=1.000 next=draft=3351 prop=3351 olap pair=106.9ms serial=189.4ms gain=82.5ms ratio=0.44 s0=4.2ms s1=185.2ms wait=0.1/45.7ms pred gate=device Token # 780: 3.733ms; value: next_token_ids=tensor([3351], device='cuda:0') mtp accept=1 prop=3351 top1=3351 accp=1.000 next=pair draft=438 prop=438 
pred gate=device Token # 781: 112.192ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.999 next=draft=223 prop=223 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=4.0ms s1=185.8ms wait=0.1/46.2ms pred gate=device Token # 782: 3.737ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=25 prop=25 pred gate=device Token # 783: 112.505ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=0.910 next=draft=12 prop=12 olap pair=107.3ms serial=190.3ms gain=83.1ms ratio=0.44 s0=3.9ms s1=186.4ms wait=0.1/46.5ms pred gate=device Token # 784: 3.718ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=0.997 next=pair draft=10751 prop=10751 pred gate=device Token # 785: 112.030ms; value: next_token_ids=tensor([4980], device='cuda:0') mtp accept=0 prop=10751 top1=4980 accp=0.030 next=draft=1320 prop=1320 olap pair=106.8ms serial=189.1ms gain=82.3ms ratio=0.44 s0=4.9ms s1=184.2ms wait=0.1/45.0ms pred gate=device Token # 786: 112.330ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=draft=504 prop=504 olap pair=107.0ms serial=189.2ms gain=82.2ms ratio=0.43 s0=4.1ms s1=185.1ms wait=0.1/45.9ms pred gate=device Token # 787: 3.755ms; value: next_token_ids=tensor([504], device='cuda:0') mtp accept=1 prop=504 top1=504 accp=1.000 next=pair draft=438 prop=31 pred gate=device Token # 788: 112.448ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=0 prop=31 top1=438 accp=0.557 next=draft=223 prop=223 olap pair=107.2ms serial=189.9ms gain=82.7ms ratio=0.44 s0=3.8ms s1=186.0ms wait=0.1/46.4ms pred gate=device Token # 789: 112.409ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=6992 prop=6992 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.1ms 
wait=0.1/46.2ms pred gate=device Token # 790: 3.714ms; value: next_token_ids=tensor([6992], device='cuda:0') mtp accept=1 prop=6992 top1=6992 accp=1.000 next=pair draft=1320 prop=1320 pred gate=device Token # 791: 112.082ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=0.998 next=draft=1320 prop=1320 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=3.8ms s1=185.6ms wait=0.1/46.4ms pred gate=device Token # 792: 3.723ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 793: 112.527ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.616 next=draft=65388 prop=65388 olap pair=107.3ms serial=189.5ms gain=82.2ms ratio=0.43 s0=4.1ms s1=185.4ms wait=0.1/46.1ms pred gate=device Token # 794: 3.723ms; value: next_token_ids=tensor([65388], device='cuda:0') mtp accept=1 prop=65388 top1=65388 accp=0.999 next=pair draft=223 prop=223 pred gate=device Token # 795: 112.263ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=25 prop=25 olap pair=107.0ms serial=188.6ms gain=81.7ms ratio=0.43 s0=4.4ms s1=184.3ms wait=0.1/45.5ms pred gate=device Token # 796: 3.834ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=pair draft=12 prop=12 pred gate=device Token # 797: 112.044ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=draft=10543 prop=10543 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.3ms s1=185.2ms wait=0.1/45.4ms pred gate=device Token # 798: 3.779ms; value: next_token_ids=tensor([9146], device='cuda:0') mtp accept=0 prop=10543 top1=9146 accp=0.002 next=pair draft=3354 prop=3354 pred gate=device Token # 799: 112.068ms; value: next_token_ids=tensor([3354], device='cuda:0') mtp accept=1 prop=3354 top1=3354 
accp=1.000 next=draft=438 prop=438 olap pair=106.8ms serial=189.5ms gain=82.7ms ratio=0.44 s0=4.2ms s1=185.3ms wait=0.1/45.7ms pred gate=device Token # 800: 3.788ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.994 next=pair draft=223 prop=223 pred gate=device Token # 801: 112.018ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=10193 prop=10193 olap pair=106.8ms serial=189.6ms gain=82.8ms ratio=0.44 s0=3.9ms s1=185.7ms wait=0.1/46.3ms pred gate=device Token # 802: 3.708ms; value: next_token_ids=tensor([10193], device='cuda:0') mtp accept=1 prop=10193 top1=10193 accp=1.000 next=pair draft=15482 prop=15482 pred gate=device Token # 803: 112.214ms; value: next_token_ids=tensor([15482], device='cuda:0') mtp accept=1 prop=15482 top1=15482 accp=1.000 next=draft=303 prop=303 olap pair=107.0ms serial=189.9ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.0ms wait=0.1/46.2ms pred gate=device Token # 804: 3.788ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.999 next=pair draft=2636 prop=2636 pred gate=device Token # 805: 112.204ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.982 next=draft=223 prop=223 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.3ms wait=0.1/45.5ms pred gate=device Token # 806: 3.765ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=6992 prop=6992 pred gate=device Token # 807: 112.217ms; value: next_token_ids=tensor([6992], device='cuda:0') mtp accept=1 prop=6992 top1=6992 accp=1.000 next=draft=1320 prop=1320 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=4.1ms s1=185.7ms wait=0.1/46.0ms pred gate=device Token # 808: 3.765ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=pair draft=1320 prop=1320 pred 
gate=device Token # 809: 111.911ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=draft=565 prop=565 olap pair=106.7ms serial=189.3ms gain=82.6ms ratio=0.44 s0=3.8ms s1=185.5ms wait=0.1/46.4ms pred gate=device Token # 810: 3.795ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 811: 112.312ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=10193 prop=10193 olap pair=107.1ms serial=190.2ms gain=83.1ms ratio=0.44 s0=3.8ms s1=186.4ms wait=0.1/46.5ms pred gate=device Token # 812: 3.795ms; value: next_token_ids=tensor([10193], device='cuda:0') mtp accept=1 prop=10193 top1=10193 accp=1.000 next=pair draft=15482 prop=15482 pred gate=device Token # 813: 111.898ms; value: next_token_ids=tensor([15482], device='cuda:0') mtp accept=1 prop=15482 top1=15482 accp=1.000 next=draft=438 prop=438 olap pair=106.7ms serial=189.3ms gain=82.6ms ratio=0.44 s0=3.8ms s1=185.6ms wait=0.1/46.4ms pred gate=device Token # 814: 3.739ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 815: 112.019ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=7163 prop=7163 olap pair=106.8ms serial=189.6ms gain=82.7ms ratio=0.44 s0=3.7ms s1=185.8ms wait=0.1/46.4ms pred gate=device Token # 816: 3.724ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 817: 113.114ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=25361 prop=25361 olap pair=107.9ms serial=190.6ms gain=82.7ms ratio=0.43 s0=3.8ms s1=186.8ms wait=0.1/46.4ms pred gate=device Token # 818: 3.711ms; value: 
next_token_ids=tensor([25361], device='cuda:0') mtp accept=1 prop=25361 top1=25361 accp=1.000 next=pair draft=1148 prop=1148 pred gate=device Token # 819: 112.299ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=0 prop=1148 top1=320 accp=0.214 next=draft=2636 prop=2636 olap pair=107.1ms serial=190.1ms gain=83.0ms ratio=0.44 s0=3.7ms s1=186.3ms wait=0.1/46.4ms pred gate=device Token # 820: 112.091ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.949 next=draft=223 prop=223 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=4.4ms s1=184.9ms wait=0.1/45.5ms pred gate=device Token # 821: 3.797ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.969 next=pair draft=7163 prop=7163 pred gate=device Token # 822: 112.088ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.970 next=draft=27521 prop=27521 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.9ms s1=184.6ms wait=0.1/44.9ms pred gate=device Token # 823: 3.745ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 824: 112.034ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=0.972 next=draft=438 prop=438 olap pair=106.8ms serial=189.5ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.1ms wait=0.1/45.3ms pred gate=device Token # 825: 3.865ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=0 prop=438 top1=565 accp=0.150 next=pair draft=223 prop=223 pred gate=device Token # 826: 112.245ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=7163 prop=7163 olap pair=107.0ms serial=189.9ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.6ms wait=0.1/45.3ms pred gate=device Token # 827: 3.753ms; value: next_token_ids=tensor([7163], device='cuda:0') 
mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 828: 111.986ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=25361 prop=25361 olap pair=106.7ms serial=189.3ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.0ms wait=0.1/45.2ms pred gate=device Token # 829: 3.748ms; value: next_token_ids=tensor([25361], device='cuda:0') mtp accept=1 prop=25361 top1=25361 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 830: 112.010ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=106.7ms serial=189.3ms gain=82.6ms ratio=0.44 s0=4.3ms s1=185.0ms wait=0.1/45.3ms pred gate=device Token # 831: 3.787ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 832: 112.172ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=303 prop=303 olap pair=107.0ms serial=189.7ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.4ms wait=0.1/45.5ms pred gate=device Token # 833: 3.797ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.830 next=pair draft=2636 prop=2636 pred gate=device Token # 834: 111.962ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.988 next=draft=4169 prop=4169 olap pair=106.7ms serial=189.3ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.4ms wait=0.1/46.3ms pred gate=device Token # 835: 3.787ms; value: next_token_ids=tensor([4169], device='cuda:0') mtp accept=1 prop=4169 top1=4169 accp=1.000 next=pair draft=996 prop=40180 pred gate=device Token # 836: 111.922ms; value: next_token_ids=tensor([40180], device='cuda:0') mtp accept=1 prop=40180 top1=40180 accp=0.179 next=draft=20 prop=20 olap pair=106.7ms serial=189.1ms gain=82.4ms ratio=0.44 s0=4.7ms s1=184.4ms 
wait=0.1/45.0ms pred gate=device Token # 837: 3.758ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 838: 111.875ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.891 next=draft=2636 prop=2636 olap pair=106.6ms serial=188.9ms gain=82.3ms ratio=0.44 s0=4.9ms s1=184.0ms wait=0.1/44.6ms pred gate=device Token # 839: 3.739ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.983 next=pair draft=113008 prop=113008 pred gate=device Token # 840: 112.125ms; value: next_token_ids=tensor([113008], device='cuda:0') mtp accept=1 prop=113008 top1=113008 accp=0.998 next=draft=25 prop=25 olap pair=106.8ms serial=189.3ms gain=82.5ms ratio=0.44 s0=4.9ms s1=184.4ms wait=0.1/44.8ms pred gate=device Token # 841: 3.825ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=pair draft=124637 prop=124637 pred gate=device Token # 842: 112.183ms; value: next_token_ids=tensor([124637], device='cuda:0') mtp accept=1 prop=124637 top1=124637 accp=1.000 next=draft=478 prop=478 olap pair=107.0ms serial=189.9ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.0ms wait=0.1/46.4ms pred gate=device Token # 843: 3.795ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.999 next=pair draft=58000 prop=36081 pred gate=device Token # 844: 111.987ms; value: next_token_ids=tensor([36081], device='cuda:0') mtp accept=1 prop=36081 top1=36081 accp=0.202 next=draft=768 prop=768 olap pair=106.8ms serial=189.5ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.8ms wait=0.1/46.4ms pred gate=device Token # 845: 3.805ms; value: next_token_ids=tensor([1703], device='cuda:0') mtp accept=0 prop=768 top1=1703 accp=0.014 next=pair draft=996 prop=996 pred gate=device Token # 846: 112.107ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 
top1=996 accp=1.000 next=draft=768 prop=768 olap pair=106.8ms serial=189.0ms gain=82.2ms ratio=0.43 s0=4.0ms s1=185.0ms wait=0.1/46.0ms pred gate=device Token # 847: 3.760ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.999 next=pair draft=779 prop=779 pred gate=device Token # 848: 112.091ms; value: next_token_ids=tensor([779], device='cuda:0') mtp accept=1 prop=779 top1=779 accp=1.000 next=draft=320 prop=320 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=4.2ms s1=185.4ms wait=0.1/45.6ms pred gate=device Token # 849: 3.762ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 850: 112.193ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.965 next=draft=27521 prop=27521 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=3.8ms s1=186.0ms wait=0.1/46.4ms pred gate=device Token # 851: 3.911ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 852: 112.045ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=draft=1267 prop=1267 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.0ms wait=0.1/45.2ms pred gate=device Token # 853: 3.820ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=0.997 next=pair draft=223 prop=223 pred gate=device Token # 854: 112.245ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=779 prop=779 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.4ms wait=0.1/45.2ms pred gate=device Token # 855: 3.743ms; value: next_token_ids=tensor([779], device='cuda:0') mtp accept=1 prop=779 top1=779 accp=1.000 next=pair draft=320 prop=320 pred 
gate=device Token # 856: 112.070ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.961 next=draft=18467 prop=18467 olap pair=106.8ms serial=189.5ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.1ms wait=0.1/45.3ms pred gate=device Token # 857: 3.749ms; value: next_token_ids=tensor([18467], device='cuda:0') mtp accept=1 prop=18467 top1=18467 accp=0.810 next=pair draft=4339 prop=55180 pred gate=device Token # 858: 112.109ms; value: next_token_ids=tensor([55180], device='cuda:0') mtp accept=1 prop=55180 top1=55180 accp=0.792 next=draft=4339 prop=4339 olap pair=106.9ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.3ms wait=0.1/45.3ms pred gate=device Token # 859: 3.752ms; value: next_token_ids=tensor([91480], device='cuda:0') mtp accept=0 prop=4339 top1=91480 accp=0.012 next=pair draft=768 prop=768 pred gate=device Token # 860: 112.294ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.938 next=draft=19 prop=19 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.6ms wait=0.1/45.2ms pred gate=device Token # 861: 3.636ms; value: next_token_ids=tensor([8283], device='cuda:0') mtp accept=0 prop=19 top1=8283 accp=0.486 next=pair draft=548 prop=548 pred gate=device Token # 862: 112.829ms; value: next_token_ids=tensor([55180], device='cuda:0') mtp accept=0 prop=548 top1=55180 accp=0.044 next=draft=548 prop=548 olap pair=106.8ms serial=188.5ms gain=81.7ms ratio=0.43 s0=7.0ms s1=181.5ms wait=0.2/42.7ms pred gate=device Token # 863: 113.532ms; value: next_token_ids=tensor([19680], device='cuda:0') mtp accept=0 prop=548 top1=52759 accp=0.037 next=draft=768 prop=768 olap pair=107.3ms serial=188.0ms gain=80.7ms ratio=0.43 s0=8.8ms s1=179.2ms wait=0.2/40.7ms pred gate=device Token # 864: 112.685ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=0 prop=768 top1=548 accp=0.010 next=draft=768 prop=768 olap pair=107.2ms serial=189.3ms gain=82.1ms ratio=0.43 
s0=5.0ms s1=184.3ms wait=0.1/44.9ms pred gate=device Token # 865: 112.233ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=56017 accp=0.580 next=draft=1255 prop=19 olap pair=107.0ms serial=189.1ms gain=82.1ms ratio=0.43 s0=4.0ms s1=185.0ms wait=0.1/46.1ms pred gate=device Token # 866: 3.773ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=0.430 next=pair draft=15 prop=15 pred gate=device Token # 867: 111.930ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=0.995 next=draft=18 prop=18 olap pair=106.7ms serial=189.3ms gain=82.6ms ratio=0.44 s0=4.1ms s1=185.3ms wait=0.1/45.9ms pred gate=device Token # 868: 3.745ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=0.999 next=pair draft=13 prop=13 pred gate=device Token # 869: 112.686ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=22 prop=22 olap pair=107.5ms serial=188.7ms gain=81.1ms ratio=0.43 s0=4.3ms s1=184.4ms wait=0.1/46.0ms pred gate=device Token # 870: 3.687ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=pair draft=15 prop=15 pred gate=device Token # 871: 112.780ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=223 prop=13 olap pair=107.7ms serial=189.1ms gain=81.4ms ratio=0.43 s0=4.2ms s1=184.9ms wait=0.1/46.0ms pred gate=device Token # 872: 3.683ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=0 prop=13 top1=26 accp=0.000 next=pair draft=13 prop=13 pred gate=device Token # 873: 112.742ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=23 prop=23 olap pair=107.6ms serial=188.8ms gain=81.2ms ratio=0.43 s0=4.4ms s1=184.4ms wait=0.1/45.9ms pred gate=device Token # 874: 3.649ms; value: next_token_ids=tensor([23], 
device='cuda:0') mtp accept=1 prop=23 top1=23 accp=1.000 next=pair draft=15 prop=15 pred gate=device Token # 875: 113.026ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=26 prop=1204 olap pair=107.8ms serial=189.2ms gain=81.4ms ratio=0.43 s0=4.3ms s1=184.9ms wait=0.1/46.0ms pred gate=device Token # 876: 3.788ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=0 prop=1204 top1=25 accp=0.100 next=pair draft=13 prop=13 pred gate=device Token # 877: 112.993ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=24 prop=24 olap pair=107.8ms serial=189.1ms gain=81.3ms ratio=0.43 s0=4.3ms s1=184.8ms wait=0.1/46.1ms pred gate=device Token # 878: 3.633ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=pair draft=15 prop=15 pred gate=device Token # 879: 112.635ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=24 prop=24 olap pair=107.5ms serial=188.7ms gain=81.2ms ratio=0.43 s0=4.2ms s1=184.5ms wait=0.1/46.1ms pred gate=device Token # 880: 3.726ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=0 prop=24 top1=18 accp=0.097 next=pair draft=13 prop=13 pred gate=device Token # 881: 112.962ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=19 prop=19 olap pair=107.7ms serial=189.1ms gain=81.4ms ratio=0.43 s0=4.3ms s1=184.8ms wait=0.1/46.1ms pred gate=device Token # 882: 3.675ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 883: 112.875ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.993 next=draft=343 prop=343 olap pair=107.6ms serial=189.0ms gain=81.4ms ratio=0.43 s0=4.3ms s1=184.7ms wait=0.1/46.0ms pred gate=device Token # 884: 
3.765ms; value: next_token_ids=tensor([343], device='cuda:0') mtp accept=1 prop=343 top1=343 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 885: 112.114ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=13 prop=13 olap pair=107.0ms serial=189.4ms gain=82.4ms ratio=0.44 s0=3.9ms s1=185.5ms wait=0.1/46.4ms pred gate=device Token # 886: 3.763ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=0.999 next=pair draft=18 prop=18 pred gate=device Token # 887: 112.132ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=0 prop=18 top1=22 accp=0.116 next=draft=13 prop=13 olap pair=107.0ms serial=189.7ms gain=82.8ms ratio=0.44 s0=3.8ms s1=185.9ms wait=0.1/46.5ms pred gate=device Token # 888: 112.217ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=23 prop=23 olap pair=107.0ms serial=189.6ms gain=82.7ms ratio=0.44 s0=4.0ms s1=185.6ms wait=0.1/46.1ms pred gate=device Token # 889: 3.675ms; value: next_token_ids=tensor([23], device='cuda:0') mtp accept=1 prop=23 top1=23 accp=1.000 next=pair draft=13 prop=13 pred gate=device Token # 890: 111.951ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=24 prop=24 olap pair=106.8ms serial=189.3ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.4ms wait=0.1/46.5ms pred gate=device Token # 891: 3.718ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=pair draft=13 prop=13 pred gate=device Token # 892: 112.287ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=19 prop=19 olap pair=107.0ms serial=190.0ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.1ms wait=0.1/46.4ms pred gate=device Token # 893: 3.725ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=11 prop=11 
pred gate=device Token # 894: 112.249ms; value: next_token_ids=tensor([11], device='cuda:0') mtp accept=1 prop=11 top1=11 accp=1.000 next=draft=565 prop=565 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.8ms wait=0.1/46.3ms pred gate=device Token # 895: 3.764ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=1.000 next=pair draft=343 prop=343 pred gate=device Token # 896: 111.916ms; value: next_token_ids=tensor([343], device='cuda:0') mtp accept=1 prop=343 top1=343 accp=1.000 next=draft=18 prop=18 olap pair=106.7ms serial=189.2ms gain=82.5ms ratio=0.44 s0=3.8ms s1=185.3ms wait=0.1/46.4ms pred gate=device Token # 897: 3.789ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=pair draft=13 prop=13 pred gate=device Token # 898: 112.436ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=26 prop=26 olap pair=107.2ms serial=190.2ms gain=83.0ms ratio=0.44 s0=4.0ms s1=186.2ms wait=0.1/46.1ms pred gate=device Token # 899: 3.771ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=pair draft=13 prop=13 pred gate=device Token # 900: 112.029ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=25 prop=25 olap pair=106.8ms serial=189.5ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.7ms wait=0.1/46.6ms pred gate=device Token # 901: 3.723ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=pair draft=13 prop=13 pred gate=device Token # 902: 112.173ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=18 prop=18 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.8ms wait=0.1/46.5ms pred gate=device Token # 903: 3.690ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 
accp=1.000 next=pair draft=11 prop=11 pred gate=device Token # 904: 112.633ms; value: next_token_ids=tensor([11], device='cuda:0') mtp accept=1 prop=11 top1=11 accp=1.000 next=draft=438 prop=438 olap pair=107.4ms serial=189.1ms gain=81.7ms ratio=0.43 s0=4.5ms s1=184.6ms wait=0.1/45.8ms pred gate=device Token # 905: 3.787ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 906: 112.809ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.981 next=draft=1002 prop=1002 olap pair=107.5ms serial=188.7ms gain=81.2ms ratio=0.43 s0=4.8ms s1=183.9ms wait=0.1/45.5ms pred gate=device Token # 907: 3.735ms; value: next_token_ids=tensor([1002], device='cuda:0') mtp accept=1 prop=1002 top1=1002 accp=1.000 next=pair draft=565 prop=565 pred gate=device Token # 908: 112.912ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=0.964 next=draft=223 prop=223 olap pair=106.8ms serial=188.8ms gain=82.0ms ratio=0.43 s0=7.2ms s1=181.6ms wait=0.2/42.7ms pred gate=device Token # 909: 4.295ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.999 next=pair draft=856 prop=856 pred gate=device Token # 910: 112.402ms; value: next_token_ids=tensor([856], device='cuda:0') mtp accept=1 prop=856 top1=856 accp=1.000 next=draft=438 prop=438 olap pair=107.2ms serial=189.2ms gain=82.0ms ratio=0.43 s0=7.7ms s1=181.5ms wait=0.2/42.1ms pred gate=device Token # 911: 3.803ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.998 next=pair draft=223 prop=223 pred gate=device Token # 912: 112.472ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=20 prop=20 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.2ms wait=0.1/46.4ms pred gate=device Token # 913: 3.752ms; value: 
next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 914: 112.049ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.845 next=draft=2636 prop=2636 olap pair=106.8ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.0ms wait=0.1/45.4ms pred gate=device Token # 915: 3.793ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.992 next=pair draft=113008 prop=113008 pred gate=device Token # 916: 112.126ms; value: next_token_ids=tensor([4169], device='cuda:0') mtp accept=0 prop=113008 top1=4169 accp=0.027 next=draft=996 prop=996 olap pair=106.9ms serial=189.9ms gain=83.0ms ratio=0.44 s0=4.2ms s1=185.7ms wait=0.1/45.8ms pred gate=device Token # 917: 111.894ms; value: next_token_ids=tensor([40180], device='cuda:0') mtp accept=0 prop=996 top1=40180 accp=0.490 next=draft=20 prop=20 olap pair=106.6ms serial=189.1ms gain=82.5ms ratio=0.44 s0=3.8ms s1=185.3ms wait=0.1/46.4ms pred gate=device Token # 918: 113.136ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=1267 prop=1267 olap pair=107.0ms serial=189.6ms gain=82.5ms ratio=0.44 s0=5.4ms s1=184.2ms wait=0.1/44.5ms pred gate=device Token # 919: 4.547ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=0.945 next=pair draft=223 prop=223 pred gate=device Token # 920: 112.503ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=779 prop=779 olap pair=107.0ms serial=189.3ms gain=82.2ms ratio=0.43 s0=6.8ms s1=182.4ms wait=0.2/43.0ms pred gate=device Token # 921: 3.768ms; value: next_token_ids=tensor([779], device='cuda:0') mtp accept=1 prop=779 top1=779 accp=1.000 next=pair draft=1148 prop=1148 pred gate=device Token # 922: 112.214ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 
prop=1148 top1=1148 accp=0.996 next=draft=14149 prop=14149 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=3.8ms s1=186.1ms wait=0.1/46.6ms pred gate=device Token # 923: 3.761ms; value: next_token_ids=tensor([14149], device='cuda:0') mtp accept=1 prop=14149 top1=14149 accp=0.961 next=pair draft=303 prop=303 pred gate=device Token # 924: 111.931ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=2204 prop=2204 olap pair=106.7ms serial=189.3ms gain=82.5ms ratio=0.44 s0=3.8ms s1=185.5ms wait=0.1/46.6ms pred gate=device Token # 925: 3.732ms; value: next_token_ids=tensor([2204], device='cuda:0') mtp accept=1 prop=2204 top1=2204 accp=0.643 next=pair draft=8283 prop=8283 pred gate=device Token # 926: 112.901ms; value: next_token_ids=tensor([55180], device='cuda:0') mtp accept=0 prop=8283 top1=55180 accp=0.007 next=draft=548 prop=548 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.5ms s1=185.2ms wait=0.1/45.6ms pred gate=device Token # 927: 112.037ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=1 prop=548 top1=548 accp=1.000 next=draft=389 prop=389 olap pair=106.8ms serial=189.5ms gain=82.7ms ratio=0.44 s0=4.3ms s1=185.1ms wait=0.1/45.5ms pred gate=device Token # 928: 3.736ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 accp=1.000 next=pair draft=779 prop=779 pred gate=device Token # 929: 112.097ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=0 prop=779 top1=20 accp=0.003 next=draft=303 prop=303 olap pair=106.9ms serial=189.4ms gain=82.5ms ratio=0.44 s0=5.3ms s1=184.2ms wait=0.1/44.3ms pred gate=device Token # 930: 112.169ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=3996 prop=3996 olap pair=106.8ms serial=189.2ms gain=82.4ms ratio=0.44 s0=5.7ms s1=183.5ms wait=0.2/44.1ms pred gate=device Token # 931: 3.721ms; value: next_token_ids=tensor([3996], 
device='cuda:0') mtp accept=1 prop=3996 top1=3996 accp=0.999 next=pair draft=1267 prop=1532 pred gate=device Token # 932: 112.362ms; value: next_token_ids=tensor([1532], device='cuda:0') mtp accept=1 prop=1532 top1=1532 accp=0.447 next=draft=996 prop=996 olap pair=107.1ms serial=189.9ms gain=82.8ms ratio=0.44 s0=4.0ms s1=185.9ms wait=0.1/46.1ms pred gate=device Token # 933: 3.777ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=pair draft=1267 prop=1267 pred gate=device Token # 934: 113.619ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=0.999 next=draft=223 prop=223 olap pair=108.3ms serial=191.3ms gain=83.0ms ratio=0.43 s0=3.8ms s1=187.5ms wait=0.1/46.4ms pred gate=device Token # 935: 3.858ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=779 prop=779 pred gate=device Token # 936: 112.194ms; value: next_token_ids=tensor([779], device='cuda:0') mtp accept=1 prop=779 top1=779 accp=1.000 next=draft=223 prop=223 olap pair=107.0ms serial=189.8ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.1ms wait=0.1/46.6ms pred gate=device Token # 937: 3.719ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.999 next=pair draft=15120 prop=15120 pred gate=device Token # 938: 112.254ms; value: next_token_ids=tensor([15120], device='cuda:0') mtp accept=1 prop=15120 top1=15120 accp=0.970 next=draft=223 prop=223 olap pair=107.0ms serial=189.9ms gain=82.8ms ratio=0.44 s0=3.8ms s1=186.1ms wait=0.1/46.5ms pred gate=device Token # 939: 3.790ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.633 next=pair draft=20 prop=20 pred gate=device Token # 940: 112.610ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=1267 prop=1267 olap pair=107.3ms serial=189.7ms gain=82.4ms ratio=0.43 s0=4.3ms 
s1=185.4ms wait=0.1/45.8ms pred gate=device Token # 941: 3.733ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=0.999 next=pair draft=223 prop=223 pred gate=device Token # 942: 112.470ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=779 prop=779 olap pair=107.1ms serial=190.1ms gain=83.0ms ratio=0.44 s0=4.4ms s1=185.7ms wait=0.1/45.4ms pred gate=device Token # 943: 3.719ms; value: next_token_ids=tensor([779], device='cuda:0') mtp accept=1 prop=779 top1=779 accp=1.000 next=pair draft=320 prop=1148 pred gate=device Token # 944: 112.362ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=0 prop=1148 top1=320 accp=0.660 next=draft=2636 prop=2524 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.6ms wait=0.1/45.4ms pred gate=device Token # 945: 112.168ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=0 prop=2524 top1=2636 accp=0.788 next=draft=113008 prop=113008 olap pair=106.8ms serial=189.6ms gain=82.7ms ratio=0.44 s0=4.3ms s1=185.3ms wait=0.1/45.4ms pred gate=device Token # 946: 112.717ms; value: next_token_ids=tensor([113008], device='cuda:0') mtp accept=1 prop=113008 top1=113008 accp=1.000 next=draft=779 prop=779 olap pair=107.4ms serial=189.3ms gain=81.9ms ratio=0.43 s0=4.3ms s1=185.1ms wait=0.1/45.9ms pred gate=device Token # 947: 3.802ms; value: next_token_ids=tensor([779], device='cuda:0') mtp accept=1 prop=779 top1=779 accp=1.000 next=pair draft=124637 prop=124637 pred gate=device Token # 948: 112.915ms; value: next_token_ids=tensor([124637], device='cuda:0') mtp accept=1 prop=124637 top1=124637 accp=1.000 next=draft=478 prop=478 olap pair=107.7ms serial=189.2ms gain=81.4ms ratio=0.43 s0=4.3ms s1=184.8ms wait=0.1/46.2ms pred gate=device Token # 949: 3.672ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.957 next=pair draft=36081 prop=36081 pred gate=device 
Token # 950: 112.213ms; value: next_token_ids=tensor([907], device='cuda:0') mtp accept=0 prop=36081 top1=907 accp=0.097 next=draft=768 prop=768 olap pair=107.0ms serial=189.4ms gain=82.4ms ratio=0.44 s0=3.9ms s1=185.5ms wait=0.1/46.3ms pred gate=device Token # 951: 112.042ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=7163 prop=7163 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.0ms wait=0.1/45.4ms pred gate=device Token # 952: 3.714ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.907 next=pair draft=27521 prop=27521 pred gate=device Token # 953: 112.316ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=107.0ms serial=189.9ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.5ms wait=0.1/45.4ms pred gate=device Token # 954: 3.767ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=1267 prop=1267 pred gate=device Token # 955: 112.438ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=draft=223 prop=223 olap pair=107.1ms serial=189.9ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.4ms wait=0.1/45.2ms pred gate=device Token # 956: 3.832ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=907 prop=907 pred gate=device Token # 957: 112.417ms; value: next_token_ids=tensor([907], device='cuda:0') mtp accept=1 prop=907 top1=907 accp=1.000 next=draft=320 prop=320 olap pair=107.2ms serial=190.3ms gain=83.0ms ratio=0.44 s0=4.4ms s1=185.8ms wait=0.1/45.3ms pred gate=device Token # 958: 3.786ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.999 next=pair draft=4339 prop=4339 pred gate=device Token # 959: 112.056ms; value: 
next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=0.987 next=draft=768 prop=768 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.8ms s1=184.7ms wait=0.1/44.7ms pred gate=device Token # 960: 3.821ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.999 next=pair draft=907 prop=907 pred gate=device Token # 961: 113.322ms; value: next_token_ids=tensor([907], device='cuda:0') mtp accept=1 prop=907 top1=907 accp=0.999 next=draft=982 prop=982 olap pair=108.1ms serial=190.9ms gain=82.9ms ratio=0.43 s0=4.2ms s1=186.7ms wait=0.1/45.9ms pred gate=device Token # 962: 3.770ms; value: next_token_ids=tensor([982], device='cuda:0') mtp accept=1 prop=982 top1=982 accp=0.974 next=pair draft=223 prop=223 pred gate=device Token # 963: 111.929ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=28753 prop=28753 olap pair=106.7ms serial=189.2ms gain=82.5ms ratio=0.44 s0=4.1ms s1=185.1ms wait=0.1/45.9ms pred gate=device Token # 964: 3.693ms; value: next_token_ids=tensor([28753], device='cuda:0') mtp accept=1 prop=28753 top1=28753 accp=1.000 next=pair draft=25361 prop=25361 pred gate=device Token # 965: 111.737ms; value: next_token_ids=tensor([28602], device='cuda:0') mtp accept=0 prop=25361 top1=28602 accp=0.338 next=draft=18 prop=18 olap pair=106.6ms serial=189.1ms gain=82.5ms ratio=0.44 s0=3.8ms s1=185.3ms wait=0.1/46.5ms pred gate=device Token # 966: 111.815ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=0.847 next=draft=438 prop=438 olap pair=106.6ms serial=189.2ms gain=82.6ms ratio=0.44 s0=3.8ms s1=185.4ms wait=0.1/46.4ms pred gate=device Token # 967: 3.753ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.978 next=pair draft=1148 prop=1148 pred gate=device Token # 968: 112.173ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 
prop=1148 top1=1148 accp=0.952 next=draft=907 prop=907 olap pair=106.9ms serial=189.8ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.1ms wait=0.1/46.5ms pred gate=device Token # 969: 3.704ms; value: next_token_ids=tensor([13103], device='cuda:0') mtp accept=0 prop=907 top1=907 accp=0.490 next=pair draft=18467 prop=18467 pred gate=device Token # 970: 111.690ms; value: next_token_ids=tensor([49391], device='cuda:0') mtp accept=0 prop=18467 top1=2541 accp=0.036 next=draft=768 prop=768 olap pair=106.5ms serial=188.9ms gain=82.4ms ratio=0.44 s0=3.8ms s1=185.1ms wait=0.1/46.4ms pred gate=device Token # 971: 112.048ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.998 next=draft=907 prop=58000 olap pair=106.8ms serial=189.4ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.7ms wait=0.1/46.7ms pred gate=device Token # 972: 3.686ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=0 prop=58000 top1=907 accp=0.871 next=pair draft=27521 prop=12 pred gate=device Token # 973: 112.148ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=0 prop=12 top1=27521 accp=0.509 next=draft=19698 prop=19698 olap pair=106.9ms serial=189.8ms gain=82.9ms ratio=0.44 s0=3.7ms s1=186.1ms wait=0.1/46.5ms pred gate=device Token # 974: 112.215ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=0.982 next=draft=24106 prop=24106 olap pair=106.9ms serial=189.7ms gain=82.9ms ratio=0.44 s0=3.7ms s1=186.0ms wait=0.1/46.5ms pred gate=device Token # 975: 3.746ms; value: next_token_ids=tensor([24106], device='cuda:0') mtp accept=1 prop=24106 top1=24106 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 976: 112.099ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=907 prop=907 olap pair=106.8ms serial=189.5ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.7ms wait=0.1/46.5ms pred gate=device Token # 977: 3.677ms; value: 
next_token_ids=tensor([907], device='cuda:0') mtp accept=1 prop=907 top1=907 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 978: 112.071ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.999 next=draft=907 prop=907 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=4.0ms s1=185.4ms wait=0.1/46.3ms pred gate=device Token # 979: 3.781ms; value: next_token_ids=tensor([907], device='cuda:0') mtp accept=1 prop=907 top1=907 accp=1.000 next=pair draft=12 prop=12 pred gate=device Token # 980: 111.913ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=0.963 next=draft=28753 prop=28753 olap pair=106.7ms serial=189.5ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.6ms wait=0.1/46.4ms pred gate=device Token # 981: 3.763ms; value: next_token_ids=tensor([28753], device='cuda:0') mtp accept=1 prop=28753 top1=28753 accp=0.964 next=pair draft=28602 prop=28602 pred gate=device Token # 982: 111.732ms; value: next_token_ids=tensor([28602], device='cuda:0') mtp accept=1 prop=28602 top1=28602 accp=0.981 next=draft=18 prop=18 olap pair=106.6ms serial=189.1ms gain=82.5ms ratio=0.44 s0=3.9ms s1=185.3ms wait=0.1/46.4ms pred gate=device Token # 983: 3.792ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 984: 112.079ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=106.8ms serial=189.6ms gain=82.8ms ratio=0.44 s0=3.8ms s1=185.8ms wait=0.1/46.5ms pred gate=device Token # 985: 3.739ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.978 next=pair draft=907 prop=907 pred gate=device Token # 986: 112.047ms; value: next_token_ids=tensor([907], device='cuda:0') mtp accept=1 prop=907 top1=907 accp=0.998 next=draft=12 prop=12 olap pair=106.8ms serial=189.6ms gain=82.7ms 
ratio=0.44 s0=3.8ms s1=185.8ms wait=0.1/46.5ms pred gate=device Token # 987: 3.778ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=6884 prop=6884 pred gate=device Token # 988: 112.016ms; value: next_token_ids=tensor([6884], device='cuda:0') mtp accept=1 prop=6884 top1=6884 accp=0.950 next=draft=1320 prop=1320 olap pair=106.8ms serial=189.5ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.7ms wait=0.1/46.4ms pred gate=device Token # 989: 3.733ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=pair draft=18 prop=18 pred gate=device Token # 990: 112.075ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=31 prop=31 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.8ms wait=0.1/46.4ms pred gate=device Token # 991: 3.771ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=0.999 next=pair draft=7163 prop=7163 pred gate=device Token # 992: 112.104ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=1320 prop=1320 olap pair=106.9ms serial=189.5ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.7ms wait=0.1/46.4ms pred gate=device Token # 993: 3.743ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=0.999 next=pair draft=1320 prop=1320 pred gate=device Token # 994: 112.060ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=draft=303 prop=303 olap pair=106.9ms serial=189.6ms gain=82.8ms ratio=0.44 s0=3.8ms s1=185.8ms wait=0.1/46.4ms pred gate=device Token # 995: 3.762ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.998 next=pair draft=907 prop=907 pred gate=device Token # 996: 112.212ms; value: next_token_ids=tensor([907], device='cuda:0') mtp accept=1 prop=907 
top1=907 accp=1.000 next=draft=12 prop=12 olap pair=107.0ms serial=189.6ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.8ms wait=0.1/46.5ms pred gate=device Token # 997: 3.680ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=29666 prop=29666 pred gate=device Token # 998: 112.092ms; value: next_token_ids=tensor([29666], device='cuda:0') mtp accept=1 prop=29666 top1=29666 accp=1.000 next=draft=2122 prop=2122 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=3.8ms s1=185.6ms wait=0.1/46.6ms pred gate=device Token # 999: 3.764ms; value: next_token_ids=tensor([2122], device='cuda:0') mtp accept=1 prop=2122 top1=2122 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1000: 111.892ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=0.998 next=draft=27521 prop=27521 olap pair=106.7ms serial=189.1ms gain=82.4ms ratio=0.44 s0=4.0ms s1=185.1ms wait=0.1/46.1ms pred gate=device Token # 1001: 3.742ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=0.898 next=pair draft=21484 prop=21484 pred gate=device Token # 1002: 111.944ms; value: next_token_ids=tensor([21484], device='cuda:0') mtp accept=1 prop=21484 top1=21484 accp=0.996 next=draft=303 prop=303 olap pair=106.7ms serial=189.3ms gain=82.6ms ratio=0.44 s0=3.8ms s1=185.5ms wait=0.1/46.5ms pred gate=device Token # 1003: 3.759ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.942 next=pair draft=77230 prop=77230 pred gate=device Token # 1004: 112.017ms; value: next_token_ids=tensor([77230], device='cuda:0') mtp accept=1 prop=77230 top1=77230 accp=0.999 next=draft=223 prop=223 olap pair=106.8ms serial=189.6ms gain=82.8ms ratio=0.44 s0=3.9ms s1=185.7ms wait=0.1/46.5ms pred gate=device Token # 1005: 3.770ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=31 accp=0.546 next=pair draft=7163 
prop=7163 pred gate=device Token # 1006: 111.896ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=106.8ms serial=189.5ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.7ms wait=0.1/46.6ms pred gate=device Token # 1007: 3.760ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=1320 accp=0.196 next=pair draft=21484 prop=21484 pred gate=device Token # 1008: 112.006ms; value: next_token_ids=tensor([21484], device='cuda:0') mtp accept=1 prop=21484 top1=21484 accp=1.000 next=draft=303 prop=303 olap pair=106.8ms serial=189.5ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.7ms wait=0.1/46.4ms pred gate=device Token # 1009: 3.712ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.985 next=pair draft=94073 prop=94073 pred gate=device Token # 1010: 112.000ms; value: next_token_ids=tensor([94073], device='cuda:0') mtp accept=1 prop=94073 top1=94073 accp=0.941 next=draft=320 prop=320 olap pair=106.7ms serial=189.2ms gain=82.5ms ratio=0.44 s0=3.9ms s1=185.3ms wait=0.1/46.3ms pred gate=device Token # 1011: 3.723ms; value: next_token_ids=tensor([27], device='cuda:0') mtp accept=0 prop=320 top1=27 accp=0.019 next=pair draft=320 prop=320 pred gate=device Token # 1012: 112.295ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.998 next=draft=2636 prop=2636 olap pair=107.0ms serial=189.9ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.1ms wait=0.1/46.5ms pred gate=device Token # 1013: 3.722ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1014: 112.174ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=907 prop=907 olap pair=106.9ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.0ms s1=185.6ms wait=0.1/46.2ms pred gate=device Token # 1015: 
3.738ms; value: next_token_ids=tensor([907], device='cuda:0') mtp accept=1 prop=907 top1=907 accp=0.999 next=pair draft=12 prop=12 pred gate=device Token # 1016: 112.230ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=draft=28753 prop=28753 olap pair=107.1ms serial=189.8ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.8ms wait=0.1/46.3ms pred gate=device Token # 1017: 3.686ms; value: next_token_ids=tensor([28753], device='cuda:0') mtp accept=1 prop=28753 top1=28753 accp=1.000 next=pair draft=29490 prop=29490 pred gate=device Token # 1018: 111.954ms; value: next_token_ids=tensor([29490], device='cuda:0') mtp accept=1 prop=29490 top1=29490 accp=0.915 next=draft=27 prop=27 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=3.8ms s1=185.5ms wait=0.1/46.6ms pred gate=device Token # 1019: 3.715ms; value: next_token_ids=tensor([27], device='cuda:0') mtp accept=1 prop=27 top1=27 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 1020: 112.386ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.998 next=draft=1148 prop=1148 olap pair=107.1ms serial=189.5ms gain=82.4ms ratio=0.43 s0=4.3ms s1=185.2ms wait=0.1/45.8ms pred gate=device Token # 1021: 3.850ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=1148 top1=223 accp=0.296 next=pair draft=7163 prop=7163 pred gate=device Token # 1022: 112.323ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.999 next=draft=27521 prop=27521 olap pair=107.1ms serial=189.8ms gain=82.8ms ratio=0.44 s0=4.0ms s1=185.9ms wait=0.1/46.4ms pred gate=device Token # 1023: 3.775ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=0.998 next=pair draft=21484 prop=21484 pred gate=device Token # 1024: 111.902ms; value: next_token_ids=tensor([21484], device='cuda:0') mtp accept=1 prop=21484 top1=21484 accp=0.748 next=draft=565 prop=565 
olap pair=106.6ms serial=188.3ms gain=81.7ms ratio=0.43 s0=6.5ms s1=181.7ms wait=0.2/43.4ms pred gate=device Token # 1025: 3.749ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1026: 112.125ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=907 prop=907 olap pair=106.8ms serial=189.2ms gain=82.4ms ratio=0.44 s0=4.6ms s1=184.5ms wait=0.1/45.7ms pred gate=device Token # 1027: 3.779ms; value: next_token_ids=tensor([907], device='cuda:0') mtp accept=1 prop=907 top1=907 accp=1.000 next=pair draft=12 prop=438 pred gate=device Token # 1028: 112.473ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.510 next=draft=223 prop=223 olap pair=107.2ms serial=190.0ms gain=82.8ms ratio=0.44 s0=4.2ms s1=185.8ms wait=0.1/45.9ms pred gate=device Token # 1029: 3.833ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1030: 112.570ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.3ms serial=190.0ms gain=82.7ms ratio=0.44 s0=4.2ms s1=185.7ms wait=0.1/45.9ms pred gate=device Token # 1031: 3.785ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=28602 prop=28602 pred gate=device Token # 1032: 112.194ms; value: next_token_ids=tensor([28602], device='cuda:0') mtp accept=1 prop=28602 top1=28602 accp=1.000 next=draft=1148 prop=1148 olap pair=107.0ms serial=189.8ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.0ms wait=0.1/46.6ms pred gate=device Token # 1033: 3.736ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=0.999 next=pair draft=7163 prop=7163 pred gate=device Token # 1034: 112.076ms; 
value: next_token_ids=tensor([14149], device='cuda:0') mtp accept=0 prop=7163 top1=14149 accp=0.075 next=draft=303 prop=303 olap pair=106.9ms serial=189.8ms gain=82.8ms ratio=0.44 s0=4.1ms s1=185.6ms wait=0.1/45.9ms pred gate=device Token # 1035: 111.826ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.974 next=draft=7163 prop=7163 olap pair=106.6ms serial=188.9ms gain=82.4ms ratio=0.44 s0=4.4ms s1=184.6ms wait=0.1/45.6ms pred gate=device Token # 1036: 3.724ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.616 next=pair draft=27521 prop=27521 pred gate=device Token # 1037: 111.927ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=21484 prop=21484 olap pair=106.7ms serial=188.9ms gain=82.2ms ratio=0.44 s0=4.7ms s1=184.3ms wait=0.1/45.3ms pred gate=device Token # 1038: 3.754ms; value: next_token_ids=tensor([21484], device='cuda:0') mtp accept=1 prop=21484 top1=21484 accp=1.000 next=pair draft=565 prop=565 pred gate=device Token # 1039: 111.966ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=1.000 next=draft=223 prop=223 olap pair=106.8ms serial=189.4ms gain=82.7ms ratio=0.44 s0=4.3ms s1=185.2ms wait=0.1/45.6ms pred gate=device Token # 1040: 3.847ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=907 prop=907 pred gate=device Token # 1041: 112.538ms; value: next_token_ids=tensor([907], device='cuda:0') mtp accept=1 prop=907 top1=907 accp=1.000 next=draft=438 prop=438 olap pair=107.3ms serial=189.7ms gain=82.4ms ratio=0.43 s0=4.6ms s1=185.1ms wait=0.1/45.5ms pred gate=device Token # 1042: 3.776ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1043: 112.316ms; value: next_token_ids=tensor([223], 
device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=7163 prop=7163 olap pair=107.0ms serial=189.9ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.4ms wait=0.1/45.3ms pred gate=device Token # 1044: 3.809ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1045: 112.148ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=28602 prop=28602 olap pair=106.9ms serial=189.5ms gain=82.7ms ratio=0.44 s0=4.5ms s1=185.1ms wait=0.1/45.4ms pred gate=device Token # 1046: 3.762ms; value: next_token_ids=tensor([28602], device='cuda:0') mtp accept=1 prop=28602 top1=28602 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1047: 111.988ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=2636 prop=2636 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.0ms wait=0.1/45.3ms pred gate=device Token # 1048: 3.735ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.969 next=pair draft=4169 prop=4169 pred gate=device Token # 1049: 112.132ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=4169 top1=223 accp=0.222 next=draft=7163 prop=7163 olap pair=106.9ms serial=189.5ms gain=82.7ms ratio=0.44 s0=4.8ms s1=184.7ms wait=0.1/44.9ms pred gate=device Token # 1050: 112.414ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.1ms serial=189.9ms gain=82.8ms ratio=0.44 s0=4.9ms s1=185.0ms wait=0.1/44.9ms pred gate=device Token # 1051: 3.703ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 1052: 111.900ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp 
accept=1 prop=19698 top1=19698 accp=1.000 next=draft=565 prop=565 olap pair=106.7ms serial=189.1ms gain=82.4ms ratio=0.44 s0=4.9ms s1=184.3ms wait=0.1/44.9ms pred gate=device Token # 1053: 3.860ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1054: 112.108ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=7163 prop=7163 olap pair=107.0ms serial=189.6ms gain=82.7ms ratio=0.44 s0=4.8ms s1=184.8ms wait=0.1/44.8ms pred gate=device Token # 1055: 3.791ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1056: 112.356ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=28602 prop=28602 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=4.9ms s1=185.1ms wait=0.1/44.9ms pred gate=device Token # 1057: 3.711ms; value: next_token_ids=tensor([28602], device='cuda:0') mtp accept=1 prop=28602 top1=28602 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 1058: 112.205ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=107.0ms serial=189.8ms gain=82.9ms ratio=0.44 s0=4.1ms s1=185.8ms wait=0.1/46.3ms pred gate=device Token # 1059: 3.902ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=22 prop=22 pred gate=device Token # 1060: 112.108ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=draft=303 prop=320 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.9ms s1=184.6ms wait=0.1/44.9ms pred gate=device Token # 1061: 3.778ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=320 top1=303 accp=0.701 next=pair 
draft=4169 prop=4169 pred gate=device Token # 1062: 112.785ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=0 prop=4169 top1=2636 accp=0.420 next=draft=4169 prop=4169 olap pair=107.5ms serial=189.5ms gain=82.0ms ratio=0.43 s0=4.6ms s1=184.9ms wait=0.1/45.5ms pred gate=device Token # 1063: 112.274ms; value: next_token_ids=tensor([4169], device='cuda:0') mtp accept=1 prop=4169 top1=4169 accp=1.000 next=draft=40180 prop=40180 olap pair=107.0ms serial=189.7ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.3ms wait=0.1/45.3ms pred gate=device Token # 1064: 3.721ms; value: next_token_ids=tensor([40180], device='cuda:0') mtp accept=1 prop=40180 top1=40180 accp=0.936 next=pair draft=22 prop=22 pred gate=device Token # 1065: 114.233ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=draft=320 prop=320 olap pair=109.0ms serial=191.8ms gain=82.8ms ratio=0.43 s0=4.4ms s1=187.4ms wait=0.1/45.4ms pred gate=device Token # 1066: 3.856ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.993 next=pair draft=113008 prop=113008 pred gate=device Token # 1067: 113.342ms; value: next_token_ids=tensor([113008], device='cuda:0') mtp accept=1 prop=113008 top1=113008 accp=0.813 next=draft=907 prop=907 olap pair=108.1ms serial=191.2ms gain=83.1ms ratio=0.43 s0=4.7ms s1=186.5ms wait=0.1/45.1ms pred gate=device Token # 1068: 3.794ms; value: next_token_ids=tensor([907], device='cuda:0') mtp accept=1 prop=907 top1=907 accp=1.000 next=pair draft=124637 prop=124637 pred gate=device Token # 1069: 112.390ms; value: next_token_ids=tensor([124637], device='cuda:0') mtp accept=1 prop=124637 top1=124637 accp=1.000 next=draft=478 prop=478 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=4.5ms s1=185.6ms wait=0.1/45.4ms pred gate=device Token # 1070: 3.791ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=pair draft=1002 prop=1002 pred 
gate=device Token # 1071: 112.319ms; value: next_token_ids=tensor([1002], device='cuda:0') mtp accept=1 prop=1002 top1=1002 accp=1.000 next=draft=768 prop=768 olap pair=107.1ms serial=190.1ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.6ms wait=0.1/45.4ms pred gate=device Token # 1072: 3.689ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=7163 prop=58000 pred gate=device Token # 1073: 112.156ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=0 prop=58000 top1=7163 accp=0.899 next=draft=27521 prop=27521 olap pair=107.0ms serial=189.7ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.3ms wait=0.1/45.4ms pred gate=device Token # 1074: 112.223ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=107.0ms serial=189.7ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.4ms wait=0.1/45.5ms pred gate=device Token # 1075: 3.683ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=1267 prop=1267 pred gate=device Token # 1076: 112.379ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=0.999 next=draft=223 prop=223 olap pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=4.3ms s1=185.8ms wait=0.1/45.7ms pred gate=device Token # 1077: 3.784ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=1002 prop=1002 pred gate=device Token # 1078: 112.350ms; value: next_token_ids=tensor([1002], device='cuda:0') mtp accept=1 prop=1002 top1=1002 accp=1.000 next=draft=320 prop=320 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.1ms wait=0.1/46.5ms pred gate=device Token # 1079: 3.725ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=4339 prop=4339 pred gate=device Token # 1080: 112.580ms; 
value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=0.756 next=draft=768 prop=768 olap pair=107.4ms serial=189.4ms gain=82.0ms ratio=0.43 s0=4.1ms s1=185.3ms wait=0.1/46.2ms pred gate=device Token # 1081: 3.744ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=1002 prop=1002 pred gate=device Token # 1082: 113.014ms; value: next_token_ids=tensor([1002], device='cuda:0') mtp accept=1 prop=1002 top1=1002 accp=1.000 next=draft=12 prop=12 olap pair=107.8ms serial=189.0ms gain=81.2ms ratio=0.43 s0=4.4ms s1=184.6ms wait=0.1/46.1ms pred gate=device Token # 1083: 3.775ms; value: next_token_ids=tensor([982], device='cuda:0') mtp accept=0 prop=12 top1=982 accp=0.128 next=pair draft=223 prop=223 pred gate=device Token # 1084: 112.822ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=24517 prop=24517 olap pair=107.6ms serial=188.7ms gain=81.2ms ratio=0.43 s0=4.3ms s1=184.5ms wait=0.1/46.2ms pred gate=device Token # 1085: 3.707ms; value: next_token_ids=tensor([24517], device='cuda:0') mtp accept=1 prop=24517 top1=24517 accp=1.000 next=pair draft=15098 prop=15098 pred gate=device Token # 1086: 112.753ms; value: next_token_ids=tensor([29105], device='cuda:0') mtp accept=0 prop=15098 top1=29105 accp=0.072 next=draft=22 prop=20 olap pair=107.6ms serial=188.8ms gain=81.2ms ratio=0.43 s0=4.3ms s1=184.5ms wait=0.1/46.0ms pred gate=device Token # 1087: 112.756ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=0 prop=20 top1=22 accp=0.347 next=draft=1148 prop=438 olap pair=107.5ms serial=188.7ms gain=81.2ms ratio=0.43 s0=4.2ms s1=184.5ms wait=0.1/46.2ms pred gate=device Token # 1088: 112.985ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.409 next=draft=1148 prop=1148 olap pair=107.7ms serial=189.1ms gain=81.4ms ratio=0.43 s0=4.3ms s1=184.8ms wait=0.1/46.2ms pred 
gate=device Token # 1089: 3.722ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=1.000 next=pair draft=1002 prop=1002 pred gate=device Token # 1090: 113.018ms; value: next_token_ids=tensor([1002], device='cuda:0') mtp accept=1 prop=1002 top1=1002 accp=0.975 next=draft=12 prop=12 olap pair=107.8ms serial=189.3ms gain=81.5ms ratio=0.43 s0=4.2ms s1=185.1ms wait=0.1/46.2ms pred gate=device Token # 1091: 3.742ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=0.987 next=pair draft=24517 prop=24517 pred gate=device Token # 1092: 113.205ms; value: next_token_ids=tensor([6391], device='cuda:0') mtp accept=0 prop=24517 top1=6391 accp=0.431 next=draft=1320 prop=1320 olap pair=107.9ms serial=188.5ms gain=80.6ms ratio=0.43 s0=4.5ms s1=184.0ms wait=0.1/46.0ms pred gate=device Token # 1093: 112.895ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=draft=18 prop=18 olap pair=107.6ms serial=189.1ms gain=81.5ms ratio=0.43 s0=4.2ms s1=184.9ms wait=0.1/46.3ms pred gate=device Token # 1094: 3.687ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1095: 112.029ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=5769 prop=5769 olap pair=106.8ms serial=189.4ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.6ms wait=0.1/46.4ms pred gate=device Token # 1096: 3.725ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=pair draft=1320 prop=1320 pred gate=device Token # 1097: 112.055ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=draft=1320 prop=1320 olap pair=106.9ms serial=189.3ms gain=82.4ms ratio=0.44 s0=3.8ms s1=185.4ms wait=0.1/46.4ms pred gate=device Token # 1098: 3.717ms; value: 
next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 1099: 112.641ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=1002 prop=1002 olap pair=107.3ms serial=189.5ms gain=82.2ms ratio=0.43 s0=4.1ms s1=185.4ms wait=0.1/46.2ms pred gate=device Token # 1100: 3.822ms; value: next_token_ids=tensor([1002], device='cuda:0') mtp accept=1 prop=1002 top1=1002 accp=1.000 next=pair draft=12 prop=12 pred gate=device Token # 1101: 112.262ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=draft=10475 prop=10475 olap pair=107.0ms serial=189.5ms gain=82.5ms ratio=0.44 s0=4.0ms s1=185.5ms wait=0.1/46.1ms pred gate=device Token # 1102: 3.731ms; value: next_token_ids=tensor([10475], device='cuda:0') mtp accept=1 prop=10475 top1=10475 accp=1.000 next=pair draft=35843 prop=35843 pred gate=device Token # 1103: 112.090ms; value: next_token_ids=tensor([35843], device='cuda:0') mtp accept=1 prop=35843 top1=35843 accp=1.000 next=draft=31 prop=31 olap pair=106.8ms serial=189.4ms gain=82.5ms ratio=0.44 s0=4.0ms s1=185.4ms wait=0.1/46.3ms pred gate=device Token # 1104: 3.690ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=17257 prop=17257 pred gate=device Token # 1105: 112.047ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=17257 top1=17257 accp=0.571 next=draft=17257 prop=17257 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.5ms wait=0.1/46.4ms pred gate=device Token # 1106: 112.112ms; value: next_token_ids=tensor([17257], device='cuda:0') mtp accept=1 prop=17257 top1=17257 accp=0.962 next=draft=31594 prop=29607 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.6ms wait=0.1/46.5ms pred gate=device Token # 1107: 3.675ms; value: next_token_ids=tensor([31257], 
device='cuda:0') mtp accept=0 prop=29607 top1=31257 accp=0.000 next=pair draft=26 prop=26 pred gate=device Token # 1108: 112.391ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=draft=1148 prop=1148 olap pair=107.2ms serial=189.9ms gain=82.7ms ratio=0.44 s0=3.9ms s1=186.0ms wait=0.1/46.3ms pred gate=device Token # 1109: 3.717ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=1148 top1=303 accp=0.034 next=pair draft=77230 prop=77230 pred gate=device Token # 1110: 113.035ms; value: next_token_ids=tensor([77230], device='cuda:0') mtp accept=1 prop=77230 top1=77230 accp=1.000 next=draft=223 prop=223 olap pair=107.8ms serial=189.2ms gain=81.4ms ratio=0.43 s0=4.3ms s1=185.0ms wait=0.1/46.1ms pred gate=device Token # 1111: 3.748ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1112: 112.296ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=0 prop=7163 top1=5769 accp=0.367 next=draft=27521 prop=27521 olap pair=107.1ms serial=189.5ms gain=82.4ms ratio=0.43 s0=3.8ms s1=185.7ms wait=0.1/46.5ms pred gate=device Token # 1113: 111.819ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=0 prop=27521 top1=1320 accp=0.000 next=draft=1320 prop=1320 olap pair=106.6ms serial=189.0ms gain=82.4ms ratio=0.44 s0=3.8ms s1=185.2ms wait=0.1/46.6ms pred gate=device Token # 1114: 113.003ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=draft=13 prop=13 olap pair=107.7ms serial=189.1ms gain=81.4ms ratio=0.43 s0=4.2ms s1=184.9ms wait=0.1/46.1ms pred gate=device Token # 1115: 3.742ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=17257 prop=17257 pred gate=device Token # 1116: 113.097ms; value: next_token_ids=tensor([17257], device='cuda:0') mtp accept=1 prop=17257 
top1=17257 accp=1.000 next=draft=31257 prop=31257 olap pair=107.9ms serial=189.5ms gain=81.7ms ratio=0.43 s0=4.2ms s1=185.3ms wait=0.1/46.3ms pred gate=device Token # 1117: 3.768ms; value: next_token_ids=tensor([31257], device='cuda:0') mtp accept=1 prop=31257 top1=31257 accp=1.000 next=pair draft=26 prop=26 pred gate=device Token # 1118: 112.313ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=draft=31 prop=31 olap pair=107.1ms serial=189.1ms gain=82.0ms ratio=0.43 s0=4.1ms s1=185.0ms wait=0.1/46.3ms pred gate=device Token # 1119: 3.735ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1120: 112.069ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=106.9ms serial=189.5ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.7ms wait=0.1/46.6ms pred gate=device Token # 1121: 3.670ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=28273 prop=28273 pred gate=device Token # 1122: 112.063ms; value: next_token_ids=tensor([28273], device='cuda:0') mtp accept=1 prop=28273 top1=28273 accp=0.949 next=draft=320 prop=320 olap pair=106.8ms serial=189.0ms gain=82.2ms ratio=0.43 s0=5.1ms s1=183.9ms wait=0.1/45.1ms pred gate=device Token # 1123: 3.741ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.999 next=pair draft=2636 prop=2636 pred gate=device Token # 1124: 114.753ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=1.000 next=draft=4169 prop=4169 olap pair=107.0ms serial=189.1ms gain=82.1ms ratio=0.43 s0=4.1ms s1=185.0ms wait=0.1/46.1ms pred gate=device Token # 1125: 3.735ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=4169 top1=223 accp=0.081 next=pair draft=7163 
prop=7163 pred gate=device Token # 1126: 112.914ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.998 next=draft=27521 prop=27521 olap pair=107.7ms serial=189.8ms gain=82.1ms ratio=0.43 s0=4.3ms s1=185.5ms wait=0.1/46.0ms pred gate=device Token # 1127: 3.720ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 1128: 112.698ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=draft=565 prop=565 olap pair=107.5ms serial=189.2ms gain=81.7ms ratio=0.43 s0=4.3ms s1=184.9ms wait=0.1/46.1ms pred gate=device Token # 1129: 3.813ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=0.999 next=pair draft=223 prop=223 pred gate=device Token # 1130: 112.424ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=7163 prop=7163 olap pair=107.2ms serial=190.3ms gain=83.0ms ratio=0.44 s0=3.9ms s1=186.4ms wait=0.1/46.5ms pred gate=device Token # 1131: 3.809ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1132: 112.412ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=28273 prop=28273 olap pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.2ms wait=0.1/46.6ms pred gate=device Token # 1133: 3.675ms; value: next_token_ids=tensor([28273], device='cuda:0') mtp accept=1 prop=28273 top1=28273 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 1134: 112.311ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=107.1ms serial=189.8ms gain=82.7ms ratio=0.44 s0=4.1ms s1=185.7ms wait=0.1/46.0ms pred gate=device 
Token # 1135: 3.878ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=21 prop=21 pred gate=device Token # 1136: 112.481ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=draft=320 prop=320 olap pair=107.2ms serial=190.1ms gain=82.8ms ratio=0.44 s0=4.1ms s1=185.9ms wait=0.1/46.0ms pred gate=device Token # 1137: 3.782ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=320 top1=303 accp=0.507 next=pair draft=4169 prop=4169 pred gate=device Token # 1138: 112.252ms; value: next_token_ids=tensor([4169], device='cuda:0') mtp accept=1 prop=4169 top1=4169 accp=0.956 next=draft=40180 prop=40180 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=3.9ms s1=185.9ms wait=0.1/46.5ms pred gate=device Token # 1139: 3.723ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=0 prop=40180 top1=996 accp=0.059 next=pair draft=21 prop=21 pred gate=device Token # 1140: 112.327ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=draft=320 prop=478 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.1ms s1=185.7ms wait=0.1/46.2ms pred gate=device Token # 1141: 3.743ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.135 next=pair draft=511 prop=511 pred gate=device Token # 1142: 112.286ms; value: next_token_ids=tensor([511], device='cuda:0') mtp accept=1 prop=511 top1=511 accp=1.000 next=draft=768 prop=768 olap pair=107.1ms serial=189.5ms gain=82.4ms ratio=0.43 s0=5.9ms s1=183.5ms wait=0.2/44.2ms pred gate=device Token # 1143: 3.697ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1144: 112.457ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.777 next=draft=27521 prop=27521 olap 
pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.3ms wait=0.1/46.6ms pred gate=device Token # 1145: 3.724ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 1146: 112.619ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=draft=1267 prop=1267 olap pair=107.4ms serial=189.3ms gain=81.9ms ratio=0.43 s0=4.1ms s1=185.2ms wait=0.1/46.4ms pred gate=device Token # 1147: 3.749ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1148: 112.532ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=511 prop=511 olap pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=4.0ms s1=186.1ms wait=0.1/46.5ms pred gate=device Token # 1149: 3.758ms; value: next_token_ids=tensor([511], device='cuda:0') mtp accept=1 prop=511 top1=511 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1150: 111.911ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=511 prop=511 olap pair=106.7ms serial=189.4ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.5ms wait=0.1/46.4ms pred gate=device Token # 1151: 3.692ms; value: next_token_ids=tensor([511], device='cuda:0') mtp accept=1 prop=511 top1=511 accp=0.922 next=pair draft=12 prop=12 pred gate=device Token # 1152: 112.313ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=0.999 next=draft=25353 prop=25353 olap pair=107.2ms serial=190.3ms gain=83.2ms ratio=0.44 s0=3.9ms s1=186.5ms wait=0.1/46.6ms pred gate=device Token # 1153: 3.680ms; value: next_token_ids=tensor([25353], device='cuda:0') mtp accept=1 prop=25353 top1=25353 accp=0.964 next=pair draft=9931 prop=23921 pred gate=device Token # 1154: 111.828ms; value: 
next_token_ids=tensor([31899], device='cuda:0') mtp accept=0 prop=23921 top1=31899 accp=0.003 next=draft=18 prop=18 olap pair=106.7ms serial=189.3ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.4ms wait=0.1/46.6ms pred gate=device Token # 1155: 112.165ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=0 prop=18 top1=19 accp=0.024 next=draft=438 prop=438 olap pair=106.9ms serial=189.6ms gain=82.8ms ratio=0.44 s0=3.8ms s1=185.9ms wait=0.1/46.5ms pred gate=device Token # 1156: 112.388ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.999 next=draft=1148 prop=1148 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=4.1ms s1=185.9ms wait=0.1/46.1ms pred gate=device Token # 1157: 3.725ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=0.999 next=pair draft=511 prop=511 pred gate=device Token # 1158: 112.376ms; value: next_token_ids=tensor([511], device='cuda:0') mtp accept=1 prop=511 top1=511 accp=1.000 next=draft=12 prop=12 olap pair=107.1ms serial=190.1ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.7ms wait=0.1/45.5ms pred gate=device Token # 1159: 3.688ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=17154 prop=17154 pred gate=device Token # 1160: 112.261ms; value: next_token_ids=tensor([17154], device='cuda:0') mtp accept=1 prop=17154 top1=17154 accp=1.000 next=draft=1320 prop=1320 olap pair=107.0ms serial=189.7ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.4ms wait=0.1/45.4ms pred gate=device Token # 1161: 3.733ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=pair draft=18 prop=18 pred gate=device Token # 1162: 111.968ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=31 prop=31 olap pair=106.7ms serial=189.3ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.0ms wait=0.1/45.6ms pred gate=device Token # 
1163: 3.750ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1164: 112.218ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=3712 prop=3712 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.4ms wait=0.1/45.5ms pred gate=device Token # 1165: 3.701ms; value: next_token_ids=tensor([3712], device='cuda:0') mtp accept=1 prop=3712 top1=3712 accp=1.000 next=pair draft=1320 prop=1320 pred gate=device Token # 1166: 112.231ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=draft=303 prop=303 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=4.0ms s1=185.8ms wait=0.1/46.4ms pred gate=device Token # 1167: 3.709ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=511 prop=511 pred gate=device Token # 1168: 112.274ms; value: next_token_ids=tensor([511], device='cuda:0') mtp accept=1 prop=511 top1=511 accp=1.000 next=draft=12 prop=12 olap pair=107.0ms serial=189.9ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.0ms wait=0.1/46.5ms pred gate=device Token # 1169: 3.749ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=6860 prop=6860 pred gate=device Token # 1170: 112.246ms; value: next_token_ids=tensor([6860], device='cuda:0') mtp accept=1 prop=6860 top1=6860 accp=1.000 next=draft=1602 prop=1602 olap pair=107.0ms serial=189.7ms gain=82.8ms ratio=0.44 s0=3.9ms s1=185.9ms wait=0.1/46.4ms pred gate=device Token # 1171: 3.691ms; value: next_token_ids=tensor([1602], device='cuda:0') mtp accept=1 prop=1602 top1=1602 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1172: 112.105ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=0.999 next=draft=20530 prop=20530 olap 
pair=106.9ms serial=189.6ms gain=82.8ms ratio=0.44 s0=3.9ms s1=185.8ms wait=0.1/46.5ms pred gate=device Token # 1173: 3.750ms; value: next_token_ids=tensor([20530], device='cuda:0') mtp accept=1 prop=20530 top1=20530 accp=1.000 next=pair draft=25361 prop=25361 pred gate=device Token # 1174: 111.940ms; value: next_token_ids=tensor([25361], device='cuda:0') mtp accept=1 prop=25361 top1=25361 accp=1.000 next=draft=303 prop=303 olap pair=106.8ms serial=189.3ms gain=82.5ms ratio=0.44 s0=4.3ms s1=185.0ms wait=0.1/45.6ms pred gate=device Token # 1175: 3.708ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=77230 prop=77230 pred gate=device Token # 1176: 112.061ms; value: next_token_ids=tensor([77230], device='cuda:0') mtp accept=1 prop=77230 top1=77230 accp=1.000 next=draft=223 prop=223 olap pair=106.9ms serial=189.5ms gain=82.7ms ratio=0.44 s0=4.3ms s1=185.2ms wait=0.1/45.5ms pred gate=device Token # 1177: 3.746ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1178: 111.968ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=3712 olap pair=106.8ms serial=189.5ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.1ms wait=0.1/45.5ms pred gate=device Token # 1179: 3.752ms; value: next_token_ids=tensor([3712], device='cuda:0') mtp accept=1 prop=3712 top1=27521 accp=0.729 next=pair draft=1320 prop=1320 pred gate=device Token # 1180: 112.114ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=draft=13 prop=13 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.3ms s1=185.2ms wait=0.1/45.8ms pred gate=device Token # 1181: 3.768ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=20530 prop=20530 pred gate=device Token # 1182: 
112.620ms; value: next_token_ids=tensor([20530], device='cuda:0') mtp accept=1 prop=20530 top1=20530 accp=1.000 next=draft=25361 prop=25361 olap pair=107.3ms serial=189.9ms gain=82.6ms ratio=0.44 s0=4.1ms s1=185.7ms wait=0.1/46.1ms pred gate=device Token # 1183: 3.795ms; value: next_token_ids=tensor([25361], device='cuda:0') mtp accept=1 prop=25361 top1=25361 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1184: 112.300ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=7163 prop=7163 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.1ms wait=0.1/46.7ms pred gate=device Token # 1185: 3.699ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1186: 112.107ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=25361 prop=25361 olap pair=106.9ms serial=189.5ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.7ms wait=0.1/46.6ms pred gate=device Token # 1187: 3.716ms; value: next_token_ids=tensor([25361], device='cuda:0') mtp accept=1 prop=25361 top1=25361 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1188: 112.160ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.999 next=draft=2636 prop=2636 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=4.0ms s1=185.6ms wait=0.1/46.3ms pred gate=device Token # 1189: 3.780ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.907 next=pair draft=4169 prop=4169 pred gate=device Token # 1190: 112.378ms; value: next_token_ids=tensor([4169], device='cuda:0') mtp accept=1 prop=4169 top1=4169 accp=0.996 next=draft=996 prop=996 olap pair=107.1ms serial=189.9ms gain=82.8ms ratio=0.44 s0=4.2ms s1=185.6ms wait=0.1/45.9ms pred gate=device Token # 1191: 3.766ms; value: 
next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=0.672 next=pair draft=20 prop=20 pred gate=device Token # 1192: 112.352ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=478 prop=478 olap pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.3ms wait=0.1/46.6ms pred gate=device Token # 1193: 3.704ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=pair draft=1349 prop=1349 pred gate=device Token # 1194: 113.151ms; value: next_token_ids=tensor([1349], device='cuda:0') mtp accept=1 prop=1349 top1=1349 accp=1.000 next=draft=768 prop=768 olap pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.7ms wait=0.1/46.0ms pred gate=device Token # 1195: 4.564ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1196: 112.572ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.932 next=draft=27521 prop=27521 olap pair=107.2ms serial=189.7ms gain=82.5ms ratio=0.43 s0=6.3ms s1=183.4ms wait=0.2/43.7ms pred gate=device Token # 1197: 3.701ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 1198: 112.130ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=draft=1267 prop=1267 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.0ms s1=185.6ms wait=0.1/46.4ms pred gate=device Token # 1199: 3.789ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1200: 112.592ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=1349 prop=1349 olap pair=107.1ms 
serial=189.8ms gain=82.7ms ratio=0.44 s0=4.2ms s1=185.6ms wait=0.1/46.0ms pred gate=device Token # 1201: 3.746ms; value: next_token_ids=tensor([1349], device='cuda:0') mtp accept=1 prop=1349 top1=1349 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1202: 112.082ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=1349 prop=1349 olap pair=106.9ms serial=189.5ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.2ms wait=0.1/45.6ms pred gate=device Token # 1203: 3.719ms; value: next_token_ids=tensor([1349], device='cuda:0') mtp accept=1 prop=1349 top1=1349 accp=0.995 next=pair draft=12 prop=12 pred gate=device Token # 1204: 112.032ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=0.999 next=draft=22957 prop=22957 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.5ms wait=0.1/46.5ms pred gate=device Token # 1205: 3.771ms; value: next_token_ids=tensor([22957], device='cuda:0') mtp accept=1 prop=22957 top1=22957 accp=0.963 next=pair draft=31460 prop=25382 pred gate=device Token # 1206: 111.781ms; value: next_token_ids=tensor([28262], device='cuda:0') mtp accept=0 prop=25382 top1=28262 accp=0.000 next=draft=21 prop=23 olap pair=106.6ms serial=189.2ms gain=82.6ms ratio=0.44 s0=3.8ms s1=185.4ms wait=0.1/46.5ms pred gate=device Token # 1207: 112.036ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=0 prop=23 top1=24 accp=0.112 next=draft=438 prop=438 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=3.8ms s1=185.6ms wait=0.1/46.5ms pred gate=device Token # 1208: 112.171ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=1148 prop=1148 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.8ms wait=0.1/46.6ms pred gate=device Token # 1209: 3.678ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=1.000 next=pair 
draft=1349 prop=1349 pred gate=device Token # 1210: 112.319ms; value: next_token_ids=tensor([1349], device='cuda:0') mtp accept=1 prop=1349 top1=1349 accp=1.000 next=draft=12 prop=12 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.1ms wait=0.1/46.5ms pred gate=device Token # 1211: 3.701ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=12747 prop=22957 pred gate=device Token # 1212: 112.127ms; value: next_token_ids=tensor([22957], device='cuda:0') mtp accept=1 prop=22957 top1=22957 accp=0.261 next=draft=1320 prop=1320 olap pair=106.9ms serial=189.3ms gain=82.4ms ratio=0.44 s0=4.3ms s1=185.0ms wait=0.1/45.6ms pred gate=device Token # 1213: 3.778ms; value: next_token_ids=tensor([28262], device='cuda:0') mtp accept=0 prop=1320 top1=28262 accp=0.071 next=pair draft=24 prop=24 pred gate=device Token # 1214: 112.230ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=draft=438 prop=438 olap pair=107.0ms serial=189.7ms gain=82.6ms ratio=0.44 s0=4.1ms s1=185.5ms wait=0.1/46.1ms pred gate=device Token # 1215: 3.697ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=0 prop=438 top1=768 accp=0.290 next=pair draft=1349 prop=1349 pred gate=device Token # 1216: 113.193ms; value: next_token_ids=tensor([1349], device='cuda:0') mtp accept=1 prop=1349 top1=1349 accp=0.962 next=draft=12 prop=12 olap pair=107.1ms serial=188.8ms gain=81.7ms ratio=0.43 s0=8.7ms s1=180.1ms wait=0.2/40.9ms pred gate=device Token # 1217: 4.559ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=12747 prop=12747 pred gate=device Token # 1218: 112.359ms; value: next_token_ids=tensor([12747], device='cuda:0') mtp accept=1 prop=12747 top1=12747 accp=0.955 next=draft=1320 prop=1320 olap pair=106.9ms serial=189.4ms gain=82.5ms ratio=0.44 s0=4.4ms s1=185.0ms wait=0.1/45.9ms pred gate=device Token # 1219: 
3.706ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=pair draft=18 prop=18 pred gate=device Token # 1220: 111.797ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=31 prop=31 olap pair=106.6ms serial=189.0ms gain=82.4ms ratio=0.44 s0=3.8ms s1=185.1ms wait=0.1/46.4ms pred gate=device Token # 1221: 3.755ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=6650 prop=6650 pred gate=device Token # 1222: 112.344ms; value: next_token_ids=tensor([6650], device='cuda:0') mtp accept=1 prop=6650 top1=6650 accp=1.000 next=draft=3712 prop=3712 olap pair=107.2ms serial=189.9ms gain=82.7ms ratio=0.44 s0=4.3ms s1=185.6ms wait=0.1/45.9ms pred gate=device Token # 1223: 3.702ms; value: next_token_ids=tensor([3712], device='cuda:0') mtp accept=1 prop=3712 top1=3712 accp=1.000 next=pair draft=1320 prop=1320 pred gate=device Token # 1224: 112.221ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=draft=303 prop=303 olap pair=107.0ms serial=189.6ms gain=82.6ms ratio=0.44 s0=4.3ms s1=185.3ms wait=0.1/45.7ms pred gate=device Token # 1225: 3.725ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=1349 prop=1349 pred gate=device Token # 1226: 112.105ms; value: next_token_ids=tensor([1349], device='cuda:0') mtp accept=1 prop=1349 top1=1349 accp=1.000 next=draft=12 prop=12 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=4.1ms s1=185.5ms wait=0.1/46.1ms pred gate=device Token # 1227: 3.692ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=25431 prop=25431 pred gate=device Token # 1228: 112.187ms; value: next_token_ids=tensor([25431], device='cuda:0') mtp accept=1 prop=25431 top1=25431 accp=1.000 next=draft=1450 prop=1450 olap pair=106.9ms 
serial=189.7ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.8ms wait=0.1/46.5ms pred gate=device Token # 1229: 3.690ms; value: next_token_ids=tensor([1450], device='cuda:0') mtp accept=1 prop=1450 top1=1450 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1230: 113.093ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=9107 prop=9107 olap pair=107.1ms serial=189.5ms gain=82.4ms ratio=0.43 s0=4.7ms s1=184.8ms wait=0.1/45.5ms pred gate=device Token # 1231: 4.723ms; value: next_token_ids=tensor([9107], device='cuda:0') mtp accept=1 prop=9107 top1=9107 accp=0.999 next=pair draft=31257 prop=31257 pred gate=device Token # 1232: 113.283ms; value: next_token_ids=tensor([31257], device='cuda:0') mtp accept=1 prop=31257 top1=31257 accp=1.000 next=draft=26 prop=26 olap pair=107.1ms serial=188.4ms gain=81.3ms ratio=0.43 s0=8.8ms s1=179.6ms wait=0.2/40.9ms pred gate=device Token # 1233: 4.587ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 1234: 113.104ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.999 next=draft=77230 prop=77230 olap pair=107.0ms serial=188.3ms gain=81.3ms ratio=0.43 s0=8.7ms s1=179.5ms wait=0.2/41.0ms pred gate=device Token # 1235: 4.558ms; value: next_token_ids=tensor([77230], device='cuda:0') mtp accept=1 prop=77230 top1=77230 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1236: 113.355ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=6650 prop=6650 olap pair=107.2ms serial=188.7ms gain=81.5ms ratio=0.43 s0=8.6ms s1=180.1ms wait=0.2/41.3ms pred gate=device Token # 1237: 4.598ms; value: next_token_ids=tensor([6650], device='cuda:0') mtp accept=1 prop=6650 top1=6650 accp=1.000 next=pair draft=3712 prop=3712 pred gate=device Token # 1238: 113.500ms; value: 
next_token_ids=tensor([3712], device='cuda:0') mtp accept=1 prop=3712 top1=3712 accp=1.000 next=draft=1320 prop=1320 olap pair=107.4ms serial=189.0ms gain=81.6ms ratio=0.43 s0=8.6ms s1=180.4ms wait=0.2/41.1ms pred gate=device Token # 1239: 4.665ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=pair draft=13 prop=13 pred gate=device Token # 1240: 112.957ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=9107 prop=9107 olap pair=107.1ms serial=188.5ms gain=81.4ms ratio=0.43 s0=8.6ms s1=179.9ms wait=0.2/41.0ms pred gate=device Token # 1241: 3.925ms; value: next_token_ids=tensor([9107], device='cuda:0') mtp accept=1 prop=9107 top1=9107 accp=1.000 next=pair draft=31257 prop=31257 pred gate=device Token # 1242: 112.446ms; value: next_token_ids=tensor([31257], device='cuda:0') mtp accept=1 prop=31257 top1=31257 accp=1.000 next=draft=26 prop=26 olap pair=107.2ms serial=190.1ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.3ms wait=0.1/46.5ms pred gate=device Token # 1243: 3.747ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1244: 112.247ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=0.993 next=draft=7163 prop=7163 olap pair=107.0ms serial=189.6ms gain=82.7ms ratio=0.44 s0=4.2ms s1=185.4ms wait=0.1/45.9ms pred gate=device Token # 1245: 3.766ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1246: 112.319ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=28273 prop=28273 olap pair=107.2ms serial=189.9ms gain=82.7ms ratio=0.44 s0=4.1ms s1=185.7ms wait=0.1/46.1ms pred gate=device Token # 1247: 3.697ms; value: next_token_ids=tensor([28273], device='cuda:0') mtp 
accept=1 prop=28273 top1=28273 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1248: 112.083ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.975 next=draft=4169 prop=4169 olap pair=106.8ms serial=189.3ms gain=82.4ms ratio=0.44 s0=4.1ms s1=185.1ms wait=0.1/46.1ms pred gate=device Token # 1249: 3.753ms; value: next_token_ids=tensor([4169], device='cuda:0') mtp accept=1 prop=4169 top1=4169 accp=0.706 next=pair draft=996 prop=996 pred gate=device Token # 1250: 112.145ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=draft=21 prop=21 olap pair=106.9ms serial=189.4ms gain=82.6ms ratio=0.44 s0=4.0ms s1=185.5ms wait=0.1/46.5ms pred gate=device Token # 1251: 3.727ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=pair draft=478 prop=478 pred gate=device Token # 1252: 112.553ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=draft=1557 prop=1557 olap pair=107.2ms serial=189.9ms gain=82.7ms ratio=0.44 s0=4.3ms s1=185.6ms wait=0.1/45.7ms pred gate=device Token # 1253: 3.752ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=1.000 next=pair draft=768 prop=768 pred gate=device Token # 1254: 112.067ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=7163 prop=7163 olap pair=106.9ms serial=189.1ms gain=82.2ms ratio=0.43 s0=5.5ms s1=183.6ms wait=0.1/44.2ms pred gate=device Token # 1255: 3.753ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.987 next=pair draft=27521 prop=27521 pred gate=device Token # 1256: 112.313ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=107.1ms serial=189.9ms gain=82.7ms ratio=0.44 s0=4.3ms s1=185.6ms 
wait=0.1/45.8ms pred gate=device Token # 1257: 3.702ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=1267 prop=1267 pred gate=device Token # 1258: 112.417ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=draft=223 prop=223 olap pair=107.2ms serial=190.2ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.3ms wait=0.1/46.7ms pred gate=device Token # 1259: 3.742ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=1557 prop=1557 pred gate=device Token # 1260: 112.394ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=1.000 next=draft=320 prop=320 olap pair=107.2ms serial=190.3ms gain=83.0ms ratio=0.44 s0=3.9ms s1=186.4ms wait=0.1/46.5ms pred gate=device Token # 1261: 3.719ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=1557 prop=1557 pred gate=device Token # 1262: 112.300ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=0.998 next=draft=12 prop=12 olap pair=107.1ms serial=190.1ms gain=83.0ms ratio=0.44 s0=3.9ms s1=186.2ms wait=0.1/46.5ms pred gate=device Token # 1263: 3.731ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=0.995 next=pair draft=20192 prop=20192 pred gate=device Token # 1264: 111.919ms; value: next_token_ids=tensor([20192], device='cuda:0') mtp accept=1 prop=20192 top1=20192 accp=1.000 next=draft=24517 prop=24517 olap pair=106.8ms serial=189.3ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.4ms wait=0.1/46.5ms pred gate=device Token # 1265: 3.640ms; value: next_token_ids=tensor([28494], device='cuda:0') mtp accept=0 prop=24517 top1=28494 accp=0.000 next=pair draft=18 prop=18 pred gate=device Token # 1266: 112.348ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=0 prop=18 
top1=24 accp=0.000 next=draft=438 prop=438 olap pair=107.0ms serial=188.9ms gain=81.8ms ratio=0.43 s0=4.5ms s1=184.4ms wait=0.1/45.7ms pred gate=device Token # 1267: 112.344ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.999 next=draft=1148 prop=1148 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.2ms s1=185.5ms wait=0.1/45.9ms pred gate=device Token # 1268: 3.715ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=1.000 next=pair draft=1557 prop=1557 pred gate=device Token # 1269: 112.527ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=1.000 next=draft=12 prop=12 olap pair=107.3ms serial=189.8ms gain=82.5ms ratio=0.43 s0=4.3ms s1=185.5ms wait=0.1/46.0ms pred gate=device Token # 1270: 3.740ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=10758 prop=10758 pred gate=device Token # 1271: 113.656ms; value: next_token_ids=tensor([20192], device='cuda:0') mtp accept=0 prop=10758 top1=20192 accp=0.476 next=draft=28494 prop=28494 olap pair=108.4ms serial=190.9ms gain=82.5ms ratio=0.43 s0=5.4ms s1=185.5ms wait=0.1/45.7ms pred gate=device Token # 1272: 112.407ms; value: next_token_ids=tensor([28494], device='cuda:0') mtp accept=1 prop=28494 top1=28494 accp=1.000 next=draft=24 prop=24 olap pair=107.1ms serial=189.9ms gain=82.7ms ratio=0.44 s0=4.0ms s1=185.9ms wait=0.1/47.3ms pred gate=device Token # 1273: 3.755ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=pair draft=768 prop=768 pred gate=device Token # 1274: 112.209ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.999 next=draft=1557 prop=1557 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.9ms wait=0.1/47.7ms pred gate=device Token # 1275: 3.735ms; value: next_token_ids=tensor([1557], 
device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=0.948 next=pair draft=12 prop=12 pred gate=device Token # 1276: 112.236ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=draft=10758 prop=10758 olap pair=107.0ms serial=189.4ms gain=82.5ms ratio=0.44 s0=4.2ms s1=185.3ms wait=0.1/46.2ms pred gate=device Token # 1277: 3.806ms; value: next_token_ids=tensor([10758], device='cuda:0') mtp accept=1 prop=10758 top1=10758 accp=1.000 next=pair draft=1320 prop=1320 pred gate=device Token # 1278: 112.085ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=draft=18 prop=18 olap pair=106.9ms serial=189.4ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.0ms wait=0.1/45.9ms pred gate=device Token # 1279: 3.739ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1280: 112.019ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=7163 prop=7163 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.0ms wait=0.1/46.1ms pred gate=device Token # 1281: 3.701ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=5126 prop=5126 pred gate=device Token # 1282: 112.443ms; value: next_token_ids=tensor([5126], device='cuda:0') mtp accept=1 prop=5126 top1=5126 accp=0.999 next=draft=1320 prop=1320 olap pair=107.0ms serial=189.7ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.7ms wait=0.1/47.3ms pred gate=device Token # 1283: 3.727ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 1284: 112.269ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=1557 prop=1557 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 
s0=4.0ms s1=185.8ms wait=0.1/47.2ms pred gate=device Token # 1285: 3.762ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=1.000 next=pair draft=12 prop=12 pred gate=device Token # 1286: 112.330ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=draft=10862 prop=10862 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=4.2ms s1=185.8ms wait=0.1/46.6ms pred gate=device Token # 1287: 3.772ms; value: next_token_ids=tensor([10862], device='cuda:0') mtp accept=1 prop=10862 top1=10862 accp=1.000 next=pair draft=5926 prop=5926 pred gate=device Token # 1288: 112.430ms; value: next_token_ids=tensor([5926], device='cuda:0') mtp accept=1 prop=5926 top1=5926 accp=1.000 next=draft=31 prop=31 olap pair=107.2ms serial=190.1ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.7ms wait=0.1/46.2ms pred gate=device Token # 1289: 3.757ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=23369 prop=23369 pred gate=device Token # 1290: 112.094ms; value: next_token_ids=tensor([23369], device='cuda:0') mtp accept=1 prop=23369 top1=23369 accp=1.000 next=draft=31416 prop=31416 olap pair=106.9ms serial=189.6ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.2ms wait=0.1/46.5ms pred gate=device Token # 1291: 3.756ms; value: next_token_ids=tensor([31416], device='cuda:0') mtp accept=1 prop=31416 top1=31416 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 1292: 112.274ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=77230 prop=77230 olap pair=107.1ms serial=189.8ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.5ms wait=0.1/46.5ms pred gate=device Token # 1293: 3.724ms; value: next_token_ids=tensor([77230], device='cuda:0') mtp accept=1 prop=77230 top1=77230 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1294: 112.410ms; value: next_token_ids=tensor([223], 
device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=7163 prop=7163 olap pair=107.2ms serial=190.0ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.6ms wait=0.1/46.3ms pred gate=device Token # 1295: 3.692ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1296: 112.140ms; value: next_token_ids=tensor([5126], device='cuda:0') mtp accept=0 prop=27521 top1=5126 accp=0.295 next=draft=1320 prop=1320 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.3ms wait=0.1/46.4ms pred gate=device Token # 1297: 112.275ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=draft=13 prop=13 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.3ms wait=0.1/46.1ms pred gate=device Token # 1298: 3.721ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=23369 prop=23369 pred gate=device Token # 1299: 112.387ms; value: next_token_ids=tensor([23369], device='cuda:0') mtp accept=1 prop=23369 top1=23369 accp=1.000 next=draft=31416 prop=31416 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=4.3ms s1=185.7ms wait=0.1/46.5ms pred gate=device Token # 1300: 3.761ms; value: next_token_ids=tensor([31416], device='cuda:0') mtp accept=1 prop=31416 top1=31416 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1301: 112.303ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=7163 prop=7163 olap pair=107.0ms serial=189.8ms gain=82.7ms ratio=0.44 s0=4.1ms s1=185.7ms wait=0.1/47.0ms pred gate=device Token # 1302: 3.740ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1303: 112.256ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 
prop=27521 top1=27521 accp=1.000 next=draft=31416 prop=31416 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.2ms s1=185.6ms wait=0.1/47.1ms pred gate=device Token # 1304: 3.716ms; value: next_token_ids=tensor([31416], device='cuda:0') mtp accept=1 prop=31416 top1=31416 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1305: 111.996ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=320 top1=303 accp=0.015 next=draft=94073 prop=94073 olap pair=106.8ms serial=189.4ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.6ms wait=0.1/47.6ms pred gate=device Token # 1306: 112.114ms; value: next_token_ids=tensor([94073], device='cuda:0') mtp accept=1 prop=94073 top1=94073 accp=0.992 next=draft=320 prop=320 olap pair=106.8ms serial=189.6ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.8ms wait=0.1/47.8ms pred gate=device Token # 1307: 3.760ms; value: next_token_ids=tensor([3885], device='cuda:0') mtp accept=0 prop=320 top1=3885 accp=0.000 next=pair draft=320 prop=320 pred gate=device Token # 1308: 112.329ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.999 next=draft=2636 prop=2636 olap pair=107.1ms serial=189.9ms gain=82.8ms ratio=0.44 s0=3.9ms s1=185.9ms wait=0.1/47.4ms pred gate=device Token # 1309: 3.722ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.999 next=pair draft=223 prop=223 pred gate=device Token # 1310: 112.327ms; value: next_token_ids=tensor([16992], device='cuda:0') mtp accept=0 prop=223 top1=16992 accp=0.321 next=draft=223 prop=223 olap pair=107.1ms serial=189.9ms gain=82.8ms ratio=0.44 s0=3.9ms s1=186.0ms wait=0.1/47.7ms pred gate=device Token # 1311: 112.314ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.984 next=draft=1557 prop=1557 olap pair=107.0ms serial=189.6ms gain=82.6ms ratio=0.44 s0=4.0ms s1=185.7ms wait=0.1/47.6ms pred gate=device Token # 1312: 3.770ms; value: 
next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=1.000 next=pair draft=12 prop=12 pred gate=device Token # 1313: 112.237ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=draft=20192 prop=20192 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.8ms wait=0.1/47.7ms pred gate=device Token # 1314: 3.699ms; value: next_token_ids=tensor([20192], device='cuda:0') mtp accept=1 prop=20192 top1=20192 accp=1.000 next=pair draft=28494 prop=28494 pred gate=device Token # 1315: 112.083ms; value: next_token_ids=tensor([28494], device='cuda:0') mtp accept=1 prop=28494 top1=28494 accp=0.961 next=draft=23 prop=23 olap pair=106.9ms serial=189.4ms gain=82.5ms ratio=0.44 s0=4.0ms s1=185.4ms wait=0.1/47.3ms pred gate=device Token # 1316: 3.711ms; value: next_token_ids=tensor([23], device='cuda:0') mtp accept=1 prop=23 top1=23 accp=0.999 next=pair draft=438 prop=438 pred gate=device Token # 1317: 112.114ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.962 next=draft=223 prop=223 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.1ms wait=0.1/46.4ms pred gate=device Token # 1318: 3.708ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1319: 112.184ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.3ms wait=0.1/46.2ms pred gate=device Token # 1320: 3.678ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=31416 prop=31416 pred gate=device Token # 1321: 112.227ms; value: next_token_ids=tensor([31416], device='cuda:0') mtp accept=1 prop=31416 top1=31416 accp=1.000 next=draft=565 prop=565 olap 
pair=107.0ms serial=189.4ms gain=82.4ms ratio=0.43 s0=4.1ms s1=185.3ms wait=0.1/47.3ms pred gate=device Token # 1322: 3.822ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1323: 112.386ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=1557 prop=1557 olap pair=107.1ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.5ms s1=185.2ms wait=0.1/46.2ms pred gate=device Token # 1324: 3.759ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 1325: 112.424ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.999 next=draft=223 prop=223 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.6ms wait=0.1/46.1ms pred gate=device Token # 1326: 3.720ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1327: 112.507ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.7ms wait=0.1/46.2ms pred gate=device Token # 1328: 3.800ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=28782 prop=28782 pred gate=device Token # 1329: 112.095ms; value: next_token_ids=tensor([28782], device='cuda:0') mtp accept=1 prop=28782 top1=28782 accp=1.000 next=draft=320 prop=320 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.1ms wait=0.1/46.1ms pred gate=device Token # 1330: 3.731ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=320 top1=303 accp=0.211 next=pair draft=12223 prop=12223 pred gate=device Token # 1331: 113.023ms; 
value: next_token_ids=tensor([12223], device='cuda:0') mtp accept=1 prop=12223 top1=12223 accp=0.628 next=draft=35987 prop=35987 olap pair=107.7ms serial=190.3ms gain=82.5ms ratio=0.43 s0=4.1ms s1=186.2ms wait=0.1/47.0ms pred gate=device Token # 1332: 3.743ms; value: next_token_ids=tensor([16913], device='cuda:0') mtp accept=0 prop=35987 top1=35987 accp=0.341 next=pair draft=10095 prop=10095 pred gate=device Token # 1333: 112.340ms; value: next_token_ids=tensor([10095], device='cuda:0') mtp accept=1 prop=10095 top1=7163 accp=0.565 next=draft=1148 prop=1148 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=4.3ms s1=185.6ms wait=0.1/46.5ms pred gate=device Token # 1334: 3.747ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=1.000 next=pair draft=1557 prop=1557 pred gate=device Token # 1335: 112.285ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=0 prop=1557 top1=7163 accp=0.295 next=draft=27521 prop=27521 olap pair=107.1ms serial=189.9ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.5ms wait=0.1/46.8ms pred gate=device Token # 1336: 112.151ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.1ms wait=0.1/46.7ms pred gate=device Token # 1337: 3.737ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=0.999 next=pair draft=565 prop=565 pred gate=device Token # 1338: 112.272ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=223 accp=0.472 next=draft=223 prop=223 olap pair=107.1ms serial=189.9ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.6ms wait=0.1/46.6ms pred gate=device Token # 1339: 3.744ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1340: 112.448ms; value: 
next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.3ms serial=190.2ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.8ms wait=0.1/46.8ms pred gate=device Token # 1341: 3.729ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=28782 prop=28782 pred gate=device Token # 1342: 112.452ms; value: next_token_ids=tensor([28782], device='cuda:0') mtp accept=1 prop=28782 top1=28782 accp=1.000 next=draft=438 prop=438 olap pair=107.2ms serial=189.3ms gain=82.1ms ratio=0.43 s0=4.5ms s1=184.8ms wait=0.1/47.0ms pred gate=device Token # 1343: 3.841ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.999 next=pair draft=565 prop=565 pred gate=device Token # 1344: 112.223ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=1.000 next=draft=10095 prop=10095 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.0ms s1=185.5ms wait=0.1/47.6ms pred gate=device Token # 1345: 3.743ms; value: next_token_ids=tensor([10095], device='cuda:0') mtp accept=1 prop=10095 top1=10095 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1346: 112.296ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=320 top1=303 accp=0.188 next=draft=2636 prop=2636 olap pair=107.0ms serial=189.2ms gain=82.2ms ratio=0.43 s0=4.2ms s1=185.0ms wait=0.1/47.2ms pred gate=device Token # 1347: 111.973ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=1.000 next=draft=19791 prop=20198 olap pair=106.7ms serial=189.1ms gain=82.5ms ratio=0.44 s0=3.9ms s1=185.2ms wait=0.1/47.6ms pred gate=device Token # 1348: 3.778ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=20198 top1=223 accp=0.217 next=pair draft=1557 prop=1557 pred gate=device Token # 1349: 112.262ms; value: next_token_ids=tensor([1557], 
device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=0.998 next=draft=12 prop=12 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=4.0ms s1=185.8ms wait=0.1/47.7ms pred gate=device Token # 1350: 3.734ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=20192 prop=20192 pred gate=device Token # 1351: 112.014ms; value: next_token_ids=tensor([20192], device='cuda:0') mtp accept=1 prop=20192 top1=20192 accp=1.000 next=draft=28494 prop=28494 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.5ms wait=0.1/47.8ms pred gate=device Token # 1352: 3.697ms; value: next_token_ids=tensor([28494], device='cuda:0') mtp accept=1 prop=28494 top1=28494 accp=0.756 next=pair draft=18 prop=18 pred gate=device Token # 1353: 112.206ms; value: next_token_ids=tensor([23], device='cuda:0') mtp accept=0 prop=18 top1=23 accp=0.252 next=draft=223 prop=223 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.6ms wait=0.1/47.8ms pred gate=device Token # 1354: 112.349ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.653 next=draft=35987 prop=94073 olap pair=107.0ms serial=189.6ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.7ms wait=0.1/47.8ms pred gate=device Token # 1355: 3.775ms; value: next_token_ids=tensor([94073], device='cuda:0') mtp accept=1 prop=94073 top1=94073 accp=0.377 next=pair draft=320 prop=320 pred gate=device Token # 1356: 112.175ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=14149 prop=14149 olap pair=106.9ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.7ms s1=185.0ms wait=0.1/46.8ms pred gate=device Token # 1357: 3.748ms; value: next_token_ids=tensor([14149], device='cuda:0') mtp accept=1 prop=14149 top1=14149 accp=0.487 next=pair draft=303 prop=303 pred gate=device Token # 1358: 112.086ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 
top1=303 accp=0.949 next=draft=1557 prop=7163 olap pair=106.9ms serial=189.4ms gain=82.5ms ratio=0.44 s0=4.1ms s1=185.3ms wait=0.1/47.4ms pred gate=device Token # 1359: 3.770ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=1557 accp=0.875 next=pair draft=27521 prop=27521 pred gate=device Token # 1360: 112.290ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.6ms s1=185.1ms wait=0.1/46.6ms pred gate=device Token # 1361: 3.729ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=0.817 next=pair draft=24106 prop=24106 pred gate=device Token # 1362: 112.429ms; value: next_token_ids=tensor([24106], device='cuda:0') mtp accept=1 prop=24106 top1=24106 accp=0.946 next=draft=223 prop=223 olap pair=107.1ms serial=189.8ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.5ms wait=0.1/46.9ms pred gate=device Token # 1363: 3.864ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=1557 prop=1557 pred gate=device Token # 1364: 112.108ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=1.000 next=draft=35015 prop=35015 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.1ms wait=0.1/46.6ms pred gate=device Token # 1365: 3.781ms; value: next_token_ids=tensor([35015], device='cuda:0') mtp accept=1 prop=35015 top1=35015 accp=0.821 next=pair draft=223 prop=223 pred gate=device Token # 1366: 112.064ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=20192 prop=20192 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.0ms wait=0.1/46.8ms pred gate=device Token # 1367: 3.757ms; value: next_token_ids=tensor([20192], device='cuda:0') mtp accept=1 prop=20192 top1=20192 accp=1.000 
next=pair draft=28494 prop=28494 pred gate=device Token # 1368: 112.044ms; value: next_token_ids=tensor([28494], device='cuda:0') mtp accept=1 prop=28494 top1=28494 accp=0.990 next=draft=19 prop=18 olap pair=106.8ms serial=189.3ms gain=82.5ms ratio=0.44 s0=4.3ms s1=184.9ms wait=0.1/46.7ms pred gate=device Token # 1369: 3.786ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=0 prop=18 top1=24 accp=0.036 next=pair draft=16 prop=16 pred gate=device Token # 1370: 112.219ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=13476 prop=13476 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=4.1ms s1=185.5ms wait=0.1/47.3ms pred gate=device Token # 1371: 3.711ms; value: next_token_ids=tensor([1173], device='cuda:0') mtp accept=0 prop=13476 top1=1173 accp=0.000 next=pair draft=1148 prop=1148 pred gate=device Token # 1372: 112.407ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=1148 top1=303 accp=0.489 next=draft=2636 prop=2636 olap pair=107.2ms serial=190.0ms gain=82.9ms ratio=0.44 s0=4.0ms s1=186.0ms wait=0.1/47.5ms pred gate=device Token # 1373: 112.119ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=1.000 next=draft=27342 prop=27342 olap pair=106.9ms serial=189.4ms gain=82.5ms ratio=0.44 s0=3.9ms s1=185.5ms wait=0.1/47.8ms pred gate=device Token # 1374: 3.740ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=27342 top1=16992 accp=0.321 next=pair draft=1557 prop=1557 pred gate=device Token # 1375: 112.325ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=0.977 next=draft=12 prop=12 olap pair=107.0ms serial=189.6ms gain=82.6ms ratio=0.44 s0=3.8ms s1=185.7ms wait=0.1/47.9ms pred gate=device Token # 1376: 3.822ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=20192 prop=20192 pred gate=device 
Token # 1377: 114.729ms; value: next_token_ids=tensor([20192], device='cuda:0') mtp accept=1 prop=20192 top1=20192 accp=1.000 next=draft=28494 prop=28494 olap pair=107.1ms serial=189.8ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.9ms wait=0.1/47.9ms pred gate=device Token # 1378: 3.722ms; value: next_token_ids=tensor([28494], device='cuda:0') mtp accept=1 prop=28494 top1=28494 accp=1.000 next=pair draft=24 prop=24 pred gate=device Token # 1379: 112.185ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=0.987 next=draft=438 prop=438 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.1ms s1=185.6ms wait=0.1/47.4ms pred gate=device Token # 1380: 3.734ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.963 next=pair draft=223 prop=223 pred gate=device Token # 1381: 112.608ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.997 next=draft=84968 prop=84968 olap pair=107.3ms serial=190.2ms gain=82.9ms ratio=0.44 s0=4.2ms s1=186.0ms wait=0.1/47.4ms pred gate=device Token # 1382: 3.735ms; value: next_token_ids=tensor([84968], device='cuda:0') mtp accept=1 prop=84968 top1=84968 accp=1.000 next=pair draft=303 prop=4339 pred gate=device Token # 1383: 112.347ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=0.191 next=draft=223 prop=223 olap pair=107.1ms serial=189.9ms gain=82.8ms ratio=0.44 s0=4.1ms s1=185.8ms wait=0.1/47.3ms pred gate=device Token # 1384: 3.747ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=545 accp=0.348 next=pair draft=7163 prop=7163 pred gate=device Token # 1385: 112.329ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.1ms serial=189.8ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.9ms wait=0.1/47.9ms pred gate=device Token # 1386: 3.716ms; value: 
next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=31416 prop=31416 pred gate=device Token # 1387: 112.307ms; value: next_token_ids=tensor([31416], device='cuda:0') mtp accept=1 prop=31416 top1=31416 accp=1.000 next=draft=1148 prop=1148 olap pair=107.0ms serial=189.4ms gain=82.4ms ratio=0.43 s0=4.4ms s1=185.0ms wait=0.1/46.9ms pred gate=device Token # 1388: 3.705ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=0.974 next=pair draft=94073 prop=94073 pred gate=device Token # 1389: 112.708ms; value: next_token_ids=tensor([1300], device='cuda:0') mtp accept=0 prop=94073 top1=67384 accp=0.007 next=draft=4339 prop=4339 olap pair=107.5ms serial=190.5ms gain=83.0ms ratio=0.44 s0=4.3ms s1=186.2ms wait=0.1/47.0ms pred gate=device Token # 1390: 112.345ms; value: next_token_ids=tensor([1834], device='cuda:0') mtp accept=0 prop=4339 top1=1834 accp=0.317 next=draft=27972 prop=27972 olap pair=107.1ms serial=189.8ms gain=82.7ms ratio=0.44 s0=4.0ms s1=185.9ms wait=0.1/47.7ms pred gate=device Token # 1391: 112.240ms; value: next_token_ids=tensor([27972], device='cuda:0') mtp accept=1 prop=27972 top1=27972 accp=0.979 next=draft=768 prop=320 olap pair=107.0ms serial=189.6ms gain=82.6ms ratio=0.44 s0=4.0ms s1=185.6ms wait=0.1/47.7ms pred gate=device Token # 1392: 3.749ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.380 next=pair draft=14785 prop=14785 pred gate=device Token # 1393: 112.257ms; value: next_token_ids=tensor([9691], device='cuda:0') mtp accept=0 prop=14785 top1=9691 accp=0.423 next=draft=4339 prop=4339 olap pair=107.0ms serial=189.6ms gain=82.6ms ratio=0.44 s0=4.1ms s1=185.5ms wait=0.1/47.5ms pred gate=device Token # 1394: 112.308ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=0.999 next=draft=768 prop=768 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=3.9ms 
s1=185.9ms wait=0.1/47.9ms pred gate=device Token # 1395: 3.742ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=1557 prop=1557 pred gate=device Token # 1396: 112.495ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=1.000 next=draft=12 prop=12 olap pair=107.2ms serial=190.0ms gain=82.8ms ratio=0.44 s0=4.1ms s1=186.0ms wait=0.1/47.6ms pred gate=device Token # 1397: 3.844ms; value: next_token_ids=tensor([982], device='cuda:0') mtp accept=0 prop=12 top1=982 accp=0.346 next=pair draft=223 prop=223 pred gate=device Token # 1398: 112.258ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=20192 prop=20192 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.2ms s1=185.5ms wait=0.1/47.6ms pred gate=device Token # 1399: 3.861ms; value: next_token_ids=tensor([20192], device='cuda:0') mtp accept=1 prop=20192 top1=20192 accp=0.726 next=pair draft=28494 prop=28494 pred gate=device Token # 1400: 112.460ms; value: next_token_ids=tensor([28494], device='cuda:0') mtp accept=1 prop=28494 top1=28494 accp=1.000 next=draft=24 prop=24 olap pair=107.2ms serial=189.4ms gain=82.2ms ratio=0.43 s0=4.0ms s1=185.4ms wait=0.1/47.8ms pred gate=device Token # 1401: 3.883ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1402: 112.450ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.830 next=draft=1557 prop=1557 olap pair=107.2ms serial=190.0ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.6ms wait=0.1/47.0ms pred gate=device Token # 1403: 3.758ms; value: next_token_ids=tensor([19902], device='cuda:0') mtp accept=0 prop=1557 top1=19902 accp=0.025 next=pair draft=2821 prop=2821 pred gate=device Token # 1404: 112.117ms; value: next_token_ids=tensor([2821], device='cuda:0') mtp accept=1 
prop=2821 top1=2821 accp=0.882 next=draft=768 prop=768 olap pair=106.7ms serial=189.1ms gain=82.4ms ratio=0.44 s0=4.5ms s1=184.6ms wait=0.1/46.6ms pred gate=device Token # 1405: 3.726ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=4339 prop=2541 pred gate=device Token # 1406: 112.209ms; value: next_token_ids=tensor([2541], device='cuda:0') mtp accept=1 prop=2541 top1=1557 accp=0.149 next=draft=2311 prop=2311 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.5ms s1=185.2ms wait=0.1/46.7ms pred gate=device Token # 1407: 3.750ms; value: next_token_ids=tensor([2311], device='cuda:0') mtp accept=1 prop=2311 top1=2311 accp=0.899 next=pair draft=20251 prop=20251 pred gate=device Token # 1408: 112.195ms; value: next_token_ids=tensor([20251], device='cuda:0') mtp accept=1 prop=20251 top1=20251 accp=0.900 next=draft=320 prop=320 olap pair=107.0ms serial=189.6ms gain=82.7ms ratio=0.44 s0=4.5ms s1=185.2ms wait=0.1/46.6ms pred gate=device Token # 1409: 3.746ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.880 next=pair draft=1557 prop=7163 pred gate=device Token # 1410: 112.419ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.126 next=draft=27521 prop=27521 olap pair=107.1ms serial=190.0ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.6ms wait=0.1/46.7ms pred gate=device Token # 1411: 3.755ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 1412: 112.137ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=0.999 next=draft=24106 prop=24106 olap pair=106.8ms serial=189.4ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.0ms wait=0.1/46.8ms pred gate=device Token # 1413: 3.735ms; value: next_token_ids=tensor([24106], device='cuda:0') mtp accept=1 prop=24106 top1=24106 
accp=0.799 next=pair draft=223 prop=223 pred gate=device Token # 1414: 112.331ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=1557 prop=1557 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.3ms wait=0.1/46.9ms pred gate=device Token # 1415: 3.738ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1416: 112.136ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.936 next=draft=4339 prop=1557 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.2ms wait=0.1/46.8ms pred gate=device Token # 1417: 3.798ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=0.469 next=pair draft=12 prop=12 pred gate=device Token # 1418: 112.644ms; value: next_token_ids=tensor([982], device='cuda:0') mtp accept=0 prop=12 top1=982 accp=0.115 next=draft=223 prop=223 olap pair=107.3ms serial=190.4ms gain=83.0ms ratio=0.44 s0=4.4ms s1=185.9ms wait=0.1/46.8ms pred gate=device Token # 1419: 112.568ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=20192 prop=20192 olap pair=107.3ms serial=190.2ms gain=83.0ms ratio=0.44 s0=4.4ms s1=185.8ms wait=0.1/46.9ms pred gate=device Token # 1420: 3.741ms; value: next_token_ids=tensor([20192], device='cuda:0') mtp accept=1 prop=20192 top1=20192 accp=1.000 next=pair draft=28494 prop=28494 pred gate=device Token # 1421: 112.248ms; value: next_token_ids=tensor([28494], device='cuda:0') mtp accept=1 prop=28494 top1=28494 accp=1.000 next=draft=24 prop=24 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.3ms wait=0.1/46.9ms pred gate=device Token # 1422: 3.757ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=0.935 next=pair draft=438 prop=438 pred gate=device 
Token # 1423: 112.182ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=0 prop=438 top1=768 accp=0.010 next=draft=1557 prop=1557 olap pair=107.0ms serial=189.6ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.2ms wait=0.1/46.9ms pred gate=device Token # 1424: 112.774ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=0 prop=1557 top1=4339 accp=0.029 next=draft=223 prop=223 olap pair=107.4ms serial=190.3ms gain=82.9ms ratio=0.44 s0=4.3ms s1=185.9ms wait=0.1/47.1ms pred gate=device Token # 1425: 112.473ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.855 next=draft=20192 prop=20192 olap pair=107.1ms serial=189.7ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.3ms wait=0.1/46.9ms pred gate=device Token # 1426: 3.797ms; value: next_token_ids=tensor([1059], device='cuda:0') mtp accept=0 prop=20192 top1=20192 accp=0.839 next=pair draft=12 prop=12 pred gate=device Token # 1427: 112.256ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=0.965 next=draft=20192 prop=20192 olap pair=107.0ms serial=189.6ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.2ms wait=0.1/46.9ms pred gate=device Token # 1428: 3.803ms; value: next_token_ids=tensor([20192], device='cuda:0') mtp accept=1 prop=20192 top1=20192 accp=1.000 next=pair draft=28494 prop=28494 pred gate=device Token # 1429: 114.038ms; value: next_token_ids=tensor([28494], device='cuda:0') mtp accept=1 prop=28494 top1=28494 accp=1.000 next=draft=24 prop=24 olap pair=108.7ms serial=190.2ms gain=81.5ms ratio=0.43 s0=4.7ms s1=185.5ms wait=0.1/47.1ms pred gate=device Token # 1430: 3.779ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 1431: 113.056ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.993 next=draft=223 prop=223 olap pair=107.7ms serial=188.8ms gain=81.1ms ratio=0.43 s0=4.8ms 
s1=184.0ms wait=0.1/47.2ms pred gate=device Token # 1432: 3.780ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=6924 prop=6924 pred gate=device Token # 1433: 113.125ms; value: next_token_ids=tensor([6924], device='cuda:0') mtp accept=1 prop=6924 top1=6924 accp=1.000 next=draft=24518 prop=24518 olap pair=107.9ms serial=189.2ms gain=81.3ms ratio=0.43 s0=4.8ms s1=184.4ms wait=0.1/47.3ms pred gate=device Token # 1434: 3.778ms; value: next_token_ids=tensor([24518], device='cuda:0') mtp accept=1 prop=24518 top1=24518 accp=1.000 next=pair draft=23243 prop=23243 pred gate=device Token # 1435: 112.997ms; value: next_token_ids=tensor([23243], device='cuda:0') mtp accept=1 prop=23243 top1=23243 accp=1.000 next=draft=303 prop=303 olap pair=107.7ms serial=188.9ms gain=81.2ms ratio=0.43 s0=4.8ms s1=184.2ms wait=0.1/47.0ms pred gate=device Token # 1436: 3.788ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=65388 prop=65388 pred gate=device Token # 1437: 113.035ms; value: next_token_ids=tensor([65388], device='cuda:0') mtp accept=1 prop=65388 top1=65388 accp=1.000 next=draft=223 prop=223 olap pair=107.8ms serial=189.0ms gain=81.3ms ratio=0.43 s0=4.8ms s1=184.3ms wait=0.1/47.0ms pred gate=device Token # 1438: 4.987ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=20192 prop=20192 pred gate=device Token # 1439: 113.320ms; value: next_token_ids=tensor([20192], device='cuda:0') mtp accept=1 prop=20192 top1=20192 accp=1.000 next=draft=28494 prop=28494 olap pair=108.0ms serial=190.5ms gain=82.5ms ratio=0.43 s0=4.6ms s1=185.9ms wait=0.1/46.9ms pred gate=device Token # 1440: 3.780ms; value: next_token_ids=tensor([28494], device='cuda:0') mtp accept=1 prop=28494 top1=28494 accp=1.000 next=pair draft=24 prop=24 pred gate=device Token # 1441: 112.368ms; value: next_token_ids=tensor([24], 
device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=draft=223 prop=438 olap pair=107.0ms serial=189.6ms gain=82.6ms ratio=0.44 s0=4.1ms s1=185.6ms wait=0.1/47.5ms pred gate=device Token # 1442: 3.866ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.188 next=pair draft=223 prop=223 pred gate=device Token # 1443: 112.232ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=7163 prop=7163 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.0ms s1=185.8ms wait=0.1/47.9ms pred gate=device Token # 1444: 3.817ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1445: 112.337ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=31416 prop=31416 olap pair=107.0ms serial=189.0ms gain=82.0ms ratio=0.43 s0=6.9ms s1=182.1ms wait=0.2/44.4ms pred gate=device Token # 1446: 3.737ms; value: next_token_ids=tensor([31416], device='cuda:0') mtp accept=1 prop=31416 top1=31416 accp=1.000 next=pair draft=303 prop=1148 pred gate=device Token # 1447: 112.353ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=0 prop=1148 top1=320 accp=0.027 next=draft=2636 prop=2636 olap pair=107.1ms serial=189.8ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.9ms wait=0.1/47.9ms pred gate=device Token # 1448: 112.559ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.969 next=draft=223 prop=56463 olap pair=107.3ms serial=190.2ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.4ms wait=0.1/48.1ms pred gate=device Token # 1449: 3.744ms; value: next_token_ids=tensor([56463], device='cuda:0') mtp accept=1 prop=56463 top1=223 accp=0.561 next=pair draft=223 prop=223 pred gate=device Token # 1450: 112.349ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 
top1=223 accp=0.827 next=draft=7163 prop=7163 olap pair=107.1ms serial=189.8ms gain=82.7ms ratio=0.44 s0=4.2ms s1=185.6ms wait=0.1/47.6ms pred gate=device Token # 1451: 3.786ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1452: 112.553ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=31416 prop=31416 olap pair=107.3ms serial=190.1ms gain=82.8ms ratio=0.44 s0=4.0ms s1=186.1ms wait=0.1/47.9ms pred gate=device Token # 1453: 3.771ms; value: next_token_ids=tensor([31416], device='cuda:0') mtp accept=1 prop=31416 top1=31416 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1454: 112.531ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.930 next=draft=1318 prop=2636 olap pair=107.3ms serial=189.5ms gain=82.2ms ratio=0.43 s0=4.3ms s1=185.2ms wait=0.1/47.3ms pred gate=device Token # 1455: 3.808ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.332 next=pair draft=223 prop=223 pred gate=device Token # 1456: 112.416ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.665 next=draft=7163 prop=7163 olap pair=107.2ms serial=190.0ms gain=82.8ms ratio=0.44 s0=3.9ms s1=186.1ms wait=0.1/48.0ms pred gate=device Token # 1457: 3.768ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=0 prop=7163 top1=1557 accp=0.408 next=pair draft=12 prop=12 pred gate=device Token # 1458: 112.435ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=0.998 next=draft=20192 prop=20192 olap pair=107.1ms serial=189.9ms gain=82.8ms ratio=0.44 s0=3.9ms s1=186.0ms wait=0.1/47.9ms pred gate=device Token # 1459: 3.704ms; value: next_token_ids=tensor([20192], device='cuda:0') mtp accept=1 prop=20192 top1=20192 accp=1.000 next=pair 
draft=28494 prop=28494 pred gate=device Token # 1460: 112.280ms; value: next_token_ids=tensor([28494], device='cuda:0') mtp accept=1 prop=28494 top1=28494 accp=1.000 next=draft=24 prop=23 olap pair=107.1ms serial=189.9ms gain=82.8ms ratio=0.44 s0=3.8ms s1=186.1ms wait=0.1/48.2ms pred gate=device Token # 1461: 3.770ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=0 prop=23 top1=24 accp=0.743 next=pair draft=438 prop=438 pred gate=device Token # 1462: 112.678ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.718 next=draft=223 prop=223 olap pair=107.3ms serial=189.5ms gain=82.2ms ratio=0.43 s0=6.9ms s1=182.6ms wait=0.2/44.4ms pred gate=device Token # 1463: 3.795ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1464: 112.371ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.1ms serial=189.7ms gain=82.6ms ratio=0.44 s0=4.1ms s1=185.6ms wait=0.1/47.8ms pred gate=device Token # 1465: 3.796ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=31416 prop=31416 pred gate=device Token # 1466: 112.265ms; value: next_token_ids=tensor([31416], device='cuda:0') mtp accept=1 prop=31416 top1=31416 accp=1.000 next=draft=303 prop=303 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.1ms s1=185.6ms wait=0.1/47.8ms pred gate=device Token # 1467: 3.782ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.999 next=pair draft=1318 prop=1318 pred gate=device Token # 1468: 112.478ms; value: next_token_ids=tensor([1318], device='cuda:0') mtp accept=1 prop=1318 top1=1318 accp=0.999 next=draft=223 prop=223 olap pair=107.3ms serial=190.2ms gain=82.9ms ratio=0.44 s0=4.2ms s1=186.0ms wait=0.1/47.9ms pred gate=device Token # 
1469: 3.791ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=0 prop=223 top1=223 accp=0.868 next=pair draft=27521 prop=27521 pred gate=device Token # 1470: 112.353ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.8ms wait=0.1/48.3ms pred gate=device Token # 1471: 3.769ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=547 prop=547 pred gate=device Token # 1472: 112.204ms; value: next_token_ids=tensor([547], device='cuda:0') mtp accept=1 prop=547 top1=547 accp=0.999 next=draft=3885 prop=3885 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.6ms wait=0.1/48.2ms pred gate=device Token # 1473: 3.836ms; value: next_token_ids=tensor([3885], device='cuda:0') mtp accept=1 prop=3885 top1=3885 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1474: 112.319ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=2636 prop=2636 olap pair=107.1ms serial=189.6ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.2ms wait=0.1/47.1ms pred gate=device Token # 1475: 3.771ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1476: 112.315ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.999 next=draft=1557 prop=1557 olap pair=107.1ms serial=189.9ms gain=82.8ms ratio=0.44 s0=3.9ms s1=186.0ms wait=0.1/48.3ms pred gate=device Token # 1477: 3.766ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=1.000 next=pair draft=12 prop=12 pred gate=device Token # 1478: 113.078ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=draft=20192 
prop=20192 olap pair=107.8ms serial=189.1ms gain=81.3ms ratio=0.43 s0=4.4ms s1=184.7ms wait=0.1/47.8ms pred gate=device Token # 1479: 3.756ms; value: next_token_ids=tensor([20192], device='cuda:0') mtp accept=1 prop=20192 top1=20192 accp=1.000 next=pair draft=28494 prop=28494 pred gate=device Token # 1480: 112.540ms; value: next_token_ids=tensor([28494], device='cuda:0') mtp accept=1 prop=28494 top1=28494 accp=0.998 next=draft=23 prop=23 olap pair=107.3ms serial=190.2ms gain=82.9ms ratio=0.44 s0=4.5ms s1=185.7ms wait=0.1/47.6ms pred gate=device Token # 1481: 3.810ms; value: next_token_ids=tensor([23], device='cuda:0') mtp accept=1 prop=23 top1=23 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 1482: 112.464ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.998 next=draft=223 prop=223 olap pair=107.2ms serial=190.0ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.2ms wait=0.1/48.6ms pred gate=device Token # 1483: 3.840ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1484: 112.570ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.3ms serial=190.2ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.4ms wait=0.1/48.6ms pred gate=device Token # 1485: 3.746ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=31416 prop=31416 pred gate=device Token # 1486: 112.225ms; value: next_token_ids=tensor([31416], device='cuda:0') mtp accept=1 prop=31416 top1=31416 accp=1.000 next=draft=565 prop=565 olap pair=107.0ms serial=189.6ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.8ms wait=0.1/48.3ms pred gate=device Token # 1487: 3.797ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 
1488: 112.581ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=1557 prop=1557 olap pair=107.2ms serial=190.2ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.4ms wait=0.1/48.5ms pred gate=device Token # 1489: 3.764ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 1490: 112.425ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=107.1ms serial=189.9ms gain=82.7ms ratio=0.44 s0=4.0ms s1=185.8ms wait=0.1/48.0ms pred gate=device Token # 1491: 3.817ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1492: 113.369ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.3ms serial=190.1ms gain=82.8ms ratio=0.44 s0=4.1ms s1=186.0ms wait=0.1/47.9ms pred gate=device Token # 1493: 4.692ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=28782 prop=28782 pred gate=device Token # 1494: 112.487ms; value: next_token_ids=tensor([28782], device='cuda:0') mtp accept=1 prop=28782 top1=28782 accp=1.000 next=draft=320 prop=303 olap pair=107.0ms serial=188.4ms gain=81.3ms ratio=0.43 s0=8.6ms s1=179.8ms wait=0.2/42.3ms pred gate=device Token # 1495: 3.807ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.146 next=pair draft=12223 prop=12223 pred gate=device Token # 1496: 112.338ms; value: next_token_ids=tensor([12223], device='cuda:0') mtp accept=1 prop=12223 top1=12223 accp=0.998 next=draft=547 prop=547 olap pair=107.0ms serial=189.9ms gain=82.8ms ratio=0.44 s0=3.8ms s1=186.1ms wait=0.1/48.6ms pred gate=device Token # 1497: 3.796ms; value: 
next_token_ids=tensor([547], device='cuda:0') mtp accept=1 prop=547 top1=547 accp=1.000 next=pair draft=10095 prop=10095 pred gate=device Token # 1498: 112.569ms; value: next_token_ids=tensor([10095], device='cuda:0') mtp accept=1 prop=10095 top1=10095 accp=1.000 next=draft=320 prop=320 olap pair=107.3ms serial=189.9ms gain=82.6ms ratio=0.44 s0=4.2ms s1=185.8ms wait=0.1/47.7ms pred gate=device Token # 1499: 3.775ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=1557 prop=1557 pred gate=device Token # 1500: 112.616ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=0.862 next=draft=12 prop=12 olap pair=107.4ms serial=190.3ms gain=83.0ms ratio=0.44 s0=4.1ms s1=186.3ms wait=0.1/48.0ms pred gate=device Token # 1501: 3.812ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=20192 prop=20192 pred gate=device Token # 1502: 112.151ms; value: next_token_ids=tensor([20192], device='cuda:0') mtp accept=1 prop=20192 top1=20192 accp=1.000 next=draft=28494 prop=28494 olap pair=106.9ms serial=189.6ms gain=82.6ms ratio=0.44 s0=3.8ms s1=185.7ms wait=0.1/48.5ms pred gate=device Token # 1503: 3.728ms; value: next_token_ids=tensor([28494], device='cuda:0') mtp accept=1 prop=28494 top1=28494 accp=1.000 next=pair draft=22 prop=22 pred gate=device Token # 1504: 112.554ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=draft=438 prop=438 olap pair=107.3ms serial=190.3ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.5ms wait=0.1/48.6ms pred gate=device Token # 1505: 3.810ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.990 next=pair draft=223 prop=223 pred gate=device Token # 1506: 112.277ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=7163 prop=7163 olap pair=107.0ms 
serial=189.7ms gain=82.7ms ratio=0.44 s0=3.8ms s1=185.9ms wait=0.1/48.5ms pred gate=device Token # 1507: 3.808ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1508: 112.676ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=28782 prop=28782 olap pair=107.2ms serial=190.0ms gain=82.8ms ratio=0.44 s0=4.3ms s1=185.7ms wait=0.1/47.9ms pred gate=device Token # 1509: 3.787ms; value: next_token_ids=tensor([28782], device='cuda:0') mtp accept=1 prop=28782 top1=28782 accp=1.000 next=pair draft=565 prop=565 pred gate=device Token # 1510: 112.473ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=1.000 next=draft=223 prop=223 olap pair=107.2ms serial=190.0ms gain=82.8ms ratio=0.44 s0=4.0ms s1=186.0ms wait=0.1/48.3ms pred gate=device Token # 1511: 3.859ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=1557 prop=1557 pred gate=device Token # 1512: 112.654ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=1.000 next=draft=438 prop=438 olap pair=107.4ms serial=190.4ms gain=83.0ms ratio=0.44 s0=4.0ms s1=186.4ms wait=0.1/48.2ms pred gate=device Token # 1513: 3.843ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.999 next=pair draft=223 prop=223 pred gate=device Token # 1514: 112.773ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=7163 prop=7163 olap pair=107.3ms serial=190.3ms gain=83.0ms ratio=0.44 s0=3.9ms s1=186.4ms wait=0.1/48.3ms pred gate=device Token # 1515: 3.853ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1516: 112.472ms; value: 
next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=28940 prop=28940 olap pair=107.1ms serial=189.7ms gain=82.6ms ratio=0.44 s0=4.2ms s1=185.5ms wait=0.1/47.6ms pred gate=device Token # 1517: 3.736ms; value: next_token_ids=tensor([28940], device='cuda:0') mtp accept=1 prop=28940 top1=28940 accp=1.000 next=pair draft=320 prop=303 pred gate=device Token # 1518: 112.194ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.461 next=draft=547 prop=547 olap pair=106.9ms serial=189.4ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.5ms wait=0.1/48.4ms pred gate=device Token # 1519: 3.873ms; value: next_token_ids=tensor([547], device='cuda:0') mtp accept=1 prop=547 top1=547 accp=1.000 next=pair draft=9107 prop=9107 pred gate=device Token # 1520: 113.400ms; value: next_token_ids=tensor([9107], device='cuda:0') mtp accept=1 prop=9107 top1=9107 accp=1.000 next=draft=320 prop=320 olap pair=107.3ms serial=189.3ms gain=82.0ms ratio=0.43 s0=6.4ms s1=182.8ms wait=0.2/45.1ms pred gate=device Token # 1521: 4.628ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=1557 prop=1557 pred gate=device Token # 1522: 112.838ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=0.981 next=draft=12 prop=12 olap pair=107.5ms serial=190.0ms gain=82.5ms ratio=0.43 s0=4.3ms s1=185.6ms wait=0.1/47.8ms pred gate=device Token # 1523: 3.796ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=20192 prop=20192 pred gate=device Token # 1524: 112.117ms; value: next_token_ids=tensor([20192], device='cuda:0') mtp accept=1 prop=20192 top1=20192 accp=1.000 next=draft=28494 prop=28494 olap pair=106.9ms serial=189.6ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.7ms wait=0.1/48.4ms pred gate=device Token # 1525: 3.762ms; value: next_token_ids=tensor([28494], 
device='cuda:0') mtp accept=1 prop=28494 top1=28494 accp=0.991 next=pair draft=21 prop=21 pred gate=device Token # 1526: 112.612ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=0.999 next=draft=438 prop=438 olap pair=107.3ms serial=190.2ms gain=82.9ms ratio=0.44 s0=4.0ms s1=186.2ms wait=0.1/48.2ms pred gate=device Token # 1527: 3.844ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.995 next=pair draft=223 prop=223 pred gate=device Token # 1528: 112.518ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=7163 prop=7163 olap pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.2ms wait=0.1/48.5ms pred gate=device Token # 1529: 3.752ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1530: 112.463ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=28940 prop=28940 olap pair=107.0ms serial=189.5ms gain=82.5ms ratio=0.44 s0=4.2ms s1=185.3ms wait=0.1/47.4ms pred gate=device Token # 1531: 3.835ms; value: next_token_ids=tensor([28940], device='cuda:0') mtp accept=1 prop=28940 top1=28940 accp=1.000 next=pair draft=565 prop=15 pred gate=device Token # 1532: 112.917ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=0 prop=15 top1=565 accp=0.947 next=draft=223 prop=223 olap pair=107.6ms serial=189.8ms gain=82.2ms ratio=0.43 s0=4.1ms s1=185.7ms wait=0.1/48.1ms pred gate=device Token # 1533: 112.836ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.978 next=draft=1557 prop=1557 olap pair=107.4ms serial=190.4ms gain=83.0ms ratio=0.44 s0=4.0ms s1=186.4ms wait=0.1/48.2ms pred gate=device Token # 1534: 3.775ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 
accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 1535: 112.362ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=107.1ms serial=189.8ms gain=82.8ms ratio=0.44 s0=3.8ms s1=186.0ms wait=0.1/48.6ms pred gate=device Token # 1536: 3.881ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1537: 112.479ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.2ms wait=0.1/48.5ms pred gate=device Token # 1538: 3.818ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=25624 prop=25624 pred gate=device Token # 1539: 112.282ms; value: next_token_ids=tensor([25624], device='cuda:0') mtp accept=1 prop=25624 top1=25624 accp=1.000 next=draft=303 prop=303 olap pair=107.1ms serial=189.8ms gain=82.7ms ratio=0.44 s0=4.2ms s1=185.5ms wait=0.1/47.6ms pred gate=device Token # 1540: 3.885ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.986 next=pair draft=547 prop=547 pred gate=device Token # 1541: 113.276ms; value: next_token_ids=tensor([547], device='cuda:0') mtp accept=1 prop=547 top1=547 accp=1.000 next=draft=7336 prop=7336 olap pair=107.1ms serial=189.6ms gain=82.5ms ratio=0.44 s0=5.2ms s1=184.4ms wait=0.2/46.7ms pred gate=device Token # 1542: 4.605ms; value: next_token_ids=tensor([7336], device='cuda:0') mtp accept=1 prop=7336 top1=7336 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1543: 112.513ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=1557 prop=1557 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=4.0ms s1=186.0ms wait=0.1/48.3ms pred 
gate=device Token # 1544: 3.790ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=1.000 next=pair draft=12 prop=12 pred gate=device Token # 1545: 112.459ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=draft=20192 prop=20192 olap pair=107.2ms serial=190.2ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.4ms wait=0.1/48.6ms pred gate=device Token # 1546: 3.805ms; value: next_token_ids=tensor([20192], device='cuda:0') mtp accept=1 prop=20192 top1=20192 accp=1.000 next=pair draft=28494 prop=28494 pred gate=device Token # 1547: 112.289ms; value: next_token_ids=tensor([28494], device='cuda:0') mtp accept=1 prop=28494 top1=28494 accp=1.000 next=draft=20 prop=20 olap pair=107.1ms serial=189.8ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.9ms wait=0.1/48.4ms pred gate=device Token # 1548: 3.808ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 1549: 112.569ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=107.3ms serial=190.2ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.4ms wait=0.1/48.5ms pred gate=device Token # 1550: 3.801ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1551: 112.422ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.2ms serial=189.8ms gain=82.7ms ratio=0.44 s0=3.9ms s1=186.0ms wait=0.1/48.4ms pred gate=device Token # 1552: 3.846ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=25624 prop=25624 pred gate=device Token # 1553: 112.287ms; value: next_token_ids=tensor([25624], device='cuda:0') mtp accept=1 prop=25624 top1=25624 
accp=1.000 next=draft=565 prop=565 olap pair=107.1ms serial=189.9ms gain=82.8ms ratio=0.44 s0=3.8ms s1=186.0ms wait=0.1/48.5ms pred gate=device Token # 1554: 3.863ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=0.999 next=pair draft=223 prop=223 pred gate=device Token # 1555: 112.600ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.997 next=draft=1557 prop=1557 olap pair=107.3ms serial=190.2ms gain=82.9ms ratio=0.44 s0=4.3ms s1=185.9ms wait=0.1/47.4ms pred gate=device Token # 1556: 3.796ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 1557: 112.163ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=106.9ms serial=189.6ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.1ms wait=0.1/46.9ms pred gate=device Token # 1558: 3.835ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1559: 112.647ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.4ms serial=190.2ms gain=82.8ms ratio=0.44 s0=4.7ms s1=185.5ms wait=0.1/46.8ms pred gate=device Token # 1560: 3.806ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=27183 prop=27183 pred gate=device Token # 1561: 112.353ms; value: next_token_ids=tensor([27183], device='cuda:0') mtp accept=1 prop=27183 top1=27183 accp=1.000 next=draft=303 prop=303 olap pair=107.1ms serial=189.8ms gain=82.7ms ratio=0.44 s0=4.9ms s1=185.0ms wait=0.1/46.5ms pred gate=device Token # 1562: 3.850ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=547 prop=547 pred 
gate=device Token # 1563: 112.253ms; value: next_token_ids=tensor([547], device='cuda:0') mtp accept=1 prop=547 top1=547 accp=1.000 next=draft=3045 prop=3045 olap pair=106.9ms serial=189.4ms gain=82.5ms ratio=0.44 s0=4.9ms s1=184.5ms wait=0.1/46.5ms pred gate=device Token # 1564: 3.770ms; value: next_token_ids=tensor([3045], device='cuda:0') mtp accept=1 prop=3045 top1=3045 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1565: 112.439ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=1557 prop=1557 olap pair=107.2ms serial=190.0ms gain=82.8ms ratio=0.44 s0=4.9ms s1=185.0ms wait=0.1/46.5ms pred gate=device Token # 1566: 3.777ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=1.000 next=pair draft=12 prop=12 pred gate=device Token # 1567: 112.425ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=draft=20192 prop=20192 olap pair=107.1ms serial=189.9ms gain=82.8ms ratio=0.44 s0=4.9ms s1=184.9ms wait=0.1/46.4ms pred gate=device Token # 1568: 3.768ms; value: next_token_ids=tensor([20192], device='cuda:0') mtp accept=1 prop=20192 top1=20192 accp=1.000 next=pair draft=28494 prop=28494 pred gate=device Token # 1569: 112.191ms; value: next_token_ids=tensor([28494], device='cuda:0') mtp accept=1 prop=28494 top1=28494 accp=1.000 next=draft=19 prop=19 olap pair=107.0ms serial=189.5ms gain=82.5ms ratio=0.44 s0=4.9ms s1=184.6ms wait=0.1/46.4ms pred gate=device Token # 1570: 3.742ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 1571: 112.428ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=107.1ms serial=190.0ms gain=82.8ms ratio=0.44 s0=4.5ms s1=185.5ms wait=0.1/47.1ms pred gate=device Token # 1572: 3.855ms; value: 
next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1573: 112.581ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.3ms serial=190.2ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.8ms wait=0.1/47.1ms pred gate=device Token # 1574: 3.744ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=27183 prop=27183 pred gate=device Token # 1575: 112.274ms; value: next_token_ids=tensor([27183], device='cuda:0') mtp accept=1 prop=27183 top1=27183 accp=1.000 next=draft=565 prop=565 olap pair=107.1ms serial=189.7ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.3ms wait=0.1/47.1ms pred gate=device Token # 1576: 3.774ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1577: 112.872ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.999 next=draft=1557 prop=1557 olap pair=107.5ms serial=190.7ms gain=83.2ms ratio=0.44 s0=4.3ms s1=186.4ms wait=0.1/47.4ms pred gate=device Token # 1578: 3.780ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 1579: 112.204ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.1ms wait=0.1/47.2ms pred gate=device Token # 1580: 3.838ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1581: 112.805ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap 
pair=107.5ms serial=190.7ms gain=83.2ms ratio=0.44 s0=4.4ms s1=186.3ms wait=0.1/47.2ms pred gate=device Token # 1582: 3.794ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=28983 prop=28983 pred gate=device Token # 1583: 112.581ms; value: next_token_ids=tensor([28983], device='cuda:0') mtp accept=1 prop=28983 top1=28983 accp=1.000 next=draft=303 prop=303 olap pair=107.3ms serial=190.3ms gain=83.0ms ratio=0.44 s0=3.9ms s1=186.4ms wait=0.1/48.4ms pred gate=device Token # 1584: 3.854ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=547 prop=547 pred gate=device Token # 1585: 112.374ms; value: next_token_ids=tensor([547], device='cuda:0') mtp accept=1 prop=547 top1=547 accp=1.000 next=draft=2170 prop=2170 olap pair=107.1ms serial=189.8ms gain=82.8ms ratio=0.44 s0=3.9ms s1=185.9ms wait=0.1/48.2ms pred gate=device Token # 1586: 3.850ms; value: next_token_ids=tensor([2170], device='cuda:0') mtp accept=1 prop=2170 top1=2170 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1587: 112.364ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=1557 prop=1557 olap pair=107.1ms serial=189.8ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.4ms wait=0.1/47.1ms pred gate=device Token # 1588: 3.804ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=1.000 next=pair draft=12 prop=12 pred gate=device Token # 1589: 112.838ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=draft=20192 prop=20192 olap pair=107.5ms serial=190.6ms gain=83.1ms ratio=0.44 s0=4.2ms s1=186.4ms wait=0.1/47.4ms pred gate=device Token # 1590: 3.788ms; value: next_token_ids=tensor([20192], device='cuda:0') mtp accept=1 prop=20192 top1=20192 accp=1.000 next=pair draft=28494 prop=28494 pred gate=device Token # 1591: 112.392ms; 
value: next_token_ids=tensor([28494], device='cuda:0') mtp accept=1 prop=28494 top1=28494 accp=1.000 next=draft=18 prop=18 olap pair=107.1ms serial=189.9ms gain=82.8ms ratio=0.44 s0=4.1ms s1=185.8ms wait=0.1/47.8ms pred gate=device Token # 1592: 3.750ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 1593: 113.775ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=107.5ms serial=189.6ms gain=82.1ms ratio=0.43 s0=5.5ms s1=184.1ms wait=0.1/46.3ms pred gate=device Token # 1594: 4.913ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1595: 113.611ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=108.1ms serial=190.5ms gain=82.3ms ratio=0.43 s0=4.3ms s1=186.2ms wait=0.1/48.1ms pred gate=device Token # 1596: 3.884ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=28983 prop=28983 pred gate=device Token # 1597: 112.273ms; value: next_token_ids=tensor([28983], device='cuda:0') mtp accept=1 prop=28983 top1=28983 accp=1.000 next=draft=565 prop=565 olap pair=107.0ms serial=189.7ms gain=82.6ms ratio=0.44 s0=3.9ms s1=185.7ms wait=0.1/48.2ms pred gate=device Token # 1598: 3.785ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1599: 112.755ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.997 next=draft=1557 prop=1557 olap pair=107.4ms serial=190.5ms gain=83.1ms ratio=0.44 s0=3.9ms s1=186.6ms wait=0.1/48.4ms pred gate=device Token # 1600: 3.802ms; value: next_token_ids=tensor([1557], 
device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 1601: 113.549ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=107.4ms serial=189.7ms gain=82.2ms ratio=0.43 s0=5.8ms s1=183.9ms wait=0.2/45.9ms pred gate=device Token # 1602: 3.995ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1603: 112.880ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.5ms serial=190.4ms gain=82.9ms ratio=0.44 s0=4.0ms s1=186.4ms wait=0.1/48.2ms pred gate=device Token # 1604: 3.783ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=21814 prop=21814 pred gate=device Token # 1605: 112.504ms; value: next_token_ids=tensor([21814], device='cuda:0') mtp accept=1 prop=21814 top1=21814 accp=1.000 next=draft=303 prop=303 olap pair=107.2ms serial=190.2ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.3ms wait=0.1/48.6ms pred gate=device Token # 1606: 3.821ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=547 prop=547 pred gate=device Token # 1607: 112.480ms; value: next_token_ids=tensor([547], device='cuda:0') mtp accept=1 prop=547 top1=547 accp=1.000 next=draft=511 prop=511 olap pair=107.2ms serial=190.2ms gain=83.0ms ratio=0.44 s0=3.9ms s1=186.3ms wait=0.1/48.4ms pred gate=device Token # 1608: 3.783ms; value: next_token_ids=tensor([511], device='cuda:0') mtp accept=1 prop=511 top1=511 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1609: 113.176ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=1557 prop=1557 olap pair=107.1ms serial=188.5ms gain=81.5ms 
ratio=0.43 s0=7.8ms s1=180.7ms wait=0.2/43.7ms pred gate=device Token # 1610: 4.825ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=0.978 next=pair draft=12 prop=12 pred gate=device Token # 1611: 113.006ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=draft=20192 prop=20192 olap pair=107.4ms serial=188.9ms gain=81.5ms ratio=0.43 s0=8.8ms s1=180.1ms wait=0.2/42.3ms pred gate=device Token # 1612: 3.854ms; value: next_token_ids=tensor([20192], device='cuda:0') mtp accept=1 prop=20192 top1=20192 accp=1.000 next=pair draft=26362 prop=26362 pred gate=device Token # 1613: 113.598ms; value: next_token_ids=tensor([26362], device='cuda:0') mtp accept=1 prop=26362 top1=26362 accp=1.000 next=draft=27 prop=27 olap pair=107.5ms serial=190.0ms gain=82.5ms ratio=0.43 s0=6.1ms s1=183.9ms wait=0.2/45.6ms pred gate=device Token # 1614: 3.910ms; value: next_token_ids=tensor([27], device='cuda:0') mtp accept=1 prop=27 top1=27 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 1615: 113.364ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=107.2ms serial=189.0ms gain=81.8ms ratio=0.43 s0=8.1ms s1=181.0ms wait=0.2/43.2ms pred gate=device Token # 1616: 4.689ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1617: 112.985ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.5ms serial=189.5ms gain=82.0ms ratio=0.43 s0=7.8ms s1=181.7ms wait=0.2/43.4ms pred gate=device Token # 1618: 3.817ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=21814 prop=21814 pred gate=device Token # 1619: 112.749ms; value: next_token_ids=tensor([21814], 
device='cuda:0') mtp accept=1 prop=21814 top1=21814 accp=1.000 next=draft=565 prop=565 olap pair=107.5ms serial=189.2ms gain=81.7ms ratio=0.43 s0=5.8ms s1=183.4ms wait=0.2/45.9ms pred gate=device Token # 1620: 3.833ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1621: 112.717ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=1557 prop=1557 olap pair=107.4ms serial=190.7ms gain=83.3ms ratio=0.44 s0=3.9ms s1=186.7ms wait=0.1/48.4ms pred gate=device Token # 1622: 3.789ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 1623: 112.524ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.2ms wait=0.1/48.4ms pred gate=device Token # 1624: 3.865ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1625: 112.577ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.3ms serial=190.2ms gain=83.0ms ratio=0.44 s0=3.9ms s1=186.4ms wait=0.1/48.4ms pred gate=device Token # 1626: 3.936ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=27582 prop=27582 pred gate=device Token # 1627: 112.457ms; value: next_token_ids=tensor([27582], device='cuda:0') mtp accept=1 prop=27582 top1=27582 accp=1.000 next=draft=320 prop=303 olap pair=107.2ms serial=190.0ms gain=82.9ms ratio=0.44 s0=4.0ms s1=186.1ms wait=0.1/48.3ms pred gate=device Token # 1628: 3.839ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=0 prop=303 
top1=320 accp=0.542 next=pair draft=2636 prop=2636 pred gate=device Token # 1629: 112.899ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.996 next=draft=223 prop=223 olap pair=107.3ms serial=189.9ms gain=82.6ms ratio=0.43 s0=4.4ms s1=185.5ms wait=0.1/47.7ms pred gate=device Token # 1630: 3.836ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.999 next=pair draft=7163 prop=7163 pred gate=device Token # 1631: 113.602ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=0 prop=7163 top1=1557 accp=0.260 next=draft=12 prop=12 olap pair=107.4ms serial=189.1ms gain=81.7ms ratio=0.43 s0=7.2ms s1=181.9ms wait=0.2/44.4ms pred gate=device Token # 1632: 113.023ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=0.998 next=draft=20192 prop=20192 olap pair=107.4ms serial=189.9ms gain=82.5ms ratio=0.43 s0=5.2ms s1=184.7ms wait=0.1/46.3ms pred gate=device Token # 1633: 3.831ms; value: next_token_ids=tensor([20192], device='cuda:0') mtp accept=1 prop=20192 top1=20192 accp=1.000 next=pair draft=26362 prop=26362 pred gate=device Token # 1634: 112.803ms; value: next_token_ids=tensor([26362], device='cuda:0') mtp accept=1 prop=26362 top1=26362 accp=1.000 next=draft=27 prop=27 olap pair=107.5ms serial=189.3ms gain=81.7ms ratio=0.43 s0=4.7ms s1=184.6ms wait=0.1/47.4ms pred gate=device Token # 1635: 3.773ms; value: next_token_ids=tensor([27], device='cuda:0') mtp accept=1 prop=27 top1=27 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 1636: 112.875ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=107.2ms serial=189.1ms gain=81.8ms ratio=0.43 s0=6.9ms s1=182.2ms wait=0.2/44.6ms pred gate=device Token # 1637: 3.800ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred 
gate=device Token # 1638: 113.139ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.8ms serial=189.6ms gain=81.8ms ratio=0.43 s0=4.2ms s1=185.4ms wait=0.1/48.3ms pred gate=device Token # 1639: 3.848ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=27582 prop=27582 pred gate=device Token # 1640: 113.087ms; value: next_token_ids=tensor([27582], device='cuda:0') mtp accept=1 prop=27582 top1=27582 accp=1.000 next=draft=320 prop=303 olap pair=107.8ms serial=189.7ms gain=82.0ms ratio=0.43 s0=4.2ms s1=185.5ms wait=0.1/48.0ms pred gate=device Token # 1641: 3.922ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=0 prop=303 top1=320 accp=0.547 next=pair draft=3996 prop=3996 pred gate=device Token # 1642: 113.022ms; value: next_token_ids=tensor([4272], device='cuda:0') mtp accept=0 prop=3996 top1=4272 accp=0.143 next=draft=223 prop=223 olap pair=107.7ms serial=189.6ms gain=81.9ms ratio=0.43 s0=4.2ms s1=185.4ms wait=0.1/48.0ms pred gate=device Token # 1643: 112.757ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.973 next=draft=7163 prop=7163 olap pair=107.3ms serial=190.2ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.3ms wait=0.1/48.3ms pred gate=device Token # 1644: 3.813ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1645: 112.461ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=107.1ms serial=189.5ms gain=82.4ms ratio=0.43 s0=5.4ms s1=184.1ms wait=0.1/46.4ms pred gate=device Token # 1646: 3.830ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=565 prop=565 pred gate=device Token # 1647: 
112.980ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=1.000 next=draft=223 prop=223 olap pair=107.6ms serial=190.3ms gain=82.7ms ratio=0.43 s0=4.3ms s1=186.0ms wait=0.1/47.9ms pred gate=device Token # 1648: 3.857ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1649: 114.333ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=107.5ms serial=189.2ms gain=81.7ms ratio=0.43 s0=7.7ms s1=181.6ms wait=0.2/43.6ms pred gate=device Token # 1650: 3.886ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=27582 prop=27582 pred gate=device Token # 1651: 113.826ms; value: next_token_ids=tensor([27582], device='cuda:0') mtp accept=1 prop=27582 top1=27582 accp=1.000 next=draft=438 prop=438 olap pair=107.6ms serial=189.6ms gain=82.0ms ratio=0.43 s0=7.0ms s1=182.6ms wait=0.2/44.4ms pred gate=device Token # 1652: 4.722ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1653: 112.761ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=553 prop=553 olap pair=107.3ms serial=189.6ms gain=82.4ms ratio=0.43 s0=5.6ms s1=184.0ms wait=0.1/46.2ms pred gate=device Token # 1654: 3.780ms; value: next_token_ids=tensor([553], device='cuda:0') mtp accept=1 prop=553 top1=553 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 1655: 112.896ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.941 next=draft=2636 prop=2636 olap pair=107.5ms serial=189.9ms gain=82.4ms ratio=0.43 s0=4.2ms s1=185.7ms wait=0.1/47.7ms pred gate=device Token # 1656: 3.926ms; value: next_token_ids=tensor([2636], 
device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.961 next=pair draft=4169 prop=4169 pred gate=device Token # 1657: 112.859ms; value: next_token_ids=tensor([4169], device='cuda:0') mtp accept=1 prop=4169 top1=4169 accp=1.000 next=draft=40180 prop=40180 olap pair=107.6ms serial=189.8ms gain=82.2ms ratio=0.43 s0=5.5ms s1=184.3ms wait=0.1/46.4ms pred gate=device Token # 1658: 3.862ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=0 prop=40180 top1=996 accp=0.091 next=pair draft=553 prop=553 pred gate=device Token # 1659: 112.605ms; value: next_token_ids=tensor([553], device='cuda:0') mtp accept=1 prop=553 top1=553 accp=1.000 next=draft=320 prop=320 olap pair=107.3ms serial=189.6ms gain=82.3ms ratio=0.43 s0=6.4ms s1=183.2ms wait=0.2/45.4ms pred gate=device Token # 1660: 3.822ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.803 next=pair draft=113008 prop=113008 pred gate=device Token # 1661: 112.901ms; value: next_token_ids=tensor([113008], device='cuda:0') mtp accept=1 prop=113008 top1=113008 accp=0.999 next=draft=1557 prop=1557 olap pair=107.6ms serial=189.4ms gain=81.8ms ratio=0.43 s0=4.2ms s1=185.2ms wait=0.1/48.2ms pred gate=device Token # 1662: 3.871ms; value: next_token_ids=tensor([1557], device='cuda:0') mtp accept=1 prop=1557 top1=1557 accp=1.000 next=pair draft=124637 prop=124637 pred gate=device Token # 1663: 112.446ms; value: next_token_ids=tensor([124637], device='cuda:0') mtp accept=1 prop=124637 top1=124637 accp=1.000 next=draft=478 prop=478 olap pair=107.2ms serial=189.9ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.4ms wait=0.1/47.7ms pred gate=device Token # 1664: 3.817ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=pair draft=2181 prop=2181 pred gate=device Token # 1665: 112.803ms; value: next_token_ids=tensor([2181], device='cuda:0') mtp accept=1 prop=2181 top1=2181 accp=0.866 next=draft=768 prop=768 olap pair=107.6ms 
serial=190.3ms gain=82.7ms ratio=0.43 s0=4.1ms s1=186.2ms wait=0.1/48.1ms pred gate=device Token # 1666: 3.753ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1667: 112.594ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.999 next=draft=27521 prop=27521 olap pair=107.3ms serial=189.9ms gain=82.6ms ratio=0.44 s0=6.1ms s1=183.9ms wait=0.2/45.8ms pred gate=device Token # 1668: 3.762ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 1669: 112.243ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=draft=1267 prop=1267 olap pair=107.0ms serial=189.0ms gain=82.0ms ratio=0.43 s0=6.4ms s1=182.6ms wait=0.2/45.3ms pred gate=device Token # 1670: 3.781ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1671: 112.787ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=2181 prop=2181 olap pair=107.3ms serial=189.1ms gain=81.8ms ratio=0.43 s0=7.5ms s1=181.6ms wait=0.2/44.0ms pred gate=device Token # 1672: 3.775ms; value: next_token_ids=tensor([2181], device='cuda:0') mtp accept=1 prop=2181 top1=2181 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1673: 112.244ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=2181 prop=2181 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.8ms wait=0.1/48.5ms pred gate=device Token # 1674: 3.732ms; value: next_token_ids=tensor([2181], device='cuda:0') mtp accept=1 prop=2181 top1=2181 accp=0.995 next=pair draft=12 prop=982 pred gate=device Token # 1675: 112.421ms; value: 
next_token_ids=tensor([12], device='cuda:0') mtp accept=0 prop=982 top1=12 accp=0.580 next=draft=18335 prop=18335 olap pair=107.1ms serial=190.0ms gain=82.8ms ratio=0.44 s0=3.8ms s1=186.1ms wait=0.1/48.4ms pred gate=device Token # 1676: 112.537ms; value: next_token_ids=tensor([18335], device='cuda:0') mtp accept=1 prop=18335 top1=18335 accp=1.000 next=draft=14456 prop=14456 olap pair=107.3ms serial=190.0ms gain=82.7ms ratio=0.44 s0=4.0ms s1=186.0ms wait=0.1/48.1ms pred gate=device Token # 1677: 3.812ms; value: next_token_ids=tensor([6793], device='cuda:0') mtp accept=0 prop=14456 top1=6793 accp=0.016 next=pair draft=18 prop=18 pred gate=device Token # 1678: 113.276ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=0 prop=18 top1=21 accp=0.000 next=draft=438 prop=438 olap pair=107.2ms serial=189.5ms gain=82.2ms ratio=0.43 s0=6.1ms s1=183.3ms wait=0.2/45.9ms pred gate=device Token # 1679: 112.937ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=1148 prop=1148 olap pair=107.3ms serial=189.1ms gain=81.8ms ratio=0.43 s0=7.7ms s1=181.4ms wait=0.2/43.7ms pred gate=device Token # 1680: 3.766ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=1.000 next=pair draft=2181 prop=2181 pred gate=device Token # 1681: 112.604ms; value: next_token_ids=tensor([2181], device='cuda:0') mtp accept=1 prop=2181 top1=2181 accp=1.000 next=draft=12 prop=12 olap pair=107.4ms serial=189.5ms gain=82.2ms ratio=0.43 s0=7.3ms s1=182.2ms wait=0.2/44.2ms pred gate=device Token # 1682: 3.716ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=18335 prop=18335 pred gate=device Token # 1683: 112.473ms; value: next_token_ids=tensor([18335], device='cuda:0') mtp accept=1 prop=18335 top1=18335 accp=1.000 next=draft=6793 prop=6793 olap pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=4.0ms s1=186.1ms wait=0.1/48.3ms pred 
gate=device Token # 1684: 3.780ms; value: next_token_ids=tensor([6793], device='cuda:0') mtp accept=1 prop=6793 top1=6793 accp=1.000 next=pair draft=21 prop=21 pred gate=device Token # 1685: 112.398ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=draft=768 prop=768 olap pair=107.2ms serial=189.4ms gain=82.3ms ratio=0.43 s0=4.3ms s1=185.1ms wait=0.1/47.7ms pred gate=device Token # 1686: 3.737ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.998 next=pair draft=2181 prop=2181 pred gate=device Token # 1687: 112.458ms; value: next_token_ids=tensor([2181], device='cuda:0') mtp accept=1 prop=2181 top1=2181 accp=0.955 next=draft=12 prop=12 olap pair=107.1ms serial=189.8ms gain=82.7ms ratio=0.44 s0=4.1ms s1=185.7ms wait=0.1/47.7ms pred gate=device Token # 1688: 3.755ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=14461 prop=14461 pred gate=device Token # 1689: 113.006ms; value: next_token_ids=tensor([14461], device='cuda:0') mtp accept=1 prop=14461 top1=18335 accp=0.590 next=draft=1320 prop=1320 olap pair=107.7ms serial=189.9ms gain=82.2ms ratio=0.43 s0=4.1ms s1=185.8ms wait=0.1/48.1ms pred gate=device Token # 1690: 3.724ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=pair draft=18 prop=18 pred gate=device Token # 1691: 112.058ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=31 prop=31 olap pair=106.8ms serial=189.3ms gain=82.5ms ratio=0.44 s0=3.8ms s1=185.5ms wait=0.1/48.5ms pred gate=device Token # 1692: 3.741ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=5769 prop=5769 pred gate=device Token # 1693: 112.807ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=draft=4314 prop=4314 
olap pair=107.5ms serial=189.5ms gain=82.0ms ratio=0.43 s0=4.3ms s1=185.2ms wait=0.1/48.0ms pred gate=device Token # 1694: 3.738ms; value: next_token_ids=tensor([4314], device='cuda:0') mtp accept=1 prop=4314 top1=4314 accp=1.000 next=pair draft=1320 prop=1320 pred gate=device Token # 1695: 112.506ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=draft=303 prop=303 olap pair=107.3ms serial=189.7ms gain=82.4ms ratio=0.43 s0=4.3ms s1=185.4ms wait=0.1/47.7ms pred gate=device Token # 1696: 3.761ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=2181 prop=2181 pred gate=device Token # 1697: 112.848ms; value: next_token_ids=tensor([2181], device='cuda:0') mtp accept=1 prop=2181 top1=2181 accp=1.000 next=draft=12 prop=12 olap pair=107.6ms serial=189.5ms gain=82.0ms ratio=0.43 s0=4.3ms s1=185.3ms wait=0.1/47.8ms pred gate=device Token # 1698: 3.769ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=28400 prop=28400 pred gate=device Token # 1699: 112.379ms; value: next_token_ids=tensor([28400], device='cuda:0') mtp accept=1 prop=28400 top1=28400 accp=1.000 next=draft=3600 prop=3600 olap pair=107.1ms serial=189.2ms gain=82.1ms ratio=0.43 s0=7.1ms s1=182.2ms wait=0.2/44.8ms pred gate=device Token # 1700: 3.781ms; value: next_token_ids=tensor([3600], device='cuda:0') mtp accept=1 prop=3600 top1=3600 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1701: 112.594ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=9489 prop=9489 olap pair=107.4ms serial=189.8ms gain=82.5ms ratio=0.43 s0=4.1ms s1=185.7ms wait=0.1/48.2ms pred gate=device Token # 1702: 3.742ms; value: next_token_ids=tensor([9489], device='cuda:0') mtp accept=1 prop=9489 top1=9489 accp=1.000 next=pair draft=31257 prop=31257 pred gate=device Token # 1703: 113.027ms; value: 
next_token_ids=tensor([31257], device='cuda:0') mtp accept=1 prop=31257 top1=31257 accp=1.000 next=draft=21 prop=21 olap pair=107.8ms serial=189.2ms gain=81.4ms ratio=0.43 s0=4.3ms s1=184.9ms wait=0.1/47.9ms pred gate=device Token # 1704: 3.742ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 1705: 113.849ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=77230 prop=77230 olap pair=107.7ms serial=189.0ms gain=81.3ms ratio=0.43 s0=5.5ms s1=183.5ms wait=0.1/46.6ms pred gate=device Token # 1706: 4.652ms; value: next_token_ids=tensor([77230], device='cuda:0') mtp accept=1 prop=77230 top1=77230 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1707: 112.538ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=5769 prop=5769 olap pair=107.1ms serial=189.3ms gain=82.2ms ratio=0.43 s0=5.7ms s1=183.6ms wait=0.2/46.1ms pred gate=device Token # 1708: 3.716ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=pair draft=4314 prop=4314 pred gate=device Token # 1709: 112.567ms; value: next_token_ids=tensor([4314], device='cuda:0') mtp accept=1 prop=4314 top1=4314 accp=0.999 next=draft=1320 prop=1320 olap pair=107.3ms serial=189.6ms gain=82.2ms ratio=0.43 s0=4.0ms s1=185.5ms wait=0.1/48.3ms pred gate=device Token # 1710: 3.756ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=pair draft=13 prop=13 pred gate=device Token # 1711: 112.571ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=9489 prop=9489 olap pair=107.2ms serial=189.3ms gain=82.1ms ratio=0.43 s0=6.6ms s1=182.7ms wait=0.2/45.2ms pred gate=device Token # 1712: 3.777ms; value: next_token_ids=tensor([9489], device='cuda:0') mtp 
accept=1 prop=9489 top1=9489 accp=1.000 next=pair draft=31257 prop=31257 pred gate=device Token # 1713: 112.620ms; value: next_token_ids=tensor([31257], device='cuda:0') mtp accept=1 prop=31257 top1=31257 accp=1.000 next=draft=21 prop=21 olap pair=107.4ms serial=189.3ms gain=82.0ms ratio=0.43 s0=5.2ms s1=184.2ms wait=0.1/46.9ms pred gate=device Token # 1714: 3.737ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1715: 113.444ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=7163 prop=7163 olap pair=107.3ms serial=189.4ms gain=82.0ms ratio=0.43 s0=7.1ms s1=182.3ms wait=0.2/44.9ms pred gate=device Token # 1716: 4.640ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1717: 112.754ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=28389 prop=28389 olap pair=107.2ms serial=188.6ms gain=81.4ms ratio=0.43 s0=8.0ms s1=180.6ms wait=0.2/43.7ms pred gate=device Token # 1718: 3.750ms; value: next_token_ids=tensor([28389], device='cuda:0') mtp accept=1 prop=28389 top1=28389 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1719: 112.742ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.998 next=draft=2636 prop=2636 olap pair=107.5ms serial=189.1ms gain=81.7ms ratio=0.43 s0=6.3ms s1=182.8ms wait=0.2/45.6ms pred gate=device Token # 1720: 3.954ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=1.000 next=pair draft=4169 prop=4169 pred gate=device Token # 1721: 113.327ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=4169 top1=223 accp=0.340 next=draft=7163 prop=7163 olap pair=108.1ms serial=189.7ms gain=81.6ms ratio=0.43 s0=4.3ms 
s1=185.4ms wait=0.1/47.9ms pred gate=device Token # 1722: 113.201ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.995 next=draft=27521 prop=27521 olap pair=107.8ms serial=189.7ms gain=81.9ms ratio=0.43 s0=4.2ms s1=185.5ms wait=0.1/48.1ms pred gate=device Token # 1723: 3.698ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 1724: 112.511ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=draft=565 prop=565 olap pair=107.3ms serial=190.2ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.4ms wait=0.1/48.6ms pred gate=device Token # 1725: 3.850ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1726: 112.668ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=7163 prop=7163 olap pair=107.4ms serial=189.8ms gain=82.4ms ratio=0.43 s0=6.8ms s1=182.9ms wait=0.2/44.7ms pred gate=device Token # 1727: 3.761ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1728: 112.804ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=28389 prop=28389 olap pair=107.5ms serial=190.5ms gain=83.0ms ratio=0.44 s0=3.9ms s1=186.6ms wait=0.1/48.3ms pred gate=device Token # 1729: 3.747ms; value: next_token_ids=tensor([28389], device='cuda:0') mtp accept=1 prop=28389 top1=28389 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 1730: 112.935ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=223 prop=223 olap pair=107.7ms serial=189.6ms gain=81.9ms ratio=0.43 s0=4.4ms s1=185.2ms wait=0.1/47.7ms 
pred gate=device Token # 1731: 3.818ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=26 prop=26 pred gate=device Token # 1732: 113.181ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=draft=303 prop=303 olap pair=108.0ms serial=190.0ms gain=82.1ms ratio=0.43 s0=4.2ms s1=185.8ms wait=0.1/48.1ms pred gate=device Token # 1733: 3.823ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.983 next=pair draft=4169 prop=4169 pred gate=device Token # 1734: 112.671ms; value: next_token_ids=tensor([4169], device='cuda:0') mtp accept=1 prop=4169 top1=4169 accp=1.000 next=draft=996 prop=996 olap pair=107.4ms serial=189.8ms gain=82.5ms ratio=0.43 s0=4.2ms s1=185.6ms wait=0.1/47.9ms pred gate=device Token # 1735: 3.811ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=pair draft=26 prop=26 pred gate=device Token # 1736: 112.851ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=draft=478 prop=478 olap pair=107.7ms serial=189.4ms gain=81.7ms ratio=0.43 s0=5.4ms s1=183.9ms wait=0.1/46.5ms pred gate=device Token # 1737: 3.807ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.985 next=pair draft=1942 prop=1942 pred gate=device Token # 1738: 113.162ms; value: next_token_ids=tensor([1942], device='cuda:0') mtp accept=1 prop=1942 top1=1942 accp=1.000 next=draft=768 prop=768 olap pair=108.0ms serial=189.5ms gain=81.6ms ratio=0.43 s0=4.4ms s1=185.1ms wait=0.1/48.0ms pred gate=device Token # 1739: 3.696ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=7163 prop=7163 pred gate=device Token # 1740: 112.924ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.994 next=draft=27521 
prop=27521 olap pair=107.7ms serial=189.9ms gain=82.2ms ratio=0.43 s0=4.2ms s1=185.7ms wait=0.1/48.2ms pred gate=device Token # 1741: 3.680ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 1742: 112.294ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=draft=1267 prop=1267 olap pair=107.0ms serial=189.3ms gain=82.3ms ratio=0.43 s0=5.3ms s1=184.0ms wait=0.1/46.8ms pred gate=device Token # 1743: 3.764ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1744: 112.576ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=1942 prop=1942 olap pair=107.3ms serial=190.2ms gain=82.9ms ratio=0.44 s0=4.3ms s1=185.9ms wait=0.1/47.5ms pred gate=device Token # 1745: 3.761ms; value: next_token_ids=tensor([1942], device='cuda:0') mtp accept=1 prop=1942 top1=1942 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1746: 112.396ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=1942 prop=1942 olap pair=107.2ms serial=190.0ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.6ms wait=0.1/47.1ms pred gate=device Token # 1747: 3.776ms; value: next_token_ids=tensor([1942], device='cuda:0') mtp accept=1 prop=1942 top1=1942 accp=0.908 next=pair draft=12 prop=12 pred gate=device Token # 1748: 112.263ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=0.998 next=draft=17872 prop=17872 olap pair=107.0ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.4ms wait=0.1/47.3ms pred gate=device Token # 1749: 3.724ms; value: next_token_ids=tensor([17872], device='cuda:0') mtp accept=1 prop=17872 top1=17872 accp=1.000 next=pair draft=16046 prop=18572 pred gate=device Token # 
1750: 112.103ms; value: next_token_ids=tensor([23427], device='cuda:0') mtp accept=0 prop=18572 top1=23427 accp=0.002 next=draft=27 prop=26 olap pair=106.9ms serial=189.7ms gain=82.8ms ratio=0.44 s0=4.3ms s1=185.4ms wait=0.1/47.4ms pred gate=device Token # 1751: 112.342ms; value: next_token_ids=tensor([27], device='cuda:0') mtp accept=0 prop=26 top1=27 accp=0.499 next=draft=438 prop=438 olap pair=107.0ms serial=189.9ms gain=82.8ms ratio=0.44 s0=4.0ms s1=185.9ms wait=0.1/48.3ms pred gate=device Token # 1752: 112.651ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=1148 prop=1148 olap pair=107.3ms serial=190.3ms gain=83.0ms ratio=0.44 s0=3.9ms s1=186.5ms wait=0.1/48.4ms pred gate=device Token # 1753: 3.665ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=1.000 next=pair draft=1942 prop=1942 pred gate=device Token # 1754: 112.491ms; value: next_token_ids=tensor([1942], device='cuda:0') mtp accept=1 prop=1942 top1=1942 accp=1.000 next=draft=12 prop=12 olap pair=107.3ms serial=190.1ms gain=82.8ms ratio=0.44 s0=3.9ms s1=186.2ms wait=0.1/48.5ms pred gate=device Token # 1755: 3.708ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=17872 prop=17872 pred gate=device Token # 1756: 112.332ms; value: next_token_ids=tensor([17872], device='cuda:0') mtp accept=1 prop=17872 top1=17872 accp=1.000 next=draft=23427 prop=23427 olap pair=107.1ms serial=189.9ms gain=82.8ms ratio=0.44 s0=4.0ms s1=185.9ms wait=0.1/48.1ms pred gate=device Token # 1757: 3.791ms; value: next_token_ids=tensor([23427], device='cuda:0') mtp accept=1 prop=23427 top1=23427 accp=0.999 next=pair draft=27 prop=27 pred gate=device Token # 1758: 112.449ms; value: next_token_ids=tensor([27], device='cuda:0') mtp accept=1 prop=27 top1=27 accp=1.000 next=draft=768 prop=768 olap pair=107.2ms serial=190.0ms gain=82.8ms ratio=0.44 s0=4.1ms s1=185.9ms 
wait=0.1/48.0ms pred gate=device Token # 1759: 3.945ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=1942 prop=1942 pred gate=device Token # 1760: 112.637ms; value: next_token_ids=tensor([1942], device='cuda:0') mtp accept=1 prop=1942 top1=1942 accp=0.997 next=draft=12 prop=12 olap pair=107.4ms serial=190.2ms gain=82.9ms ratio=0.44 s0=4.0ms s1=186.2ms wait=0.1/48.1ms pred gate=device Token # 1761: 3.697ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=13161 prop=13161 pred gate=device Token # 1762: 112.333ms; value: next_token_ids=tensor([13161], device='cuda:0') mtp accept=1 prop=13161 top1=13161 accp=1.000 next=draft=1320 prop=1320 olap pair=107.1ms serial=189.9ms gain=82.8ms ratio=0.44 s0=4.0ms s1=185.9ms wait=0.1/48.2ms pred gate=device Token # 1763: 3.772ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=pair draft=18 prop=18 pred gate=device Token # 1764: 112.106ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=31 prop=31 olap pair=106.9ms serial=189.3ms gain=82.5ms ratio=0.44 s0=3.9ms s1=185.4ms wait=0.1/48.4ms pred gate=device Token # 1765: 3.734ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=6650 prop=6650 pred gate=device Token # 1766: 112.307ms; value: next_token_ids=tensor([6650], device='cuda:0') mtp accept=1 prop=6650 top1=6650 accp=1.000 next=draft=6391 prop=6391 olap pair=107.1ms serial=190.0ms gain=82.8ms ratio=0.44 s0=4.0ms s1=186.0ms wait=0.1/48.3ms pred gate=device Token # 1767: 3.717ms; value: next_token_ids=tensor([6391], device='cuda:0') mtp accept=1 prop=6391 top1=6391 accp=1.000 next=pair draft=1320 prop=1320 pred gate=device Token # 1768: 112.530ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 
accp=1.000 next=draft=303 prop=303 olap pair=107.3ms serial=189.7ms gain=82.3ms ratio=0.43 s0=3.9ms s1=185.7ms wait=0.1/48.4ms pred gate=device Token # 1769: 3.746ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=1942 prop=1942 pred gate=device Token # 1770: 112.634ms; value: next_token_ids=tensor([1942], device='cuda:0') mtp accept=1 prop=1942 top1=1942 accp=1.000 next=draft=12 prop=12 olap pair=107.4ms serial=189.9ms gain=82.6ms ratio=0.43 s0=4.3ms s1=185.7ms wait=0.1/47.7ms pred gate=device Token # 1771: 3.735ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=13506 prop=13506 pred gate=device Token # 1772: 112.406ms; value: next_token_ids=tensor([13506], device='cuda:0') mtp accept=1 prop=13506 top1=13506 accp=1.000 next=draft=4362 prop=4362 olap pair=107.2ms serial=190.0ms gain=82.8ms ratio=0.44 s0=4.0ms s1=186.0ms wait=0.1/48.1ms pred gate=device Token # 1773: 3.703ms; value: next_token_ids=tensor([4362], device='cuda:0') mtp accept=1 prop=4362 top1=4362 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1774: 112.463ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=7207 prop=7207 olap pair=107.2ms serial=190.0ms gain=82.8ms ratio=0.44 s0=3.9ms s1=186.1ms wait=0.1/48.4ms pred gate=device Token # 1775: 3.785ms; value: next_token_ids=tensor([7207], device='cuda:0') mtp accept=1 prop=7207 top1=7207 accp=1.000 next=pair draft=31257 prop=31257 pred gate=device Token # 1776: 112.432ms; value: next_token_ids=tensor([31257], device='cuda:0') mtp accept=1 prop=31257 top1=31257 accp=1.000 next=draft=21 prop=21 olap pair=107.2ms serial=189.8ms gain=82.6ms ratio=0.44 s0=4.0ms s1=185.8ms wait=0.1/48.1ms pred gate=device Token # 1777: 3.730ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=pair draft=303 prop=303 pred gate=device 
Token # 1778: 113.262ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=77230 prop=77230 olap pair=107.3ms serial=189.5ms gain=82.2ms ratio=0.43 s0=5.6ms s1=183.9ms wait=0.1/46.4ms pred gate=device Token # 1779: 4.601ms; value: next_token_ids=tensor([77230], device='cuda:0') mtp accept=1 prop=77230 top1=77230 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1780: 113.803ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=6650 prop=6650 olap pair=107.6ms serial=189.4ms gain=81.7ms ratio=0.43 s0=8.9ms s1=180.5ms wait=0.2/42.3ms pred gate=device Token # 1781: 4.621ms; value: next_token_ids=tensor([6650], device='cuda:0') mtp accept=1 prop=6650 top1=6650 accp=1.000 next=pair draft=6391 prop=6391 pred gate=device Token # 1782: 113.745ms; value: next_token_ids=tensor([6391], device='cuda:0') mtp accept=1 prop=6391 top1=6391 accp=1.000 next=draft=1320 prop=1320 olap pair=107.5ms serial=189.1ms gain=81.6ms ratio=0.43 s0=8.6ms s1=180.4ms wait=0.2/42.9ms pred gate=device Token # 1783: 4.524ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=pair draft=13 prop=13 pred gate=device Token # 1784: 113.858ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=7207 prop=7207 olap pair=107.5ms serial=189.3ms gain=81.8ms ratio=0.43 s0=8.6ms s1=180.7ms wait=0.2/42.7ms pred gate=device Token # 1785: 4.673ms; value: next_token_ids=tensor([7207], device='cuda:0') mtp accept=1 prop=7207 top1=7207 accp=1.000 next=pair draft=31257 prop=31257 pred gate=device Token # 1786: 112.712ms; value: next_token_ids=tensor([31257], device='cuda:0') mtp accept=1 prop=31257 top1=31257 accp=1.000 next=draft=21 prop=21 olap pair=107.4ms serial=190.2ms gain=82.8ms ratio=0.44 s0=4.3ms s1=185.9ms wait=0.1/47.6ms pred gate=device Token # 1787: 3.713ms; value: 
next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1788: 112.779ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=0.980 next=draft=7163 prop=7163 olap pair=107.5ms serial=190.5ms gain=83.0ms ratio=0.44 s0=3.9ms s1=186.6ms wait=0.1/48.6ms pred gate=device Token # 1789: 3.711ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1790: 112.453ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=28389 prop=28389 olap pair=107.2ms serial=190.0ms gain=82.8ms ratio=0.44 s0=3.9ms s1=186.1ms wait=0.1/48.5ms pred gate=device Token # 1791: 3.720ms; value: next_token_ids=tensor([28389], device='cuda:0') mtp accept=1 prop=28389 top1=28389 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1792: 112.769ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.998 next=draft=4169 prop=4169 olap pair=107.5ms serial=190.4ms gain=82.9ms ratio=0.44 s0=4.1ms s1=186.2ms wait=0.1/47.8ms pred gate=device Token # 1793: 3.775ms; value: next_token_ids=tensor([4169], device='cuda:0') mtp accept=1 prop=4169 top1=4169 accp=0.987 next=pair draft=996 prop=996 pred gate=device Token # 1794: 112.609ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=draft=26 prop=26 olap pair=107.4ms serial=190.2ms gain=82.8ms ratio=0.44 s0=4.2ms s1=186.0ms wait=0.1/47.7ms pred gate=device Token # 1795: 3.736ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=pair draft=478 prop=478 pred gate=device Token # 1796: 112.951ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.825 next=draft=3286 prop=3286 olap pair=107.8ms serial=191.2ms 
gain=83.4ms ratio=0.44 s0=4.0ms s1=187.2ms wait=0.1/48.3ms pred gate=device Token # 1797: 3.711ms; value: next_token_ids=tensor([3286], device='cuda:0') mtp accept=1 prop=3286 top1=3286 accp=0.999 next=pair draft=768 prop=768 pred gate=device Token # 1798: 112.671ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=7163 prop=7163 olap pair=107.5ms serial=190.7ms gain=83.2ms ratio=0.44 s0=3.9ms s1=186.8ms wait=0.1/48.3ms pred gate=device Token # 1799: 3.780ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1800: 112.733ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=107.5ms serial=190.7ms gain=83.2ms ratio=0.44 s0=3.9ms s1=186.8ms wait=0.1/48.6ms pred gate=device Token # 1801: 3.722ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=1267 prop=1267 pred gate=device Token # 1802: 112.579ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=draft=223 prop=223 olap pair=107.3ms serial=190.2ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.4ms wait=0.1/48.6ms pred gate=device Token # 1803: 3.773ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=3286 prop=3286 pred gate=device Token # 1804: 112.437ms; value: next_token_ids=tensor([3286], device='cuda:0') mtp accept=1 prop=3286 top1=3286 accp=1.000 next=draft=320 prop=320 olap pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.3ms wait=0.1/48.6ms pred gate=device Token # 1805: 3.732ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=3286 prop=3286 pred gate=device Token # 1806: 112.621ms; value: 
next_token_ids=tensor([3286], device='cuda:0') mtp accept=1 prop=3286 top1=3286 accp=1.000 next=draft=12 prop=12 olap pair=107.4ms serial=190.3ms gain=83.0ms ratio=0.44 s0=4.0ms s1=186.4ms wait=0.1/48.4ms pred gate=device Token # 1807: 3.764ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=9489 prop=9489 pred gate=device Token # 1808: 111.994ms; value: next_token_ids=tensor([9489], device='cuda:0') mtp accept=1 prop=9489 top1=9489 accp=1.000 next=draft=27897 prop=27897 olap pair=106.8ms serial=189.4ms gain=82.5ms ratio=0.44 s0=3.9ms s1=185.5ms wait=0.1/48.4ms pred gate=device Token # 1809: 3.741ms; value: next_token_ids=tensor([15098], device='cuda:0') mtp accept=0 prop=27897 top1=15098 accp=0.395 next=pair draft=18 prop=18 pred gate=device Token # 1810: 112.363ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=0 prop=18 top1=20 accp=0.000 next=draft=438 prop=438 olap pair=107.1ms serial=189.8ms gain=82.7ms ratio=0.44 s0=3.9ms s1=185.9ms wait=0.1/48.3ms pred gate=device Token # 1811: 112.627ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=1148 prop=1148 olap pair=107.3ms serial=190.2ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.3ms wait=0.1/48.3ms pred gate=device Token # 1812: 3.723ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=1.000 next=pair draft=3286 prop=3286 pred gate=device Token # 1813: 112.545ms; value: next_token_ids=tensor([3286], device='cuda:0') mtp accept=1 prop=3286 top1=3286 accp=1.000 next=draft=12 prop=12 olap pair=107.3ms serial=190.1ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.2ms wait=0.1/48.4ms pred gate=device Token # 1814: 3.694ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=9489 prop=9489 pred gate=device Token # 1815: 112.464ms; value: next_token_ids=tensor([9489], device='cuda:0') mtp accept=1 
prop=9489 top1=9489 accp=1.000 next=draft=15098 prop=1320 olap pair=107.3ms serial=190.3ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.4ms wait=0.1/48.6ms pred gate=device Token # 1816: 3.737ms; value: next_token_ids=tensor([15098], device='cuda:0') mtp accept=0 prop=1320 top1=15098 accp=0.610 next=pair draft=20 prop=20 pred gate=device Token # 1817: 112.274ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.998 next=draft=768 prop=768 olap pair=107.0ms serial=189.2ms gain=82.2ms ratio=0.43 s0=5.0ms s1=184.2ms wait=0.1/47.3ms pred gate=device Token # 1818: 3.722ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=3286 prop=3286 pred gate=device Token # 1819: 112.577ms; value: next_token_ids=tensor([3286], device='cuda:0') mtp accept=1 prop=3286 top1=3286 accp=0.878 next=draft=12 prop=12 olap pair=107.3ms serial=190.3ms gain=83.0ms ratio=0.44 s0=3.8ms s1=186.5ms wait=0.1/48.6ms pred gate=device Token # 1820: 3.724ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=6793 prop=6793 pred gate=device Token # 1821: 112.472ms; value: next_token_ids=tensor([6793], device='cuda:0') mtp accept=1 prop=6793 top1=6793 accp=0.998 next=draft=1320 prop=1320 olap pair=107.2ms serial=189.8ms gain=82.5ms ratio=0.43 s0=4.2ms s1=185.6ms wait=0.1/47.7ms pred gate=device Token # 1822: 3.739ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=pair draft=18 prop=18 pred gate=device Token # 1823: 112.004ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=31 prop=31 olap pair=106.8ms serial=189.3ms gain=82.5ms ratio=0.44 s0=3.8ms s1=185.4ms wait=0.1/48.5ms pred gate=device Token # 1824: 3.807ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=5769 prop=5769 pred 
gate=device Token # 1825: 115.336ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=draft=3712 prop=3712 olap pair=110.2ms serial=193.0ms gain=82.9ms ratio=0.43 s0=4.8ms s1=188.3ms wait=0.1/46.9ms pred gate=device Token # 1826: 3.706ms; value: next_token_ids=tensor([3712], device='cuda:0') mtp accept=1 prop=3712 top1=3712 accp=1.000 next=pair draft=1320 prop=1320 pred gate=device Token # 1827: 112.493ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=draft=303 prop=303 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.1ms wait=0.1/48.6ms pred gate=device Token # 1828: 3.757ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=3286 prop=3286 pred gate=device Token # 1829: 112.436ms; value: next_token_ids=tensor([3286], device='cuda:0') mtp accept=1 prop=3286 top1=3286 accp=1.000 next=draft=12 prop=12 olap pair=107.2ms serial=189.8ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.4ms wait=0.1/47.3ms pred gate=device Token # 1830: 3.753ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=25282 prop=25282 pred gate=device Token # 1831: 112.369ms; value: next_token_ids=tensor([25282], device='cuda:0') mtp accept=1 prop=25282 top1=25282 accp=1.000 next=draft=3425 prop=3425 olap pair=107.2ms serial=189.8ms gain=82.7ms ratio=0.44 s0=4.3ms s1=185.5ms wait=0.1/47.5ms pred gate=device Token # 1832: 3.717ms; value: next_token_ids=tensor([3425], device='cuda:0') mtp accept=1 prop=3425 top1=3425 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1833: 112.319ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=14843 prop=14843 olap pair=107.1ms serial=189.8ms gain=82.6ms ratio=0.44 s0=4.2ms s1=185.6ms wait=0.1/47.7ms pred gate=device Token # 1834: 3.862ms; value: 
next_token_ids=tensor([14843], device='cuda:0') mtp accept=1 prop=14843 top1=14843 accp=1.000 next=pair draft=30942 prop=30942 pred gate=device Token # 1835: 112.306ms; value: next_token_ids=tensor([30942], device='cuda:0') mtp accept=1 prop=30942 top1=30942 accp=1.000 next=draft=20 prop=20 olap pair=107.1ms serial=189.9ms gain=82.8ms ratio=0.44 s0=3.9ms s1=186.0ms wait=0.1/48.5ms pred gate=device Token # 1836: 3.720ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 1837: 112.467ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=77230 prop=77230 olap pair=107.2ms serial=189.7ms gain=82.5ms ratio=0.43 s0=4.4ms s1=185.3ms wait=0.1/47.2ms pred gate=device Token # 1838: 3.722ms; value: next_token_ids=tensor([77230], device='cuda:0') mtp accept=1 prop=77230 top1=77230 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1839: 112.468ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=5769 prop=5769 olap pair=107.3ms serial=190.2ms gain=82.9ms ratio=0.44 s0=4.0ms s1=186.2ms wait=0.1/48.4ms pred gate=device Token # 1840: 3.776ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=pair draft=3712 prop=3712 pred gate=device Token # 1841: 112.838ms; value: next_token_ids=tensor([3712], device='cuda:0') mtp accept=1 prop=3712 top1=3712 accp=0.999 next=draft=1320 prop=1320 olap pair=107.5ms serial=189.9ms gain=82.3ms ratio=0.43 s0=4.1ms s1=185.7ms wait=0.1/48.1ms pred gate=device Token # 1842: 3.729ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=pair draft=13 prop=13 pred gate=device Token # 1843: 113.160ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=14843 prop=14843 olap pair=107.8ms 
serial=189.4ms gain=81.6ms ratio=0.43 s0=5.9ms s1=183.5ms wait=0.2/45.5ms pred gate=device Token # 1844: 3.857ms; value: next_token_ids=tensor([14843], device='cuda:0') mtp accept=1 prop=14843 top1=14843 accp=1.000 next=pair draft=30942 prop=30942 pred gate=device Token # 1845: 112.759ms; value: next_token_ids=tensor([30942], device='cuda:0') mtp accept=1 prop=30942 top1=30942 accp=1.000 next=draft=20 prop=20 olap pair=107.5ms serial=190.7ms gain=83.2ms ratio=0.44 s0=3.9ms s1=186.8ms wait=0.1/48.5ms pred gate=device Token # 1846: 3.751ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1847: 112.669ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=0.994 next=draft=7163 prop=7163 olap pair=107.4ms serial=190.3ms gain=82.9ms ratio=0.44 s0=4.1ms s1=186.2ms wait=0.1/48.1ms pred gate=device Token # 1848: 3.749ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1849: 112.611ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=28691 prop=28691 olap pair=107.4ms serial=189.3ms gain=81.9ms ratio=0.43 s0=4.3ms s1=185.1ms wait=0.1/47.7ms pred gate=device Token # 1850: 3.694ms; value: next_token_ids=tensor([28691], device='cuda:0') mtp accept=1 prop=28691 top1=28691 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1851: 112.242ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=4169 prop=4169 olap pair=107.0ms serial=189.5ms gain=82.5ms ratio=0.44 s0=4.3ms s1=185.1ms wait=0.1/47.4ms pred gate=device Token # 1852: 3.790ms; value: next_token_ids=tensor([4169], device='cuda:0') mtp accept=1 prop=4169 top1=4169 accp=1.000 next=pair draft=996 prop=996 pred gate=device Token # 1853: 112.073ms; value: 
next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=draft=511 prop=511 olap pair=106.8ms serial=189.0ms gain=82.2ms ratio=0.44 s0=5.0ms s1=184.1ms wait=0.1/46.8ms pred gate=device Token # 1854: 3.730ms; value: next_token_ids=tensor([511], device='cuda:0') mtp accept=1 prop=511 top1=511 accp=1.000 next=pair draft=478 prop=478 pred gate=device Token # 1855: 113.747ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.995 next=draft=3354 prop=3354 olap pair=107.6ms serial=189.5ms gain=81.9ms ratio=0.43 s0=5.2ms s1=184.3ms wait=0.1/46.7ms pred gate=device Token # 1856: 4.030ms; value: next_token_ids=tensor([3354], device='cuda:0') mtp accept=1 prop=3354 top1=3354 accp=0.985 next=pair draft=768 prop=768 pred gate=device Token # 1857: 112.404ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=7163 prop=7163 olap pair=107.3ms serial=190.2ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.7ms wait=0.1/47.8ms pred gate=device Token # 1858: 3.844ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=0.998 next=pair draft=27521 prop=27521 pred gate=device Token # 1859: 112.545ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=107.3ms serial=189.6ms gain=82.3ms ratio=0.43 s0=5.4ms s1=184.2ms wait=0.1/46.4ms pred gate=device Token # 1860: 3.789ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=1267 prop=1267 pred gate=device Token # 1861: 112.634ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=draft=223 prop=223 olap pair=107.4ms serial=190.1ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.7ms wait=0.1/47.4ms pred gate=device Token # 1862: 3.831ms; value: next_token_ids=tensor([223], device='cuda:0') 
mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=3354 prop=3354 pred gate=device Token # 1863: 112.602ms; value: next_token_ids=tensor([3354], device='cuda:0') mtp accept=1 prop=3354 top1=3354 accp=1.000 next=draft=320 prop=320 olap pair=107.4ms serial=190.0ms gain=82.7ms ratio=0.43 s0=4.4ms s1=185.7ms wait=0.1/47.7ms pred gate=device Token # 1864: 3.749ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=3354 prop=3354 pred gate=device Token # 1865: 113.404ms; value: next_token_ids=tensor([3354], device='cuda:0') mtp accept=1 prop=3354 top1=3354 accp=1.000 next=draft=12 prop=12 olap pair=108.2ms serial=190.5ms gain=82.3ms ratio=0.43 s0=4.2ms s1=186.3ms wait=0.1/48.2ms pred gate=device Token # 1866: 3.723ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=15659 prop=15659 pred gate=device Token # 1867: 112.807ms; value: next_token_ids=tensor([15659], device='cuda:0') mtp accept=1 prop=15659 top1=15659 accp=1.000 next=draft=31583 prop=26960 olap pair=107.6ms serial=189.0ms gain=81.4ms ratio=0.43 s0=4.3ms s1=184.7ms wait=0.1/47.9ms pred gate=device Token # 1868: 3.656ms; value: next_token_ids=tensor([31583], device='cuda:0') mtp accept=0 prop=26960 top1=31583 accp=0.437 next=pair draft=22 prop=23 pred gate=device Token # 1869: 112.791ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=0 prop=23 top1=22 accp=0.342 next=draft=438 prop=438 olap pair=107.6ms serial=189.1ms gain=81.5ms ratio=0.43 s0=4.3ms s1=184.8ms wait=0.1/47.8ms pred gate=device Token # 1870: 112.606ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=1148 prop=1148 olap pair=107.3ms serial=190.1ms gain=82.8ms ratio=0.44 s0=4.3ms s1=185.8ms wait=0.1/47.6ms pred gate=device Token # 1871: 3.727ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=1.000 next=pair 
draft=3354 prop=3354 pred gate=device Token # 1872: 112.509ms; value: next_token_ids=tensor([3354], device='cuda:0') mtp accept=1 prop=3354 top1=3354 accp=1.000 next=draft=12 prop=12 olap pair=107.3ms serial=190.1ms gain=82.8ms ratio=0.44 s0=4.3ms s1=185.8ms wait=0.1/47.7ms pred gate=device Token # 1873: 3.707ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=15659 prop=15659 pred gate=device Token # 1874: 112.553ms; value: next_token_ids=tensor([15659], device='cuda:0') mtp accept=1 prop=15659 top1=15659 accp=0.999 next=draft=31583 prop=31583 olap pair=107.4ms serial=190.0ms gain=82.6ms ratio=0.43 s0=4.2ms s1=185.8ms wait=0.1/47.9ms pred gate=device Token # 1875: 3.784ms; value: next_token_ids=tensor([31583], device='cuda:0') mtp accept=1 prop=31583 top1=31583 accp=1.000 next=pair draft=22 prop=22 pred gate=device Token # 1876: 112.382ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=draft=768 prop=768 olap pair=107.2ms serial=189.1ms gain=81.9ms ratio=0.43 s0=5.0ms s1=184.1ms wait=0.1/46.8ms pred gate=device Token # 1877: 3.728ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=3354 prop=3354 pred gate=device Token # 1878: 112.478ms; value: next_token_ids=tensor([3354], device='cuda:0') mtp accept=1 prop=3354 top1=3354 accp=1.000 next=draft=12 prop=12 olap pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=4.0ms s1=186.2ms wait=0.1/48.4ms pred gate=device Token # 1879: 3.690ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=9186 prop=9186 pred gate=device Token # 1880: 113.444ms; value: next_token_ids=tensor([9186], device='cuda:0') mtp accept=1 prop=9186 top1=9186 accp=1.000 next=draft=1320 prop=1320 olap pair=107.4ms serial=190.0ms gain=82.6ms ratio=0.43 s0=4.9ms s1=185.1ms wait=0.1/46.9ms pred gate=device Token # 1881: 
4.530ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=pair draft=18 prop=18 pred gate=device Token # 1882: 112.494ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=31 prop=31 olap pair=107.0ms serial=188.5ms gain=81.4ms ratio=0.43 s0=8.5ms s1=180.0ms wait=0.2/43.3ms pred gate=device Token # 1883: 3.757ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=6650 prop=6650 pred gate=device Token # 1884: 112.719ms; value: next_token_ids=tensor([6650], device='cuda:0') mtp accept=1 prop=6650 top1=6650 accp=1.000 next=draft=792 prop=792 olap pair=107.6ms serial=190.1ms gain=82.5ms ratio=0.43 s0=3.9ms s1=186.1ms wait=0.1/48.5ms pred gate=device Token # 1885: 3.684ms; value: next_token_ids=tensor([792], device='cuda:0') mtp accept=1 prop=792 top1=792 accp=1.000 next=pair draft=1320 prop=1320 pred gate=device Token # 1886: 112.336ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=draft=303 prop=303 olap pair=107.2ms serial=189.2ms gain=82.1ms ratio=0.43 s0=5.1ms s1=184.1ms wait=0.1/47.0ms pred gate=device Token # 1887: 3.765ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=3354 prop=3354 pred gate=device Token # 1888: 112.932ms; value: next_token_ids=tensor([3354], device='cuda:0') mtp accept=1 prop=3354 top1=3354 accp=1.000 next=draft=12 prop=12 olap pair=107.6ms serial=189.1ms gain=81.5ms ratio=0.43 s0=4.3ms s1=184.8ms wait=0.1/48.1ms pred gate=device Token # 1889: 3.713ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=21466 prop=21466 pred gate=device Token # 1890: 112.457ms; value: next_token_ids=tensor([21466], device='cuda:0') mtp accept=1 prop=21466 top1=21466 accp=1.000 next=draft=2240 prop=2240 olap pair=107.2ms 
serial=189.6ms gain=82.4ms ratio=0.43 s0=5.4ms s1=184.2ms wait=0.2/46.7ms pred gate=device Token # 1891: 3.677ms; value: next_token_ids=tensor([2240], device='cuda:0') mtp accept=1 prop=2240 top1=2240 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1892: 112.529ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=10488 prop=10488 olap pair=107.3ms serial=190.1ms gain=82.7ms ratio=0.44 s0=4.3ms s1=185.8ms wait=0.1/47.8ms pred gate=device Token # 1893: 3.768ms; value: next_token_ids=tensor([10488], device='cuda:0') mtp accept=1 prop=10488 top1=10488 accp=1.000 next=pair draft=30793 prop=30793 pred gate=device Token # 1894: 112.106ms; value: next_token_ids=tensor([30793], device='cuda:0') mtp accept=1 prop=30793 top1=30793 accp=1.000 next=draft=20 prop=20 olap pair=106.9ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.1ms wait=0.1/47.3ms pred gate=device Token # 1895: 3.711ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 1896: 112.369ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=77230 prop=77230 olap pair=107.1ms serial=190.0ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.6ms wait=0.1/47.3ms pred gate=device Token # 1897: 3.738ms; value: next_token_ids=tensor([77230], device='cuda:0') mtp accept=1 prop=77230 top1=77230 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1898: 112.729ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=6650 prop=6650 olap pair=107.5ms serial=190.7ms gain=83.2ms ratio=0.44 s0=4.4ms s1=186.3ms wait=0.1/47.4ms pred gate=device Token # 1899: 3.783ms; value: next_token_ids=tensor([6650], device='cuda:0') mtp accept=1 prop=6650 top1=6650 accp=0.999 next=pair draft=792 prop=792 pred gate=device Token # 1900: 112.237ms; value: 
next_token_ids=tensor([792], device='cuda:0') mtp accept=1 prop=792 top1=792 accp=1.000 next=draft=1320 prop=1320 olap pair=107.1ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.3ms wait=0.1/47.3ms pred gate=device Token # 1901: 3.729ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=pair draft=13 prop=13 pred gate=device Token # 1902: 112.485ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=10488 prop=10488 olap pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.7ms wait=0.1/47.2ms pred gate=device Token # 1903: 3.825ms; value: next_token_ids=tensor([10488], device='cuda:0') mtp accept=1 prop=10488 top1=10488 accp=1.000 next=pair draft=30793 prop=30793 pred gate=device Token # 1904: 112.555ms; value: next_token_ids=tensor([30793], device='cuda:0') mtp accept=1 prop=30793 top1=30793 accp=1.000 next=draft=20 prop=20 olap pair=107.3ms serial=190.3ms gain=83.0ms ratio=0.44 s0=4.4ms s1=185.9ms wait=0.1/47.1ms pred gate=device Token # 1905: 3.754ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1906: 112.360ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=0.998 next=draft=7163 prop=7163 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.6ms wait=0.1/47.1ms pred gate=device Token # 1907: 3.736ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1908: 112.280ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=21817 prop=21817 olap pair=107.1ms serial=189.9ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.6ms wait=0.1/47.3ms pred gate=device Token # 1909: 3.732ms; value: next_token_ids=tensor([21817], device='cuda:0') mtp 
accept=1 prop=21817 top1=21817 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1910: 112.444ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.999 next=draft=4169 prop=4169 olap pair=107.1ms serial=189.9ms gain=82.7ms ratio=0.44 s0=4.1ms s1=185.7ms wait=0.1/47.7ms pred gate=device Token # 1911: 3.769ms; value: next_token_ids=tensor([4169], device='cuda:0') mtp accept=1 prop=4169 top1=4169 accp=1.000 next=pair draft=996 prop=996 pred gate=device Token # 1912: 112.214ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=draft=13959 prop=13959 olap pair=107.0ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.0ms s1=185.6ms wait=0.1/48.3ms pred gate=device Token # 1913: 3.785ms; value: next_token_ids=tensor([13959], device='cuda:0') mtp accept=1 prop=13959 top1=13959 accp=1.000 next=pair draft=478 prop=1148 pred gate=device Token # 1914: 112.419ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=0.140 next=draft=1207 prop=1207 olap pair=107.2ms serial=189.9ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.5ms wait=0.1/47.3ms pred gate=device Token # 1915: 3.725ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=0 prop=1207 top1=7163 accp=0.103 next=pair draft=27521 prop=27521 pred gate=device Token # 1916: 112.512ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.7ms wait=0.1/47.1ms pred gate=device Token # 1917: 3.750ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=565 prop=15 pred gate=device Token # 1918: 112.351ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=565 accp=0.984 next=draft=7163 prop=7163 olap pair=107.1ms serial=189.8ms gain=82.7ms ratio=0.44 
s0=4.1ms s1=185.7ms wait=0.1/47.9ms pred gate=device Token # 1919: 3.803ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1920: 112.454ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=21817 prop=21817 olap pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.3ms wait=0.1/48.5ms pred gate=device Token # 1921: 3.783ms; value: next_token_ids=tensor([21817], device='cuda:0') mtp accept=1 prop=21817 top1=21817 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1922: 112.375ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=13959 prop=13959 olap pair=107.1ms serial=189.8ms gain=82.8ms ratio=0.44 s0=3.9ms s1=185.9ms wait=0.1/48.3ms pred gate=device Token # 1923: 3.739ms; value: next_token_ids=tensor([13959], device='cuda:0') mtp accept=1 prop=13959 top1=13959 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 1924: 112.096ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.929 next=draft=13959 prop=13959 olap pair=106.8ms serial=189.3ms gain=82.5ms ratio=0.44 s0=3.9ms s1=185.4ms wait=0.1/48.4ms pred gate=device Token # 1925: 3.739ms; value: next_token_ids=tensor([13959], device='cuda:0') mtp accept=1 prop=13959 top1=13959 accp=0.999 next=pair draft=1267 prop=1267 pred gate=device Token # 1926: 112.529ms; value: next_token_ids=tensor([58000], device='cuda:0') mtp accept=0 prop=1267 top1=58000 accp=0.029 next=draft=3354 prop=3354 olap pair=107.3ms serial=190.2ms gain=83.0ms ratio=0.44 s0=3.9ms s1=186.3ms wait=0.1/48.4ms pred gate=device Token # 1927: 112.679ms; value: next_token_ids=tensor([3354], device='cuda:0') mtp accept=1 prop=3354 top1=3354 accp=1.000 next=draft=1148 prop=1148 olap pair=107.3ms serial=190.1ms gain=82.8ms ratio=0.44 s0=4.5ms s1=185.6ms 
wait=0.1/47.1ms pred gate=device Token # 1928: 3.818ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=0 prop=1148 top1=859 accp=0.051 next=pair draft=22 prop=22 pred gate=device Token # 1929: 112.400ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=0.999 next=draft=12 prop=12 olap pair=107.1ms serial=189.7ms gain=82.6ms ratio=0.44 s0=4.2ms s1=185.5ms wait=0.1/47.7ms pred gate=device Token # 1930: 3.786ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=0.993 next=pair draft=3354 prop=3354 pred gate=device Token # 1931: 112.232ms; value: next_token_ids=tensor([3354], device='cuda:0') mtp accept=1 prop=3354 top1=3354 accp=1.000 next=draft=31 prop=31 olap pair=107.0ms serial=189.5ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.1ms wait=0.1/47.1ms pred gate=device Token # 1932: 3.747ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=10925 prop=10925 pred gate=device Token # 1933: 112.536ms; value: next_token_ids=tensor([10925], device='cuda:0') mtp accept=1 prop=10925 top1=10925 accp=1.000 next=draft=303 prop=303 olap pair=107.3ms serial=190.2ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.8ms wait=0.1/47.2ms pred gate=device Token # 1934: 3.783ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=4169 prop=4169 pred gate=device Token # 1935: 112.348ms; value: next_token_ids=tensor([4169], device='cuda:0') mtp accept=1 prop=4169 top1=4169 accp=0.994 next=draft=996 prop=23 olap pair=107.1ms serial=189.7ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.3ms wait=0.1/47.0ms pred gate=device Token # 1936: 3.839ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=0 prop=23 top1=996 accp=0.955 next=pair draft=1942 prop=1942 pred gate=device Token # 1937: 112.415ms; value: next_token_ids=tensor([1942], device='cuda:0') mtp accept=1 prop=1942 top1=1942 accp=1.000 
next=draft=320 prop=320 olap pair=107.2ms serial=189.8ms gain=82.6ms ratio=0.44 s0=4.2ms s1=185.6ms wait=0.1/47.9ms pred gate=device Token # 1938: 3.781ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.899 next=pair draft=2636 prop=2636 pred gate=device Token # 1939: 112.609ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=0.998 next=draft=4169 prop=4169 olap pair=107.4ms serial=189.1ms gain=81.7ms ratio=0.43 s0=4.3ms s1=184.8ms wait=0.1/47.8ms pred gate=device Token # 1940: 3.770ms; value: next_token_ids=tensor([4169], device='cuda:0') mtp accept=1 prop=4169 top1=4169 accp=0.912 next=pair draft=996 prop=996 pred gate=device Token # 1941: 112.221ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=0.982 next=draft=15206 prop=15206 olap pair=106.9ms serial=189.5ms gain=82.5ms ratio=0.44 s0=4.2ms s1=185.3ms wait=0.1/47.6ms pred gate=device Token # 1942: 3.880ms; value: next_token_ids=tensor([1942], device='cuda:0') mtp accept=0 prop=15206 top1=15206 accp=0.601 next=pair draft=1148 prop=1148 pred gate=device Token # 1943: 112.560ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=0.774 next=draft=14149 prop=14149 olap pair=107.3ms serial=190.3ms gain=83.0ms ratio=0.44 s0=3.9ms s1=186.3ms wait=0.1/48.3ms pred gate=device Token # 1944: 3.713ms; value: next_token_ids=tensor([14149], device='cuda:0') mtp accept=1 prop=14149 top1=14149 accp=0.997 next=pair draft=303 prop=303 pred gate=device Token # 1945: 112.343ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.999 next=draft=13959 prop=13959 olap pair=107.1ms serial=189.3ms gain=82.2ms ratio=0.43 s0=4.2ms s1=185.1ms wait=0.1/47.7ms pred gate=device Token # 1946: 3.804ms; value: next_token_ids=tensor([13959], device='cuda:0') mtp accept=1 prop=13959 top1=13959 accp=0.943 next=pair draft=1267 prop=1267 pred 
gate=device Token # 1947: 112.417ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=draft=223 prop=223 olap pair=107.2ms serial=190.0ms gain=82.8ms ratio=0.44 s0=4.1ms s1=186.0ms wait=0.1/47.8ms pred gate=device Token # 1948: 3.834ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=3354 prop=3354 pred gate=device Token # 1949: 112.424ms; value: next_token_ids=tensor([3354], device='cuda:0') mtp accept=1 prop=3354 top1=3354 accp=1.000 next=draft=438 prop=438 olap pair=107.2ms serial=190.0ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.6ms wait=0.1/47.1ms pred gate=device Token # 1950: 3.766ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.993 next=pair draft=223 prop=223 pred gate=device Token # 1951: 112.717ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=13959 prop=13959 olap pair=107.4ms serial=188.8ms gain=81.4ms ratio=0.43 s0=4.7ms s1=184.1ms wait=0.1/47.1ms pred gate=device Token # 1952: 3.748ms; value: next_token_ids=tensor([13959], device='cuda:0') mtp accept=1 prop=13959 top1=13959 accp=1.000 next=pair draft=15 prop=15 pred gate=device Token # 1953: 112.289ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=0 prop=15 top1=565 accp=0.152 next=draft=223 prop=223 olap pair=107.0ms serial=189.7ms gain=82.6ms ratio=0.44 s0=4.4ms s1=185.2ms wait=0.1/47.1ms pred gate=device Token # 1954: 113.155ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=22 prop=22 olap pair=107.7ms serial=189.5ms gain=81.7ms ratio=0.43 s0=4.7ms s1=184.7ms wait=0.1/47.2ms pred gate=device Token # 1955: 3.822ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=pair draft=12 prop=12 pred gate=device Token # 1956: 113.016ms; value: 
next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=draft=3354 prop=3354 olap pair=107.7ms serial=188.9ms gain=81.1ms ratio=0.43 s0=4.8ms s1=184.0ms wait=0.1/47.1ms pred gate=device Token # 1957: 3.772ms; value: next_token_ids=tensor([3354], device='cuda:0') mtp accept=1 prop=3354 top1=3354 accp=1.000 next=pair draft=438 prop=438 pred gate=device Token # 1958: 113.053ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=0.822 next=draft=223 prop=223 olap pair=107.8ms serial=188.9ms gain=81.1ms ratio=0.43 s0=4.8ms s1=184.1ms wait=0.1/47.3ms pred gate=device Token # 1959: 3.876ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=13959 prop=13959 pred gate=device Token # 1960: 113.333ms; value: next_token_ids=tensor([13959], device='cuda:0') mtp accept=1 prop=13959 top1=13959 accp=1.000 next=draft=15 prop=15 olap pair=108.1ms serial=189.7ms gain=81.6ms ratio=0.43 s0=4.6ms s1=185.1ms wait=0.1/47.6ms pred gate=device Token # 1961: 3.850ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=pair draft=10925 prop=10925 pred gate=device Token # 1962: 115.364ms; value: next_token_ids=tensor([10925], device='cuda:0') mtp accept=1 prop=10925 top1=10925 accp=1.000 next=draft=31 prop=31 olap pair=107.9ms serial=189.4ms gain=81.5ms ratio=0.43 s0=4.3ms s1=185.1ms wait=0.1/48.2ms pred gate=device Token # 1963: 3.870ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=1942 prop=1942 pred gate=device Token # 1964: 113.142ms; value: next_token_ids=tensor([1942], device='cuda:0') mtp accept=1 prop=1942 top1=1942 accp=1.000 next=draft=320 prop=320 olap pair=107.9ms serial=189.5ms gain=81.6ms ratio=0.43 s0=4.3ms s1=185.2ms wait=0.1/48.0ms pred gate=device Token # 1965: 3.782ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 
prop=320 top1=320 accp=1.000 next=pair draft=2636 prop=2636 pred gate=device Token # 1966: 112.987ms; value: next_token_ids=tensor([2636], device='cuda:0') mtp accept=1 prop=2636 top1=2636 accp=1.000 next=draft=1532 prop=1532 olap pair=107.7ms serial=189.8ms gain=82.1ms ratio=0.43 s0=4.3ms s1=185.5ms wait=0.1/47.7ms pred gate=device Token # 1967: 3.796ms; value: next_token_ids=tensor([1532], device='cuda:0') mtp accept=1 prop=1532 top1=1532 accp=0.956 next=pair draft=996 prop=996 pred gate=device Token # 1968: 112.398ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=0.988 next=draft=1267 prop=1267 olap pair=107.1ms serial=190.0ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.6ms wait=0.1/47.1ms pred gate=device Token # 1969: 3.792ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1970: 112.677ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=3354 prop=3354 olap pair=107.3ms serial=190.2ms gain=82.9ms ratio=0.44 s0=4.4ms s1=185.8ms wait=0.1/47.2ms pred gate=device Token # 1971: 3.784ms; value: next_token_ids=tensor([3354], device='cuda:0') mtp accept=1 prop=3354 top1=3354 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1972: 112.615ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=0 prop=223 top1=438 accp=0.603 next=draft=223 prop=223 olap pair=107.4ms serial=190.5ms gain=83.1ms ratio=0.44 s0=4.5ms s1=186.0ms wait=0.1/46.9ms pred gate=device Token # 1973: 112.697ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=1942 prop=1942 olap pair=107.4ms serial=190.5ms gain=83.1ms ratio=0.44 s0=4.4ms s1=186.0ms wait=0.1/47.0ms pred gate=device Token # 1974: 3.721ms; value: next_token_ids=tensor([1942], device='cuda:0') mtp accept=1 prop=1942 top1=1942 accp=1.000 next=pair draft=1148 
prop=478 pred gate=device Token # 1975: 112.744ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.422 next=draft=113008 prop=113008 olap pair=107.5ms serial=190.8ms gain=83.3ms ratio=0.44 s0=4.4ms s1=186.4ms wait=0.1/47.2ms pred gate=device Token # 1976: 3.708ms; value: next_token_ids=tensor([2491], device='cuda:0') mtp accept=0 prop=113008 top1=5719 accp=0.000 next=pair draft=768 prop=768 pred gate=device Token # 1977: 112.340ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=7163 prop=7163 olap pair=107.1ms serial=189.8ms gain=82.7ms ratio=0.44 s0=4.1ms s1=185.7ms wait=0.1/48.2ms pred gate=device Token # 1978: 3.704ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 1979: 112.694ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=19698 prop=19698 olap pair=107.4ms serial=190.4ms gain=82.9ms ratio=0.44 s0=4.1ms s1=186.3ms wait=0.1/48.2ms pred gate=device Token # 1980: 3.685ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=pair draft=1267 prop=1267 pred gate=device Token # 1981: 112.810ms; value: next_token_ids=tensor([1267], device='cuda:0') mtp accept=1 prop=1267 top1=1267 accp=1.000 next=draft=223 prop=223 olap pair=107.6ms serial=190.1ms gain=82.5ms ratio=0.43 s0=4.1ms s1=185.9ms wait=0.1/48.4ms pred gate=device Token # 1982: 3.814ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=2491 prop=2491 pred gate=device Token # 1983: 112.738ms; value: next_token_ids=tensor([2491], device='cuda:0') mtp accept=1 prop=2491 top1=2491 accp=1.000 next=draft=320 prop=320 olap pair=107.4ms serial=190.3ms gain=82.8ms ratio=0.44 s0=4.4ms s1=185.9ms wait=0.1/47.2ms pred gate=device Token 
# 1984: 3.726ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=2491 prop=2491 pred gate=device Token # 1985: 112.401ms; value: next_token_ids=tensor([2491], device='cuda:0') mtp accept=1 prop=2491 top1=2491 accp=0.999 next=draft=12 prop=12 olap pair=107.2ms serial=190.1ms gain=82.9ms ratio=0.44 s0=4.2ms s1=185.9ms wait=0.1/47.8ms pred gate=device Token # 1986: 3.772ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=0.999 next=pair draft=13035 prop=13035 pred gate=device Token # 1987: 112.517ms; value: next_token_ids=tensor([13035], device='cuda:0') mtp accept=1 prop=13035 top1=13035 accp=1.000 next=draft=6895 prop=5769 olap pair=107.3ms serial=190.1ms gain=82.8ms ratio=0.44 s0=4.1ms s1=186.0ms wait=0.1/48.1ms pred gate=device Token # 1988: 3.756ms; value: next_token_ids=tensor([4460], device='cuda:0') mtp accept=0 prop=5769 top1=4460 accp=0.001 next=pair draft=22 prop=22 pred gate=device Token # 1989: 113.151ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=0 prop=22 top1=20 accp=0.065 next=draft=438 prop=438 olap pair=107.0ms serial=189.3ms gain=82.3ms ratio=0.43 s0=5.0ms s1=184.3ms wait=0.1/46.8ms pred gate=device Token # 1990: 113.014ms; value: next_token_ids=tensor([438], device='cuda:0') mtp accept=1 prop=438 top1=438 accp=1.000 next=draft=1148 prop=1148 olap pair=107.5ms serial=190.3ms gain=82.8ms ratio=0.44 s0=4.8ms s1=185.6ms wait=0.1/47.4ms pred gate=device Token # 1991: 3.735ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=1.000 next=pair draft=2491 prop=2491 pred gate=device Token # 1992: 112.683ms; value: next_token_ids=tensor([2491], device='cuda:0') mtp accept=1 prop=2491 top1=2491 accp=1.000 next=draft=12 prop=12 olap pair=107.5ms serial=190.7ms gain=83.2ms ratio=0.44 s0=3.8ms s1=186.8ms wait=0.1/48.5ms pred gate=device Token # 1993: 3.719ms; value: next_token_ids=tensor([12], 
device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=13035 prop=13035 pred gate=device Token # 1994: 112.114ms; value: next_token_ids=tensor([13035], device='cuda:0') mtp accept=1 prop=13035 top1=13035 accp=1.000 next=draft=4460 prop=4460 olap pair=107.0ms serial=189.8ms gain=82.8ms ratio=0.44 s0=3.8ms s1=186.0ms wait=0.1/48.6ms pred gate=device Token # 1995: 3.762ms; value: next_token_ids=tensor([4460], device='cuda:0') mtp accept=1 prop=4460 top1=4460 accp=0.998 next=pair draft=20 prop=20 pred gate=device Token # 1996: 112.245ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=768 prop=768 olap pair=107.1ms serial=189.8ms gain=82.7ms ratio=0.44 s0=4.4ms s1=185.4ms wait=0.1/47.1ms pred gate=device Token # 1997: 3.720ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=2491 prop=2491 pred gate=device Token # 1998: 112.675ms; value: next_token_ids=tensor([2491], device='cuda:0') mtp accept=1 prop=2491 top1=2491 accp=1.000 next=draft=12 prop=12 olap pair=107.4ms serial=190.4ms gain=83.0ms ratio=0.44 s0=4.4ms s1=186.0ms wait=0.1/47.1ms pred gate=device Token # 1999: 3.735ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=9663 prop=9663 pred gate=device Token # 2000: 112.653ms; value: next_token_ids=tensor([9663], device='cuda:0') mtp accept=1 prop=9663 top1=9663 accp=1.000 next=draft=1320 prop=1320 olap pair=107.4ms serial=190.4ms gain=83.0ms ratio=0.44 s0=4.4ms s1=186.0ms wait=0.1/47.3ms pred gate=device Token # 2001: 3.751ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=pair draft=18 prop=18 pred gate=device Token # 2002: 112.188ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=31 prop=31 olap pair=106.9ms serial=189.7ms gain=82.8ms ratio=0.44 s0=3.9ms 
s1=185.8ms wait=0.1/48.4ms pred gate=device Token # 2003: 3.758ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=6650 prop=6650 pred gate=device Token # 2004: 112.321ms; value: next_token_ids=tensor([6650], device='cuda:0') mtp accept=1 prop=6650 top1=6650 accp=1.000 next=draft=5126 prop=5126 olap pair=107.1ms serial=189.9ms gain=82.8ms ratio=0.44 s0=3.9ms s1=186.0ms wait=0.1/48.5ms pred gate=device Token # 2005: 3.718ms; value: next_token_ids=tensor([5126], device='cuda:0') mtp accept=1 prop=5126 top1=5126 accp=1.000 next=pair draft=1320 prop=1320 pred gate=device Token # 2006: 112.361ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=draft=303 prop=303 olap pair=107.2ms serial=190.0ms gain=82.9ms ratio=0.44 s0=3.8ms s1=186.2ms wait=0.1/48.6ms pred gate=device Token # 2007: 3.790ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=2491 prop=2491 pred gate=device Token # 2008: 112.644ms; value: next_token_ids=tensor([2491], device='cuda:0') mtp accept=1 prop=2491 top1=2491 accp=1.000 next=draft=12 prop=12 olap pair=107.4ms serial=190.5ms gain=83.1ms ratio=0.44 s0=3.9ms s1=186.6ms wait=0.1/48.5ms pred gate=device Token # 2009: 3.778ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=pair draft=14666 prop=14666 pred gate=device Token # 2010: 113.875ms; value: next_token_ids=tensor([14666], device='cuda:0') mtp accept=1 prop=14666 top1=14666 accp=1.000 next=draft=736 prop=736 olap pair=107.7ms serial=189.8ms gain=82.1ms ratio=0.43 s0=4.8ms s1=185.0ms wait=0.1/47.2ms pred gate=device Token # 2011: 4.591ms; value: next_token_ids=tensor([736], device='cuda:0') mtp accept=1 prop=736 top1=736 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 2012: 113.040ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 
top1=31 accp=1.000 next=draft=10186 prop=10186 olap pair=107.6ms serial=189.1ms gain=81.5ms ratio=0.43 s0=5.3ms s1=183.7ms wait=0.1/46.5ms pred gate=device Token # 2013: 3.723ms; value: next_token_ids=tensor([10186], device='cuda:0') mtp accept=1 prop=10186 top1=10186 accp=1.000 next=pair draft=29291 prop=29291 pred gate=device Token # 2014: 113.098ms; value: next_token_ids=tensor([29291], device='cuda:0') mtp accept=1 prop=29291 top1=29291 accp=1.000 next=draft=22 prop=22 olap pair=107.1ms serial=189.2ms gain=82.1ms ratio=0.43 s0=4.9ms s1=184.3ms wait=0.1/47.0ms pred gate=device Token # 2015: 4.596ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 2016: 112.890ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.996 next=draft=77230 prop=77230 olap pair=107.2ms serial=188.6ms gain=81.4ms ratio=0.43 s0=8.8ms s1=179.7ms wait=0.2/42.7ms pred gate=device Token # 2017: 3.778ms; value: next_token_ids=tensor([77230], device='cuda:0') mtp accept=1 prop=77230 top1=77230 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 2018: 112.534ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=6650 prop=6650 olap pair=107.3ms serial=190.2ms gain=82.9ms ratio=0.44 s0=3.9ms s1=186.3ms wait=0.1/48.4ms pred gate=device Token # 2019: 3.726ms; value: next_token_ids=tensor([6650], device='cuda:0') mtp accept=1 prop=6650 top1=6650 accp=1.000 next=pair draft=5126 prop=5126 pred gate=device Token # 2020: 112.710ms; value: next_token_ids=tensor([5126], device='cuda:0') mtp accept=1 prop=5126 top1=5126 accp=1.000 next=draft=1320 prop=1320 olap pair=107.4ms serial=190.5ms gain=83.1ms ratio=0.44 s0=3.8ms s1=186.6ms wait=0.1/48.5ms pred gate=device Token # 2021: 3.734ms; value: next_token_ids=tensor([1320], device='cuda:0') mtp accept=1 prop=1320 top1=1320 accp=1.000 next=pair draft=13 
prop=13 pred gate=device Token # 2022: 112.579ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=draft=10186 prop=10186 olap pair=107.4ms serial=190.5ms gain=83.1ms ratio=0.44 s0=3.8ms s1=186.6ms wait=0.1/48.6ms pred gate=device Token # 2023: 3.857ms; value: next_token_ids=tensor([10186], device='cuda:0') mtp accept=1 prop=10186 top1=10186 accp=1.000 next=pair draft=29291 prop=29291 pred gate=device Token # 2024: 113.038ms; value: next_token_ids=tensor([29291], device='cuda:0') mtp accept=1 prop=29291 top1=29291 accp=1.000 next=draft=22 prop=22 olap pair=107.6ms serial=189.6ms gain=82.0ms ratio=0.43 s0=4.1ms s1=185.5ms wait=0.1/48.4ms pred gate=device Token # 2025: 3.746ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 2026: 113.093ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=7163 prop=7163 olap pair=107.8ms serial=189.1ms gain=81.3ms ratio=0.43 s0=4.3ms s1=184.8ms wait=0.1/48.2ms pred gate=device Token # 2027: 3.870ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=pair draft=27521 prop=27521 pred gate=device Token # 2028: 112.535ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=draft=26484 prop=26484 olap pair=107.3ms serial=189.9ms gain=82.6ms ratio=0.44 s0=4.0ms s1=185.9ms wait=0.1/48.1ms pred gate=device Token # 2029: 3.711ms; value: next_token_ids=tensor([26484], device='cuda:0') mtp accept=1 prop=26484 top1=26484 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 2030: 112.675ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=4169 prop=4169 olap pair=107.4ms serial=190.5ms gain=83.1ms ratio=0.44 s0=4.0ms s1=186.4ms wait=0.1/48.2ms pred gate=device Token # 2031: 
3.794ms; value: next_token_ids=tensor([4169], device='cuda:0') mtp accept=1 prop=4169 top1=4169 accp=1.000 next=pair draft=996 prop=996 pred gate=device Token # 2032: 112.079ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=draft=1942 prop=1942 olap pair=106.9ms serial=189.4ms gain=82.5ms ratio=0.44 s0=4.1ms s1=185.3ms wait=0.1/48.0ms pred gate=device Token # 2033: 3.765ms; value: next_token_ids=tensor([1942], device='cuda:0') mtp accept=1 prop=1942 top1=1942 accp=1.000 next=pair draft=478 prop=478 pred gate=device Token # 2034: 112.763ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.999 next=draft=4414 prop=4414 olap pair=107.5ms serial=190.6ms gain=83.1ms ratio=0.44 s0=4.4ms s1=186.2ms wait=0.1/47.0ms pred gate=device Token # 2035: 3.744ms; value: next_token_ids=tensor([4414], device='cuda:0') mtp accept=1 prop=4414 top1=4414 accp=0.945 next=pair draft=768 prop=768 pred gate=device Token # 2036: 113.524ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=7163 prop=7163 olap pair=108.4ms serial=190.9ms gain=82.5ms ratio=0.43 s0=4.4ms s1=186.6ms wait=0.1/47.6ms pred gate=device