[2026-04-08 08:01:13.017959 INFO duck_llm] This is an info log [2026-04-08 08:01:13.017998 WARN duck_llm] This is a warning log [2026-04-08 08:01:13.018001 ERROR duck_llm] This is an error log [2026-04-08 08:01:13.018204 INFO utils] Selected DPDK lcores: master=0, workers=[2, 4, 6, 8], all_performance_core_representatives=[0, 2, 4, 6, 8, 10, 12, 14] EAL: Detected CPU lcores: 32 EAL: Detected NUMA nodes: 1 EAL: Detected shared linkage of DPDK EAL: Multi-process socket /var/run/dpdk/rte/mp_socket EAL: Selected IOVA mode 'VA' EAL: VFIO support initialized EAL: Using IOMMU type 1 (Type 1) ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode) ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode) ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode) ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode) [2026-04-08 08:01:15.061748 INFO dpdk_workers] DPDK initialized successfully. Found 4 ports. 
[2026-04-08 08:01:15.061764 INFO dpdk_workers] Port 0 device name: 0000:01:00.0 [2026-04-08 08:01:15.061766 INFO dpdk_workers] Port 0 IP address: 10.21.1.1 [2026-04-08 08:01:15.061768 INFO dpdk_workers] Port 0 Broadcast address: 10.21.1.255 [2026-04-08 08:01:15.061771 INFO dpdk_workers] Port 1 device name: 0000:01:00.1 [2026-04-08 08:01:15.061772 INFO dpdk_workers] Port 1 IP address: 10.21.2.1 [2026-04-08 08:01:15.061774 INFO dpdk_workers] Port 1 Broadcast address: 10.21.2.255 [2026-04-08 08:01:15.061779 INFO dpdk_workers] Port 2 device name: 0000:01:00.2 [2026-04-08 08:01:15.061782 INFO dpdk_workers] Port 2 IP address: 10.21.3.1 [2026-04-08 08:01:15.061783 INFO dpdk_workers] Port 2 Broadcast address: 10.21.3.255 [2026-04-08 08:01:15.061785 INFO dpdk_workers] Port 3 device name: 0000:01:00.3 [2026-04-08 08:01:15.061786 INFO dpdk_workers] Port 3 IP address: 10.21.4.1 [2026-04-08 08:01:15.061787 INFO dpdk_workers] Port 3 Broadcast address: 10.21.4.255 [2026-04-08 08:01:15.061789 INFO dpdk_workers] Available netifs list: [(10.21.1.255, 0, 10.21.1.1), (10.21.2.255, 1, 10.21.2.1), (10.21.3.255, 2, 10.21.3.1), (10.21.4.255, 3, 10.21.4.1)] [2026-04-08 08:01:15.061795 INFO dpdk_workers] Starting worker #0: (bcast_ip: 10.21.1.255, port_id: 0, lcore_id: 2, host_ip: 10.21.1.1) [2026-04-08 08:01:15.061821 INFO dpdk_workers] Starting worker #1: (bcast_ip: 10.21.2.255, port_id: 1, lcore_id: 4, host_ip: 10.21.2.1) [2026-04-08 08:01:15.061848 INFO dpdk_workers] Initializing worker port 0 on lcore 2... [2026-04-08 08:01:15.063783 INFO dpdk_workers] Starting worker #2: (bcast_ip: 10.21.3.255, port_id: 2, lcore_id: 6, host_ip: 10.21.3.1) [2026-04-08 08:01:15.063807 INFO dpdk_workers] Starting worker #3: (bcast_ip: 10.21.4.255, port_id: 3, lcore_id: 8, host_ip: 10.21.4.1) [2026-04-08 08:01:15.063840 INFO dpdk_workers] Initializing worker port 1 on lcore 4... [2026-04-08 08:01:15.065803 INFO dpdk_workers] Initializing worker port 2 on lcore 6... 
[2026-04-08 08:01:15.067814 INFO dpdk_workers] Initializing worker port 3 on lcore 8... ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 0). ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 1). ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 2). ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 3). [2026-04-08 08:01:18.738574 INFO dpdk_workers] Worker port 2 initialized successfully. [2026-04-08 08:01:19.556980 INFO dpdk_workers] Worker port 3 initialized successfully. [2026-04-08 08:01:19.591855 INFO dpdk_workers] Worker port 0 initialized successfully. [2026-04-08 08:01:19.593783 INFO dpdk_workers] Worker port 1 initialized successfully. [2026-04-08 08:01:19.593811 INFO dpdk_workers] Workers initialized successfully. 4 workers running. [2026-04-08 08:01:19.594080 INFO utils] Binding master thread to cores (excluding workers): [0, 1, 3, 5, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31] [2026-04-08 08:01:19.594089 INFO utils] set_thread_affinity(tid 1359726, cores [0, 1, 3, 5, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]): 0 [2026-04-08 08:01:19.595111 INFO dpdk_workers] Run command Ping all time: send 1.0 us, recv 1014.4 us [2026-04-08 08:01:19.645169 INFO dpdk_workers] Run command Ping all time: send 0.3 us, recv 0.5 us [2026-04-08 08:01:19.695225 INFO dpdk_workers] Run command Ping all time: send 0.2 us, recv 0.4 us [2026-04-08 08:01:19.745281 INFO dpdk_workers] Run command Ping all time: send 0.2 us, recv 0.4 us [2026-04-08 08:01:19.795337 INFO dpdk_workers] Run command Ping all time: send 0.2 us, recv 0.5 us [2026-04-08 08:01:19.845393 INFO dpdk_workers] Run command Ping all time: send 0.2 us, recv 0.4 us [2026-04-08 08:01:19.895449 INFO dpdk_workers] Run command Ping all time: send 0.3 us, recv 0.3 us [2026-04-08 08:01:19.945504 INFO dpdk_workers] Run command Ping all time: send 0.3 us, recv 0.4 us [2026-04-08 08:01:19.995561 
INFO dpdk_workers] Run command Ping all time: send 0.4 us, recv 0.5 us [2026-04-08 08:01:20.045620 INFO dpdk_workers] Run command Ping all time: send 0.2 us, recv 0.5 us [2026-04-08 08:01:20.095741 INFO dpdk_workers] Found 32 ducks in duck-ips-multi-netifs.txt [2026-04-08 08:01:20.095745 INFO dpdk_workers] Duck #0: 10.21.1.101 (bcast_ip: 10.21.1.255) [2026-04-08 08:01:20.095747 INFO dpdk_workers] Duck #1: 10.21.1.102 (bcast_ip: 10.21.1.255) [2026-04-08 08:01:20.095749 INFO dpdk_workers] Duck #2: 10.21.1.103 (bcast_ip: 10.21.1.255) [2026-04-08 08:01:20.095751 INFO dpdk_workers] Duck #3: 10.21.1.104 (bcast_ip: 10.21.1.255) [2026-04-08 08:01:20.095753 INFO dpdk_workers] Duck #4: 10.21.1.105 (bcast_ip: 10.21.1.255) [2026-04-08 08:01:20.095755 INFO dpdk_workers] Duck #5: 10.21.1.106 (bcast_ip: 10.21.1.255) [2026-04-08 08:01:20.095757 INFO dpdk_workers] Duck #6: 10.21.1.107 (bcast_ip: 10.21.1.255) [2026-04-08 08:01:20.095758 INFO dpdk_workers] Duck #7: 10.21.1.108 (bcast_ip: 10.21.1.255) [2026-04-08 08:01:20.095760 INFO dpdk_workers] Duck #8: 10.21.2.101 (bcast_ip: 10.21.2.255) [2026-04-08 08:01:20.095763 INFO dpdk_workers] Duck #9: 10.21.2.102 (bcast_ip: 10.21.2.255) [2026-04-08 08:01:20.095765 INFO dpdk_workers] Duck #10: 10.21.2.103 (bcast_ip: 10.21.2.255) [2026-04-08 08:01:20.095767 INFO dpdk_workers] Duck #11: 10.21.2.104 (bcast_ip: 10.21.2.255) [2026-04-08 08:01:20.095769 INFO dpdk_workers] Duck #12: 10.21.2.105 (bcast_ip: 10.21.2.255) [2026-04-08 08:01:20.095771 INFO dpdk_workers] Duck #13: 10.21.2.106 (bcast_ip: 10.21.2.255) [2026-04-08 08:01:20.095773 INFO dpdk_workers] Duck #14: 10.21.2.107 (bcast_ip: 10.21.2.255) [2026-04-08 08:01:20.095777 INFO dpdk_workers] Duck #15: 10.21.2.108 (bcast_ip: 10.21.2.255) [2026-04-08 08:01:20.095780 INFO dpdk_workers] Duck #16: 10.21.3.101 (bcast_ip: 10.21.3.255) [2026-04-08 08:01:20.095783 INFO dpdk_workers] Duck #17: 10.21.3.102 (bcast_ip: 10.21.3.255) [2026-04-08 08:01:20.095785 INFO dpdk_workers] Duck #18: 10.21.3.103 
(bcast_ip: 10.21.3.255) [2026-04-08 08:01:20.095787 INFO dpdk_workers] Duck #19: 10.21.3.104 (bcast_ip: 10.21.3.255) [2026-04-08 08:01:20.095789 INFO dpdk_workers] Duck #20: 10.21.3.105 (bcast_ip: 10.21.3.255) [2026-04-08 08:01:20.095791 INFO dpdk_workers] Duck #21: 10.21.3.106 (bcast_ip: 10.21.3.255) [2026-04-08 08:01:20.095793 INFO dpdk_workers] Duck #22: 10.21.3.107 (bcast_ip: 10.21.3.255) [2026-04-08 08:01:20.095795 INFO dpdk_workers] Duck #23: 10.21.3.108 (bcast_ip: 10.21.3.255) [2026-04-08 08:01:20.095796 INFO dpdk_workers] Duck #24: 10.21.4.101 (bcast_ip: 10.21.4.255) [2026-04-08 08:01:20.095798 INFO dpdk_workers] Duck #25: 10.21.4.102 (bcast_ip: 10.21.4.255) [2026-04-08 08:01:20.095800 INFO dpdk_workers] Duck #26: 10.21.4.103 (bcast_ip: 10.21.4.255) [2026-04-08 08:01:20.095802 INFO dpdk_workers] Duck #27: 10.21.4.104 (bcast_ip: 10.21.4.255) [2026-04-08 08:01:20.095804 INFO dpdk_workers] Duck #28: 10.21.4.105 (bcast_ip: 10.21.4.255) [2026-04-08 08:01:20.095806 INFO dpdk_workers] Duck #29: 10.21.4.106 (bcast_ip: 10.21.4.255) [2026-04-08 08:01:20.095808 INFO dpdk_workers] Duck #30: 10.21.4.107 (bcast_ip: 10.21.4.255) [2026-04-08 08:01:20.095812 INFO dpdk_workers] Duck #31: 10.21.4.108 (bcast_ip: 10.21.4.255) [2026-04-08 08:01:20.397846 INFO dpdk_workers] [Worker 0]: 10.21.1.101 [2026-04-08 08:01:20.397850 INFO dpdk_workers] [Worker 0]: 10.21.1.102 [2026-04-08 08:01:20.397852 INFO dpdk_workers] [Worker 0]: 10.21.1.103 [2026-04-08 08:01:20.397853 INFO dpdk_workers] [Worker 0]: 10.21.1.104 [2026-04-08 08:01:20.397855 INFO dpdk_workers] [Worker 0]: 10.21.1.105 [2026-04-08 08:01:20.397857 INFO dpdk_workers] [Worker 0]: 10.21.1.106 [2026-04-08 08:01:20.397858 INFO dpdk_workers] [Worker 0]: 10.21.1.107 [2026-04-08 08:01:20.397860 INFO dpdk_workers] [Worker 0]: 10.21.1.108 [2026-04-08 08:01:20.397863 INFO dpdk_workers] [Worker 1]: 10.21.2.101 [2026-04-08 08:01:20.397865 INFO dpdk_workers] [Worker 1]: 10.21.2.102 [2026-04-08 08:01:20.397867 INFO dpdk_workers] [Worker 
1]: 10.21.2.103 [2026-04-08 08:01:20.397870 INFO dpdk_workers] [Worker 1]: 10.21.2.104 [2026-04-08 08:01:20.397872 INFO dpdk_workers] [Worker 1]: 10.21.2.105 [2026-04-08 08:01:20.397874 INFO dpdk_workers] [Worker 1]: 10.21.2.106 [2026-04-08 08:01:20.397875 INFO dpdk_workers] [Worker 1]: 10.21.2.107 [2026-04-08 08:01:20.397877 INFO dpdk_workers] [Worker 1]: 10.21.2.108 [2026-04-08 08:01:20.397880 INFO dpdk_workers] [Worker 2]: 10.21.3.101 [2026-04-08 08:01:20.397882 INFO dpdk_workers] [Worker 2]: 10.21.3.102 [2026-04-08 08:01:20.397884 INFO dpdk_workers] [Worker 2]: 10.21.3.103 [2026-04-08 08:01:20.397885 INFO dpdk_workers] [Worker 2]: 10.21.3.104 [2026-04-08 08:01:20.397887 INFO dpdk_workers] [Worker 2]: 10.21.3.105 [2026-04-08 08:01:20.397889 INFO dpdk_workers] [Worker 2]: 10.21.3.106 [2026-04-08 08:01:20.397891 INFO dpdk_workers] [Worker 2]: 10.21.3.107 [2026-04-08 08:01:20.397892 INFO dpdk_workers] [Worker 2]: 10.21.3.108 [2026-04-08 08:01:20.398910 INFO dpdk_workers] [Worker 3]: 10.21.4.101 [2026-04-08 08:01:20.398912 INFO dpdk_workers] [Worker 3]: 10.21.4.102 [2026-04-08 08:01:20.398913 INFO dpdk_workers] [Worker 3]: 10.21.4.103 [2026-04-08 08:01:20.398915 INFO dpdk_workers] [Worker 3]: 10.21.4.104 [2026-04-08 08:01:20.398916 INFO dpdk_workers] [Worker 3]: 10.21.4.105 [2026-04-08 08:01:20.398918 INFO dpdk_workers] [Worker 3]: 10.21.4.106 [2026-04-08 08:01:20.398919 INFO dpdk_workers] [Worker 3]: 10.21.4.107 [2026-04-08 08:01:20.398921 INFO dpdk_workers] [Worker 3]: 10.21.4.108 [2026-04-08 08:01:20.398923 INFO dpdk_workers] init_ducks done [2026-04-08 08:01:20.400113 INFO dpdk_ducks] Initialized 4 DPDK duck workers [2026-04-08 08:01:20.400127 INFO dpdk_ducks] DPDK duck worker 0: DpdkDuckWorker { worker_idx: 0, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { 
buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (0, 8) } [2026-04-08 08:01:20.400132 INFO dpdk_ducks] DPDK duck worker 1: DpdkDuckWorker { worker_idx: 1, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (8, 16) } [2026-04-08 08:01:20.400135 INFO dpdk_ducks] DPDK duck worker 2: DpdkDuckWorker { worker_idx: 2, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (16, 24) } [2026-04-08 08:01:20.400137 INFO dpdk_ducks] DPDK duck worker 3: DpdkDuckWorker { worker_idx: 3, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (24, 32) } [2026-04-08 08:01:20.400143 INFO buffer_manager] Initializing buffer manager [2026-04-08 08:01:20.400145 INFO buffer_manager] Buffer manager initialized: ELF BufferAllocator { begin: 0, end: 10485760, current: 0 }, input BufferAllocator { begin: 10485760, end: 104857600, current: 10485760 }, weights BufferAllocator { begin: 104923136, end: 32212254720, current: 104923136 } [2026-04-08 08:01:20.400149 INFO fp8_dpdk_common] 
fp9 persistent judge enabled by default; set DUCK_FP9_PERSISTENT_JUDGE=0 to disable [2026-04-08 08:01:20.400560 INFO buffer_manager] Added kernel fp9_kernels at (0, 91664) [2026-04-08 08:01:20.400596 INFO fp8_dpdk_common] fp9 persistent judge: opened 32 sessions [2026-04-08 08:01:20.400599 INFO fp8_dpdk_common] fp9 persistent judge: force-opened 32 fresh sessions for new init [2026-04-08 08:01:20.400600 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init(tp_size=32) [2026-04-08 08:01:20.400602 INFO fp8_moe_dpdk] fp8_moe_dpdk: init(tp_size=32) [2026-04-08 08:01:20.772631 INFO weight_cache] weight_cache: header hit tp_size=32 num_slots=62 finished_slots=62 [2026-04-08 08:01:21.099633 INFO buffer_manager] Allocated weights buffer at (104923136, 0) [2026-04-08 08:01:21.099656 INFO buffer_manager] Allocated weights buffer at (104923136, 4128768) [2026-04-08 08:01:21.099658 INFO buffer_manager] Allocated weights buffer at (109051904, 516096) [2026-04-08 08:01:21.099660 INFO buffer_manager] Allocated weights buffer at (109568000, 2016) [2026-04-08 08:01:21.099661 INFO buffer_manager] Allocated weights buffer at (109572096, 4128768) [2026-04-08 08:01:21.099663 INFO buffer_manager] Allocated weights buffer at (113700864, 516096) [2026-04-08 08:01:21.099664 INFO buffer_manager] Allocated weights buffer at (114216960, 2016) [2026-04-08 08:01:21.099666 INFO buffer_manager] Allocated weights buffer at (114221056, 4128768) [2026-04-08 08:01:21.099667 INFO buffer_manager] Allocated weights buffer at (118349824, 516096) [2026-04-08 08:01:21.099669 INFO buffer_manager] Allocated weights buffer at (118865920, 2016) [2026-04-08 08:01:21.099670 INFO buffer_manager] Allocated weights buffer at (118870016, 0) [2026-04-08 08:01:21.099672 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init_layer_cached(layer_idx=0, cache_slot=0) planned desc only [2026-04-08 08:01:21.192044 INFO buffer_manager] Allocated weights buffer at (118870016, 0) [2026-04-08 08:01:21.192064 INFO buffer_manager] Allocated weights buffer at 
(118870016, 4128768) [2026-04-08 08:01:21.192065 INFO buffer_manager] Allocated weights buffer at (122998784, 516096) [2026-04-08 08:01:21.192067 INFO buffer_manager] Allocated weights buffer at (123514880, 2016) [2026-04-08 08:01:21.192069 INFO buffer_manager] Allocated weights buffer at (123518976, 4128768) [2026-04-08 08:01:21.192070 INFO buffer_manager] Allocated weights buffer at (127647744, 516096) [2026-04-08 08:01:21.192072 INFO buffer_manager] Allocated weights buffer at (128163840, 2016) [2026-04-08 08:01:21.192073 INFO buffer_manager] Allocated weights buffer at (128167936, 4128768) [2026-04-08 08:01:21.192075 INFO buffer_manager] Allocated weights buffer at (132296704, 516096) [2026-04-08 08:01:21.192076 INFO buffer_manager] Allocated weights buffer at (132812800, 2016) [2026-04-08 08:01:21.192078 INFO buffer_manager] Allocated weights buffer at (132816896, 0) [2026-04-08 08:01:21.192079 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init_layer_cached(layer_idx=1, cache_slot=1) planned desc only [2026-04-08 08:01:21.278568 INFO buffer_manager] Allocated weights buffer at (132816896, 0) [2026-04-08 08:01:21.278588 INFO buffer_manager] Allocated weights buffer at (132816896, 4128768) [2026-04-08 08:01:21.278590 INFO buffer_manager] Allocated weights buffer at (136945664, 516096) [2026-04-08 08:01:21.278592 INFO buffer_manager] Allocated weights buffer at (137461760, 2016) [2026-04-08 08:01:21.278598 INFO buffer_manager] Allocated weights buffer at (137465856, 4128768) [2026-04-08 08:01:21.278600 INFO buffer_manager] Allocated weights buffer at (141594624, 516096) [2026-04-08 08:01:21.278601 INFO buffer_manager] Allocated weights buffer at (142110720, 2016) [2026-04-08 08:01:21.278603 INFO buffer_manager] Allocated weights buffer at (142114816, 4128768) [2026-04-08 08:01:21.278604 INFO buffer_manager] Allocated weights buffer at (146243584, 516096) [2026-04-08 08:01:21.278606 INFO buffer_manager] Allocated weights buffer at (146759680, 2016) [2026-04-08 08:01:21.278607 
INFO buffer_manager] Allocated weights buffer at (146763776, 0) [2026-04-08 08:01:21.278609 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init_layer_cached(layer_idx=2, cache_slot=2) planned desc only [2026-04-08 08:01:21.307106 INFO buffer_manager] Allocated weights buffer at (146763776, 0) [2026-04-08 08:01:21.307121 INFO buffer_manager] Allocated weights buffer at (146763776, 132120576) [2026-04-08 08:01:21.307123 INFO buffer_manager] Allocated weights buffer at (278884352, 57344) [2026-04-08 08:01:21.307125 INFO buffer_manager] Allocated weights buffer at (278941696, 132120576) [2026-04-08 08:01:21.307126 INFO buffer_manager] Allocated weights buffer at (411062272, 57344) [2026-04-08 08:01:21.307128 INFO buffer_manager] Allocated weights buffer at (411119616, 132120576) [2026-04-08 08:01:21.307129 INFO buffer_manager] Allocated weights buffer at (543240192, 57344) [2026-04-08 08:01:21.307131 INFO buffer_manager] Allocated weights buffer at (543297536, 0) [2026-04-08 08:01:21.307132 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=3, cache_slot=3) planned desc only [2026-04-08 08:01:21.343445 INFO buffer_manager] Allocated weights buffer at (543297536, 0) [2026-04-08 08:01:21.343459 INFO buffer_manager] Allocated weights buffer at (543297536, 132120576) [2026-04-08 08:01:21.343461 INFO buffer_manager] Allocated weights buffer at (675418112, 57344) [2026-04-08 08:01:21.343462 INFO buffer_manager] Allocated weights buffer at (675475456, 132120576) [2026-04-08 08:01:21.343464 INFO buffer_manager] Allocated weights buffer at (807596032, 57344) [2026-04-08 08:01:21.343465 INFO buffer_manager] Allocated weights buffer at (807653376, 132120576) [2026-04-08 08:01:21.343467 INFO buffer_manager] Allocated weights buffer at (939773952, 57344) [2026-04-08 08:01:21.343468 INFO buffer_manager] Allocated weights buffer at (939831296, 0) [2026-04-08 08:01:21.343470 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=4, cache_slot=4) planned desc only [2026-04-08 
08:01:21.379728 INFO buffer_manager] Allocated weights buffer at (939831296, 0) [2026-04-08 08:01:21.379741 INFO buffer_manager] Allocated weights buffer at (939831296, 132120576) [2026-04-08 08:01:21.379743 INFO buffer_manager] Allocated weights buffer at (1071951872, 57344) [2026-04-08 08:01:21.379745 INFO buffer_manager] Allocated weights buffer at (1072009216, 132120576) [2026-04-08 08:01:21.379746 INFO buffer_manager] Allocated weights buffer at (1204129792, 57344) [2026-04-08 08:01:21.379748 INFO buffer_manager] Allocated weights buffer at (1204187136, 132120576) [2026-04-08 08:01:21.379749 INFO buffer_manager] Allocated weights buffer at (1336307712, 57344) [2026-04-08 08:01:21.379751 INFO buffer_manager] Allocated weights buffer at (1336365056, 0) [2026-04-08 08:01:21.379752 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=5, cache_slot=5) planned desc only [2026-04-08 08:01:21.415952 INFO buffer_manager] Allocated weights buffer at (1336365056, 0) [2026-04-08 08:01:21.415967 INFO buffer_manager] Allocated weights buffer at (1336365056, 132120576) [2026-04-08 08:01:21.415969 INFO buffer_manager] Allocated weights buffer at (1468485632, 57344) [2026-04-08 08:01:21.415971 INFO buffer_manager] Allocated weights buffer at (1468542976, 132120576) [2026-04-08 08:01:21.415972 INFO buffer_manager] Allocated weights buffer at (1600663552, 57344) [2026-04-08 08:01:21.415974 INFO buffer_manager] Allocated weights buffer at (1600720896, 132120576) [2026-04-08 08:01:21.415979 INFO buffer_manager] Allocated weights buffer at (1732841472, 57344) [2026-04-08 08:01:21.415981 INFO buffer_manager] Allocated weights buffer at (1732898816, 0) [2026-04-08 08:01:21.415983 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=6, cache_slot=6) planned desc only [2026-04-08 08:01:21.452137 INFO buffer_manager] Allocated weights buffer at (1732898816, 0) [2026-04-08 08:01:21.452151 INFO buffer_manager] Allocated weights buffer at (1732898816, 132120576) [2026-04-08 
08:01:21.452153 INFO buffer_manager] Allocated weights buffer at (1865019392, 57344) [2026-04-08 08:01:21.452155 INFO buffer_manager] Allocated weights buffer at (1865076736, 132120576) [2026-04-08 08:01:21.452156 INFO buffer_manager] Allocated weights buffer at (1997197312, 57344) [2026-04-08 08:01:21.452158 INFO buffer_manager] Allocated weights buffer at (1997254656, 132120576) [2026-04-08 08:01:21.452159 INFO buffer_manager] Allocated weights buffer at (2129375232, 57344) [2026-04-08 08:01:21.452161 INFO buffer_manager] Allocated weights buffer at (2129432576, 0) [2026-04-08 08:01:21.452162 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=7, cache_slot=7) planned desc only [2026-04-08 08:01:21.488283 INFO buffer_manager] Allocated weights buffer at (2129432576, 0) [2026-04-08 08:01:21.488297 INFO buffer_manager] Allocated weights buffer at (2129432576, 132120576) [2026-04-08 08:01:21.488300 INFO buffer_manager] Allocated weights buffer at (2261553152, 57344) [2026-04-08 08:01:21.488301 INFO buffer_manager] Allocated weights buffer at (2261610496, 132120576) [2026-04-08 08:01:21.488303 INFO buffer_manager] Allocated weights buffer at (2393731072, 57344) [2026-04-08 08:01:21.488304 INFO buffer_manager] Allocated weights buffer at (2393788416, 132120576) [2026-04-08 08:01:21.488305 INFO buffer_manager] Allocated weights buffer at (2525908992, 57344) [2026-04-08 08:01:21.488307 INFO buffer_manager] Allocated weights buffer at (2525966336, 0) [2026-04-08 08:01:21.488309 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=8, cache_slot=8) planned desc only [2026-04-08 08:01:21.524409 INFO buffer_manager] Allocated weights buffer at (2525966336, 0) [2026-04-08 08:01:21.524423 INFO buffer_manager] Allocated weights buffer at (2525966336, 132120576) [2026-04-08 08:01:21.524426 INFO buffer_manager] Allocated weights buffer at (2658086912, 57344) [2026-04-08 08:01:21.524427 INFO buffer_manager] Allocated weights buffer at (2658144256, 132120576) 
[2026-04-08 08:01:21.524429 INFO buffer_manager] Allocated weights buffer at (2790264832, 57344) [2026-04-08 08:01:21.524430 INFO buffer_manager] Allocated weights buffer at (2790322176, 132120576) [2026-04-08 08:01:21.524432 INFO buffer_manager] Allocated weights buffer at (2922442752, 57344) [2026-04-08 08:01:21.524433 INFO buffer_manager] Allocated weights buffer at (2922500096, 0) [2026-04-08 08:01:21.524435 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=9, cache_slot=9) planned desc only [2026-04-08 08:01:21.560541 INFO buffer_manager] Allocated weights buffer at (2922500096, 0) [2026-04-08 08:01:21.560555 INFO buffer_manager] Allocated weights buffer at (2922500096, 132120576) [2026-04-08 08:01:21.560557 INFO buffer_manager] Allocated weights buffer at (3054620672, 57344) [2026-04-08 08:01:21.560558 INFO buffer_manager] Allocated weights buffer at (3054678016, 132120576) [2026-04-08 08:01:21.560560 INFO buffer_manager] Allocated weights buffer at (3186798592, 57344) [2026-04-08 08:01:21.560561 INFO buffer_manager] Allocated weights buffer at (3186855936, 132120576) [2026-04-08 08:01:21.560563 INFO buffer_manager] Allocated weights buffer at (3318976512, 57344) [2026-04-08 08:01:21.560564 INFO buffer_manager] Allocated weights buffer at (3319033856, 0) [2026-04-08 08:01:21.560566 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=10, cache_slot=10) planned desc only [2026-04-08 08:01:21.596727 INFO buffer_manager] Allocated weights buffer at (3319033856, 0) [2026-04-08 08:01:21.596742 INFO buffer_manager] Allocated weights buffer at (3319033856, 132120576) [2026-04-08 08:01:21.596748 INFO buffer_manager] Allocated weights buffer at (3451154432, 57344) [2026-04-08 08:01:21.596750 INFO buffer_manager] Allocated weights buffer at (3451211776, 132120576) [2026-04-08 08:01:21.596752 INFO buffer_manager] Allocated weights buffer at (3583332352, 57344) [2026-04-08 08:01:21.596753 INFO buffer_manager] Allocated weights buffer at (3583389696, 
132120576) [2026-04-08 08:01:21.596755 INFO buffer_manager] Allocated weights buffer at (3715510272, 57344) [2026-04-08 08:01:21.596756 INFO buffer_manager] Allocated weights buffer at (3715567616, 0) [2026-04-08 08:01:21.596758 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=11, cache_slot=11) planned desc only [2026-04-08 08:01:21.632928 INFO buffer_manager] Allocated weights buffer at (3715567616, 0) [2026-04-08 08:01:21.632943 INFO buffer_manager] Allocated weights buffer at (3715567616, 132120576) [2026-04-08 08:01:21.632945 INFO buffer_manager] Allocated weights buffer at (3847688192, 57344) [2026-04-08 08:01:21.632946 INFO buffer_manager] Allocated weights buffer at (3847745536, 132120576) [2026-04-08 08:01:21.632948 INFO buffer_manager] Allocated weights buffer at (3979866112, 57344) [2026-04-08 08:01:21.632949 INFO buffer_manager] Allocated weights buffer at (3979923456, 132120576) [2026-04-08 08:01:21.632951 INFO buffer_manager] Allocated weights buffer at (4112044032, 57344) [2026-04-08 08:01:21.632952 INFO buffer_manager] Allocated weights buffer at (4112101376, 0) [2026-04-08 08:01:21.632954 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=12, cache_slot=12) planned desc only [2026-04-08 08:01:21.669272 INFO buffer_manager] Allocated weights buffer at (4112101376, 0) [2026-04-08 08:01:21.669287 INFO buffer_manager] Allocated weights buffer at (4112101376, 132120576) [2026-04-08 08:01:21.669289 INFO buffer_manager] Allocated weights buffer at (4244221952, 57344) [2026-04-08 08:01:21.669290 INFO buffer_manager] Allocated weights buffer at (4244279296, 132120576) [2026-04-08 08:01:21.669292 INFO buffer_manager] Allocated weights buffer at (4376399872, 57344) [2026-04-08 08:01:21.669293 INFO buffer_manager] Allocated weights buffer at (4376457216, 132120576) [2026-04-08 08:01:21.669295 INFO buffer_manager] Allocated weights buffer at (4508577792, 57344) [2026-04-08 08:01:21.669296 INFO buffer_manager] Allocated weights buffer at 
(4508635136, 0) [2026-04-08 08:01:21.669298 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=13, cache_slot=13) planned desc only [2026-04-08 08:01:21.705508 INFO buffer_manager] Allocated weights buffer at (4508635136, 0) [2026-04-08 08:01:21.705524 INFO buffer_manager] Allocated weights buffer at (4508635136, 132120576) [2026-04-08 08:01:21.705526 INFO buffer_manager] Allocated weights buffer at (4640755712, 57344) [2026-04-08 08:01:21.705527 INFO buffer_manager] Allocated weights buffer at (4640813056, 132120576) [2026-04-08 08:01:21.705529 INFO buffer_manager] Allocated weights buffer at (4772933632, 57344) [2026-04-08 08:01:21.705530 INFO buffer_manager] Allocated weights buffer at (4772990976, 132120576) [2026-04-08 08:01:21.705532 INFO buffer_manager] Allocated weights buffer at (4905111552, 57344) [2026-04-08 08:01:21.705533 INFO buffer_manager] Allocated weights buffer at (4905168896, 0) [2026-04-08 08:01:21.705535 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=14, cache_slot=14) planned desc only [2026-04-08 08:01:21.741821 INFO buffer_manager] Allocated weights buffer at (4905168896, 0) [2026-04-08 08:01:21.741838 INFO buffer_manager] Allocated weights buffer at (4905168896, 132120576) [2026-04-08 08:01:21.741840 INFO buffer_manager] Allocated weights buffer at (5037289472, 57344) [2026-04-08 08:01:21.741842 INFO buffer_manager] Allocated weights buffer at (5037346816, 132120576) [2026-04-08 08:01:21.741843 INFO buffer_manager] Allocated weights buffer at (5169467392, 57344) [2026-04-08 08:01:21.741845 INFO buffer_manager] Allocated weights buffer at (5169524736, 132120576) [2026-04-08 08:01:21.741846 INFO buffer_manager] Allocated weights buffer at (5301645312, 57344) [2026-04-08 08:01:21.741854 INFO buffer_manager] Allocated weights buffer at (5301702656, 0) [2026-04-08 08:01:21.741855 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=15, cache_slot=15) planned desc only [2026-04-08 08:01:21.778080 INFO 
buffer_manager] Allocated weights buffer at (5301702656, 0) [2026-04-08 08:01:21.778099 INFO buffer_manager] Allocated weights buffer at (5301702656, 132120576) [2026-04-08 08:01:21.778101 INFO buffer_manager] Allocated weights buffer at (5433823232, 57344) [2026-04-08 08:01:21.778103 INFO buffer_manager] Allocated weights buffer at (5433880576, 132120576) [2026-04-08 08:01:21.778104 INFO buffer_manager] Allocated weights buffer at (5566001152, 57344) [2026-04-08 08:01:21.778105 INFO buffer_manager] Allocated weights buffer at (5566058496, 132120576) [2026-04-08 08:01:21.778107 INFO buffer_manager] Allocated weights buffer at (5698179072, 57344) [2026-04-08 08:01:21.778108 INFO buffer_manager] Allocated weights buffer at (5698236416, 0) [2026-04-08 08:01:21.778110 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=16, cache_slot=16) planned desc only [2026-04-08 08:01:21.814364 INFO buffer_manager] Allocated weights buffer at (5698236416, 0) [2026-04-08 08:01:21.814378 INFO buffer_manager] Allocated weights buffer at (5698236416, 132120576) [2026-04-08 08:01:21.814380 INFO buffer_manager] Allocated weights buffer at (5830356992, 57344) [2026-04-08 08:01:21.814381 INFO buffer_manager] Allocated weights buffer at (5830414336, 132120576) [2026-04-08 08:01:21.814383 INFO buffer_manager] Allocated weights buffer at (5962534912, 57344) [2026-04-08 08:01:21.814384 INFO buffer_manager] Allocated weights buffer at (5962592256, 132120576) [2026-04-08 08:01:21.814386 INFO buffer_manager] Allocated weights buffer at (6094712832, 57344) [2026-04-08 08:01:21.814387 INFO buffer_manager] Allocated weights buffer at (6094770176, 0) [2026-04-08 08:01:21.814389 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=17, cache_slot=17) planned desc only [2026-04-08 08:01:21.850528 INFO buffer_manager] Allocated weights buffer at (6094770176, 0) [2026-04-08 08:01:21.850541 INFO buffer_manager] Allocated weights buffer at (6094770176, 132120576) [2026-04-08 
08:01:21.850543 INFO buffer_manager] Allocated weights buffer at (6226890752, 57344) [2026-04-08 08:01:21.850544 INFO buffer_manager] Allocated weights buffer at (6226948096, 132120576) [2026-04-08 08:01:21.850546 INFO buffer_manager] Allocated weights buffer at (6359068672, 57344) [2026-04-08 08:01:21.850548 INFO buffer_manager] Allocated weights buffer at (6359126016, 132120576) [2026-04-08 08:01:21.850549 INFO buffer_manager] Allocated weights buffer at (6491246592, 57344) [2026-04-08 08:01:21.850550 INFO buffer_manager] Allocated weights buffer at (6491303936, 0) [2026-04-08 08:01:21.850552 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=18, cache_slot=18) planned desc only [2026-04-08 08:01:21.886676 INFO buffer_manager] Allocated weights buffer at (6491303936, 0) [2026-04-08 08:01:21.886690 INFO buffer_manager] Allocated weights buffer at (6491303936, 132120576) [2026-04-08 08:01:21.886692 INFO buffer_manager] Allocated weights buffer at (6623424512, 57344) [2026-04-08 08:01:21.886693 INFO buffer_manager] Allocated weights buffer at (6623481856, 132120576) [2026-04-08 08:01:21.886695 INFO buffer_manager] Allocated weights buffer at (6755602432, 57344) [2026-04-08 08:01:21.886696 INFO buffer_manager] Allocated weights buffer at (6755659776, 132120576) [2026-04-08 08:01:21.886698 INFO buffer_manager] Allocated weights buffer at (6887780352, 57344) [2026-04-08 08:01:21.886699 INFO buffer_manager] Allocated weights buffer at (6887837696, 0) [2026-04-08 08:01:21.886701 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=19, cache_slot=19) planned desc only [2026-04-08 08:01:21.922888 INFO buffer_manager] Allocated weights buffer at (6887837696, 0) [2026-04-08 08:01:21.922901 INFO buffer_manager] Allocated weights buffer at (6887837696, 132120576) [2026-04-08 08:01:21.922907 INFO buffer_manager] Allocated weights buffer at (7019958272, 57344) [2026-04-08 08:01:21.922909 INFO buffer_manager] Allocated weights buffer at (7020015616, 132120576) 
[2026-04-08 08:01:21.922910 INFO buffer_manager] Allocated weights buffer at (7152136192, 57344) [2026-04-08 08:01:21.922912 INFO buffer_manager] Allocated weights buffer at (7152193536, 132120576) [2026-04-08 08:01:21.922913 INFO buffer_manager] Allocated weights buffer at (7284314112, 57344) [2026-04-08 08:01:21.922915 INFO buffer_manager] Allocated weights buffer at (7284371456, 0) [2026-04-08 08:01:21.922916 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=20, cache_slot=20) planned desc only [2026-04-08 08:01:21.959014 INFO buffer_manager] Allocated weights buffer at (7284371456, 0) [2026-04-08 08:01:21.959029 INFO buffer_manager] Allocated weights buffer at (7284371456, 132120576) [2026-04-08 08:01:21.959031 INFO buffer_manager] Allocated weights buffer at (7416492032, 57344) [2026-04-08 08:01:21.959032 INFO buffer_manager] Allocated weights buffer at (7416549376, 132120576) [2026-04-08 08:01:21.959034 INFO buffer_manager] Allocated weights buffer at (7548669952, 57344) [2026-04-08 08:01:21.959035 INFO buffer_manager] Allocated weights buffer at (7548727296, 132120576) [2026-04-08 08:01:21.959037 INFO buffer_manager] Allocated weights buffer at (7680847872, 57344) [2026-04-08 08:01:21.959038 INFO buffer_manager] Allocated weights buffer at (7680905216, 0) [2026-04-08 08:01:21.959040 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=21, cache_slot=21) planned desc only [2026-04-08 08:01:21.995147 INFO buffer_manager] Allocated weights buffer at (7680905216, 0) [2026-04-08 08:01:21.995161 INFO buffer_manager] Allocated weights buffer at (7680905216, 132120576) [2026-04-08 08:01:21.995163 INFO buffer_manager] Allocated weights buffer at (7813025792, 57344) [2026-04-08 08:01:21.995164 INFO buffer_manager] Allocated weights buffer at (7813083136, 132120576) [2026-04-08 08:01:21.995166 INFO buffer_manager] Allocated weights buffer at (7945203712, 57344) [2026-04-08 08:01:21.995167 INFO buffer_manager] Allocated weights buffer at (7945261056, 
132120576) [2026-04-08 08:01:21.995169 INFO buffer_manager] Allocated weights buffer at (8077381632, 57344) [2026-04-08 08:01:21.995170 INFO buffer_manager] Allocated weights buffer at (8077438976, 0) [2026-04-08 08:01:21.995172 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=22, cache_slot=22) planned desc only [2026-04-08 08:01:22.031380 INFO buffer_manager] Allocated weights buffer at (8077438976, 0) [2026-04-08 08:01:22.031393 INFO buffer_manager] Allocated weights buffer at (8077438976, 132120576) [2026-04-08 08:01:22.031395 INFO buffer_manager] Allocated weights buffer at (8209559552, 57344) [2026-04-08 08:01:22.031397 INFO buffer_manager] Allocated weights buffer at (8209616896, 132120576) [2026-04-08 08:01:22.031398 INFO buffer_manager] Allocated weights buffer at (8341737472, 57344) [2026-04-08 08:01:22.031400 INFO buffer_manager] Allocated weights buffer at (8341794816, 132120576) [2026-04-08 08:01:22.031401 INFO buffer_manager] Allocated weights buffer at (8473915392, 57344) [2026-04-08 08:01:22.031403 INFO buffer_manager] Allocated weights buffer at (8473972736, 0) [2026-04-08 08:01:22.031405 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=23, cache_slot=23) planned desc only [2026-04-08 08:01:22.067706 INFO buffer_manager] Allocated weights buffer at (8473972736, 0) [2026-04-08 08:01:22.067720 INFO buffer_manager] Allocated weights buffer at (8473972736, 132120576) [2026-04-08 08:01:22.067722 INFO buffer_manager] Allocated weights buffer at (8606093312, 57344) [2026-04-08 08:01:22.067724 INFO buffer_manager] Allocated weights buffer at (8606150656, 132120576) [2026-04-08 08:01:22.067725 INFO buffer_manager] Allocated weights buffer at (8738271232, 57344) [2026-04-08 08:01:22.067727 INFO buffer_manager] Allocated weights buffer at (8738328576, 132120576) [2026-04-08 08:01:22.067728 INFO buffer_manager] Allocated weights buffer at (8870449152, 57344) [2026-04-08 08:01:22.067733 INFO buffer_manager] Allocated weights buffer at 
(8870506496, 0) [2026-04-08 08:01:22.067736 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=24, cache_slot=24) planned desc only [2026-04-08 08:01:22.103838 INFO buffer_manager] Allocated weights buffer at (8870506496, 0) [2026-04-08 08:01:22.103852 INFO buffer_manager] Allocated weights buffer at (8870506496, 132120576) [2026-04-08 08:01:22.103854 INFO buffer_manager] Allocated weights buffer at (9002627072, 57344) [2026-04-08 08:01:22.103856 INFO buffer_manager] Allocated weights buffer at (9002684416, 132120576) [2026-04-08 08:01:22.103857 INFO buffer_manager] Allocated weights buffer at (9134804992, 57344) [2026-04-08 08:01:22.103859 INFO buffer_manager] Allocated weights buffer at (9134862336, 132120576) [2026-04-08 08:01:22.103860 INFO buffer_manager] Allocated weights buffer at (9266982912, 57344) [2026-04-08 08:01:22.103862 INFO buffer_manager] Allocated weights buffer at (9267040256, 0) [2026-04-08 08:01:22.103863 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=25, cache_slot=25) planned desc only [2026-04-08 08:01:22.139982 INFO buffer_manager] Allocated weights buffer at (9267040256, 0) [2026-04-08 08:01:22.139997 INFO buffer_manager] Allocated weights buffer at (9267040256, 132120576) [2026-04-08 08:01:22.139999 INFO buffer_manager] Allocated weights buffer at (9399160832, 57344) [2026-04-08 08:01:22.140000 INFO buffer_manager] Allocated weights buffer at (9399218176, 132120576) [2026-04-08 08:01:22.140002 INFO buffer_manager] Allocated weights buffer at (9531338752, 57344) [2026-04-08 08:01:22.140003 INFO buffer_manager] Allocated weights buffer at (9531396096, 132120576) [2026-04-08 08:01:22.140005 INFO buffer_manager] Allocated weights buffer at (9663516672, 57344) [2026-04-08 08:01:22.140006 INFO buffer_manager] Allocated weights buffer at (9663574016, 0) [2026-04-08 08:01:22.140008 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=26, cache_slot=26) planned desc only [2026-04-08 08:01:22.176305 INFO 
buffer_manager] Allocated weights buffer at (9663574016, 0) [2026-04-08 08:01:22.176318 INFO buffer_manager] Allocated weights buffer at (9663574016, 132120576) [2026-04-08 08:01:22.176320 INFO buffer_manager] Allocated weights buffer at (9795694592, 57344) [2026-04-08 08:01:22.176322 INFO buffer_manager] Allocated weights buffer at (9795751936, 132120576) [2026-04-08 08:01:22.176323 INFO buffer_manager] Allocated weights buffer at (9927872512, 57344) [2026-04-08 08:01:22.176325 INFO buffer_manager] Allocated weights buffer at (9927929856, 132120576) [2026-04-08 08:01:22.176326 INFO buffer_manager] Allocated weights buffer at (10060050432, 57344) [2026-04-08 08:01:22.176328 INFO buffer_manager] Allocated weights buffer at (10060107776, 0) [2026-04-08 08:01:22.176329 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=27, cache_slot=27) planned desc only [2026-04-08 08:01:22.212502 INFO buffer_manager] Allocated weights buffer at (10060107776, 0) [2026-04-08 08:01:22.212516 INFO buffer_manager] Allocated weights buffer at (10060107776, 132120576) [2026-04-08 08:01:22.212518 INFO buffer_manager] Allocated weights buffer at (10192228352, 57344) [2026-04-08 08:01:22.212519 INFO buffer_manager] Allocated weights buffer at (10192285696, 132120576) [2026-04-08 08:01:22.212521 INFO buffer_manager] Allocated weights buffer at (10324406272, 57344) [2026-04-08 08:01:22.212522 INFO buffer_manager] Allocated weights buffer at (10324463616, 132120576) [2026-04-08 08:01:22.212524 INFO buffer_manager] Allocated weights buffer at (10456584192, 57344) [2026-04-08 08:01:22.212525 INFO buffer_manager] Allocated weights buffer at (10456641536, 0) [2026-04-08 08:01:22.212527 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=28, cache_slot=28) planned desc only [2026-04-08 08:01:22.248612 INFO buffer_manager] Allocated weights buffer at (10456641536, 0) [2026-04-08 08:01:22.248625 INFO buffer_manager] Allocated weights buffer at (10456641536, 132120576) [2026-04-08 
08:01:22.248632 INFO buffer_manager] Allocated weights buffer at (10588762112, 57344) [2026-04-08 08:01:22.248634 INFO buffer_manager] Allocated weights buffer at (10588819456, 132120576) [2026-04-08 08:01:22.248635 INFO buffer_manager] Allocated weights buffer at (10720940032, 57344) [2026-04-08 08:01:22.248637 INFO buffer_manager] Allocated weights buffer at (10720997376, 132120576) [2026-04-08 08:01:22.248638 INFO buffer_manager] Allocated weights buffer at (10853117952, 57344) [2026-04-08 08:01:22.248640 INFO buffer_manager] Allocated weights buffer at (10853175296, 0) [2026-04-08 08:01:22.248642 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=29, cache_slot=29) planned desc only [2026-04-08 08:01:22.284733 INFO buffer_manager] Allocated weights buffer at (10853175296, 0) [2026-04-08 08:01:22.284746 INFO buffer_manager] Allocated weights buffer at (10853175296, 132120576) [2026-04-08 08:01:22.284748 INFO buffer_manager] Allocated weights buffer at (10985295872, 57344) [2026-04-08 08:01:22.284749 INFO buffer_manager] Allocated weights buffer at (10985353216, 132120576) [2026-04-08 08:01:22.284751 INFO buffer_manager] Allocated weights buffer at (11117473792, 57344) [2026-04-08 08:01:22.284752 INFO buffer_manager] Allocated weights buffer at (11117531136, 132120576) [2026-04-08 08:01:22.284754 INFO buffer_manager] Allocated weights buffer at (11249651712, 57344) [2026-04-08 08:01:22.284755 INFO buffer_manager] Allocated weights buffer at (11249709056, 0) [2026-04-08 08:01:22.284757 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=30, cache_slot=30) planned desc only [2026-04-08 08:01:22.320853 INFO buffer_manager] Allocated weights buffer at (11249709056, 0) [2026-04-08 08:01:22.320866 INFO buffer_manager] Allocated weights buffer at (11249709056, 132120576) [2026-04-08 08:01:22.320868 INFO buffer_manager] Allocated weights buffer at (11381829632, 57344) [2026-04-08 08:01:22.320869 INFO buffer_manager] Allocated weights buffer at 
(11381886976, 132120576) [2026-04-08 08:01:22.320871 INFO buffer_manager] Allocated weights buffer at (11514007552, 57344) [2026-04-08 08:01:22.320872 INFO buffer_manager] Allocated weights buffer at (11514064896, 132120576) [2026-04-08 08:01:22.320874 INFO buffer_manager] Allocated weights buffer at (11646185472, 57344) [2026-04-08 08:01:22.320875 INFO buffer_manager] Allocated weights buffer at (11646242816, 0) [2026-04-08 08:01:22.320877 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=31, cache_slot=31) planned desc only [2026-04-08 08:01:22.357032 INFO buffer_manager] Allocated weights buffer at (11646242816, 0) [2026-04-08 08:01:22.357045 INFO buffer_manager] Allocated weights buffer at (11646242816, 132120576) [2026-04-08 08:01:22.357047 INFO buffer_manager] Allocated weights buffer at (11778363392, 57344) [2026-04-08 08:01:22.357048 INFO buffer_manager] Allocated weights buffer at (11778420736, 132120576) [2026-04-08 08:01:22.357050 INFO buffer_manager] Allocated weights buffer at (11910541312, 57344) [2026-04-08 08:01:22.357051 INFO buffer_manager] Allocated weights buffer at (11910598656, 132120576) [2026-04-08 08:01:22.357053 INFO buffer_manager] Allocated weights buffer at (12042719232, 57344) [2026-04-08 08:01:22.357054 INFO buffer_manager] Allocated weights buffer at (12042776576, 0) [2026-04-08 08:01:22.357056 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=32, cache_slot=32) planned desc only [2026-04-08 08:01:22.393132 INFO buffer_manager] Allocated weights buffer at (12042776576, 0) [2026-04-08 08:01:22.393145 INFO buffer_manager] Allocated weights buffer at (12042776576, 132120576) [2026-04-08 08:01:22.393147 INFO buffer_manager] Allocated weights buffer at (12174897152, 57344) [2026-04-08 08:01:22.393149 INFO buffer_manager] Allocated weights buffer at (12174954496, 132120576) [2026-04-08 08:01:22.393151 INFO buffer_manager] Allocated weights buffer at (12307075072, 57344) [2026-04-08 08:01:22.393152 INFO buffer_manager] 
Allocated weights buffer at (12307132416, 132120576) [2026-04-08 08:01:22.393153 INFO buffer_manager] Allocated weights buffer at (12439252992, 57344) [2026-04-08 08:01:22.393159 INFO buffer_manager] Allocated weights buffer at (12439310336, 0) [2026-04-08 08:01:22.393161 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=33, cache_slot=33) planned desc only [2026-04-08 08:01:22.429230 INFO buffer_manager] Allocated weights buffer at (12439310336, 0) [2026-04-08 08:01:22.429243 INFO buffer_manager] Allocated weights buffer at (12439310336, 132120576) [2026-04-08 08:01:22.429245 INFO buffer_manager] Allocated weights buffer at (12571430912, 57344) [2026-04-08 08:01:22.429247 INFO buffer_manager] Allocated weights buffer at (12571488256, 132120576) [2026-04-08 08:01:22.429248 INFO buffer_manager] Allocated weights buffer at (12703608832, 57344) [2026-04-08 08:01:22.429250 INFO buffer_manager] Allocated weights buffer at (12703666176, 132120576) [2026-04-08 08:01:22.429251 INFO buffer_manager] Allocated weights buffer at (12835786752, 57344) [2026-04-08 08:01:22.429253 INFO buffer_manager] Allocated weights buffer at (12835844096, 0) [2026-04-08 08:01:22.429254 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=34, cache_slot=34) planned desc only [2026-04-08 08:01:22.465540 INFO buffer_manager] Allocated weights buffer at (12835844096, 0) [2026-04-08 08:01:22.465553 INFO buffer_manager] Allocated weights buffer at (12835844096, 132120576) [2026-04-08 08:01:22.465555 INFO buffer_manager] Allocated weights buffer at (12967964672, 57344) [2026-04-08 08:01:22.465557 INFO buffer_manager] Allocated weights buffer at (12968022016, 132120576) [2026-04-08 08:01:22.465558 INFO buffer_manager] Allocated weights buffer at (13100142592, 57344) [2026-04-08 08:01:22.465560 INFO buffer_manager] Allocated weights buffer at (13100199936, 132120576) [2026-04-08 08:01:22.465561 INFO buffer_manager] Allocated weights buffer at (13232320512, 57344) [2026-04-08 
08:01:22.465563 INFO buffer_manager] Allocated weights buffer at (13232377856, 0) [2026-04-08 08:01:22.465564 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=35, cache_slot=35) planned desc only [2026-04-08 08:01:22.501872 INFO buffer_manager] Allocated weights buffer at (13232377856, 0) [2026-04-08 08:01:22.501885 INFO buffer_manager] Allocated weights buffer at (13232377856, 132120576) [2026-04-08 08:01:22.501887 INFO buffer_manager] Allocated weights buffer at (13364498432, 57344) [2026-04-08 08:01:22.501889 INFO buffer_manager] Allocated weights buffer at (13364555776, 132120576) [2026-04-08 08:01:22.501890 INFO buffer_manager] Allocated weights buffer at (13496676352, 57344) [2026-04-08 08:01:22.501892 INFO buffer_manager] Allocated weights buffer at (13496733696, 132120576) [2026-04-08 08:01:22.501893 INFO buffer_manager] Allocated weights buffer at (13628854272, 57344) [2026-04-08 08:01:22.501895 INFO buffer_manager] Allocated weights buffer at (13628911616, 0) [2026-04-08 08:01:22.501896 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=36, cache_slot=36) planned desc only [2026-04-08 08:01:22.537999 INFO buffer_manager] Allocated weights buffer at (13628911616, 0) [2026-04-08 08:01:22.538012 INFO buffer_manager] Allocated weights buffer at (13628911616, 132120576) [2026-04-08 08:01:22.538014 INFO buffer_manager] Allocated weights buffer at (13761032192, 57344) [2026-04-08 08:01:22.538016 INFO buffer_manager] Allocated weights buffer at (13761089536, 132120576) [2026-04-08 08:01:22.538017 INFO buffer_manager] Allocated weights buffer at (13893210112, 57344) [2026-04-08 08:01:22.538019 INFO buffer_manager] Allocated weights buffer at (13893267456, 132120576) [2026-04-08 08:01:22.538020 INFO buffer_manager] Allocated weights buffer at (14025388032, 57344) [2026-04-08 08:01:22.538022 INFO buffer_manager] Allocated weights buffer at (14025445376, 0) [2026-04-08 08:01:22.538024 INFO fp8_moe_dpdk] fp8_moe_dpdk: 
init_layer_cached(layer_idx=37, cache_slot=37) planned desc only [2026-04-08 08:01:22.574050 INFO buffer_manager] Allocated weights buffer at (14025445376, 0) [2026-04-08 08:01:22.574063 INFO buffer_manager] Allocated weights buffer at (14025445376, 132120576) [2026-04-08 08:01:22.574069 INFO buffer_manager] Allocated weights buffer at (14157565952, 57344) [2026-04-08 08:01:22.574070 INFO buffer_manager] Allocated weights buffer at (14157623296, 132120576) [2026-04-08 08:01:22.574072 INFO buffer_manager] Allocated weights buffer at (14289743872, 57344) [2026-04-08 08:01:22.574074 INFO buffer_manager] Allocated weights buffer at (14289801216, 132120576) [2026-04-08 08:01:22.574075 INFO buffer_manager] Allocated weights buffer at (14421921792, 57344) [2026-04-08 08:01:22.574077 INFO buffer_manager] Allocated weights buffer at (14421979136, 0) [2026-04-08 08:01:22.574079 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=38, cache_slot=38) planned desc only [2026-04-08 08:01:22.610300 INFO buffer_manager] Allocated weights buffer at (14421979136, 0) [2026-04-08 08:01:22.610314 INFO buffer_manager] Allocated weights buffer at (14421979136, 132120576) [2026-04-08 08:01:22.610316 INFO buffer_manager] Allocated weights buffer at (14554099712, 57344) [2026-04-08 08:01:22.610317 INFO buffer_manager] Allocated weights buffer at (14554157056, 132120576) [2026-04-08 08:01:22.610319 INFO buffer_manager] Allocated weights buffer at (14686277632, 57344) [2026-04-08 08:01:22.610320 INFO buffer_manager] Allocated weights buffer at (14686334976, 132120576) [2026-04-08 08:01:22.610322 INFO buffer_manager] Allocated weights buffer at (14818455552, 57344) [2026-04-08 08:01:22.610323 INFO buffer_manager] Allocated weights buffer at (14818512896, 0) [2026-04-08 08:01:22.610325 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=39, cache_slot=39) planned desc only [2026-04-08 08:01:22.646445 INFO buffer_manager] Allocated weights buffer at (14818512896, 0) [2026-04-08 
08:01:22.646459 INFO buffer_manager] Allocated weights buffer at (14818512896, 132120576) [2026-04-08 08:01:22.646461 INFO buffer_manager] Allocated weights buffer at (14950633472, 57344) [2026-04-08 08:01:22.646463 INFO buffer_manager] Allocated weights buffer at (14950690816, 132120576) [2026-04-08 08:01:22.646464 INFO buffer_manager] Allocated weights buffer at (15082811392, 57344) [2026-04-08 08:01:22.646466 INFO buffer_manager] Allocated weights buffer at (15082868736, 132120576) [2026-04-08 08:01:22.646467 INFO buffer_manager] Allocated weights buffer at (15214989312, 57344) [2026-04-08 08:01:22.646469 INFO buffer_manager] Allocated weights buffer at (15215046656, 0) [2026-04-08 08:01:22.646470 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=40, cache_slot=40) planned desc only [2026-04-08 08:01:22.682590 INFO buffer_manager] Allocated weights buffer at (15215046656, 0) [2026-04-08 08:01:22.682605 INFO buffer_manager] Allocated weights buffer at (15215046656, 132120576) [2026-04-08 08:01:22.682607 INFO buffer_manager] Allocated weights buffer at (15347167232, 57344) [2026-04-08 08:01:22.682608 INFO buffer_manager] Allocated weights buffer at (15347224576, 132120576) [2026-04-08 08:01:22.682610 INFO buffer_manager] Allocated weights buffer at (15479345152, 57344) [2026-04-08 08:01:22.682611 INFO buffer_manager] Allocated weights buffer at (15479402496, 132120576) [2026-04-08 08:01:22.682613 INFO buffer_manager] Allocated weights buffer at (15611523072, 57344) [2026-04-08 08:01:22.682615 INFO buffer_manager] Allocated weights buffer at (15611580416, 0) [2026-04-08 08:01:22.682616 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=41, cache_slot=41) planned desc only [2026-04-08 08:01:22.718670 INFO buffer_manager] Allocated weights buffer at (15611580416, 0) [2026-04-08 08:01:22.718683 INFO buffer_manager] Allocated weights buffer at (15611580416, 132120576) [2026-04-08 08:01:22.718685 INFO buffer_manager] Allocated weights buffer at 
(15743700992, 57344) [2026-04-08 08:01:22.718687 INFO buffer_manager] Allocated weights buffer at (15743758336, 132120576) [2026-04-08 08:01:22.718689 INFO buffer_manager] Allocated weights buffer at (15875878912, 57344) [2026-04-08 08:01:22.718690 INFO buffer_manager] Allocated weights buffer at (15875936256, 132120576) [2026-04-08 08:01:22.718692 INFO buffer_manager] Allocated weights buffer at (16008056832, 57344) [2026-04-08 08:01:22.718699 INFO buffer_manager] Allocated weights buffer at (16008114176, 0) [2026-04-08 08:01:22.718701 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=42, cache_slot=42) planned desc only [2026-04-08 08:01:22.754876 INFO buffer_manager] Allocated weights buffer at (16008114176, 0) [2026-04-08 08:01:22.754889 INFO buffer_manager] Allocated weights buffer at (16008114176, 132120576) [2026-04-08 08:01:22.754891 INFO buffer_manager] Allocated weights buffer at (16140234752, 57344) [2026-04-08 08:01:22.754893 INFO buffer_manager] Allocated weights buffer at (16140292096, 132120576) [2026-04-08 08:01:22.754894 INFO buffer_manager] Allocated weights buffer at (16272412672, 57344) [2026-04-08 08:01:22.754896 INFO buffer_manager] Allocated weights buffer at (16272470016, 132120576) [2026-04-08 08:01:22.754898 INFO buffer_manager] Allocated weights buffer at (16404590592, 57344) [2026-04-08 08:01:22.754899 INFO buffer_manager] Allocated weights buffer at (16404647936, 0) [2026-04-08 08:01:22.754901 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=43, cache_slot=43) planned desc only [2026-04-08 08:01:22.791054 INFO buffer_manager] Allocated weights buffer at (16404647936, 0) [2026-04-08 08:01:22.791072 INFO buffer_manager] Allocated weights buffer at (16404647936, 132120576) [2026-04-08 08:01:22.791075 INFO buffer_manager] Allocated weights buffer at (16536768512, 57344) [2026-04-08 08:01:22.791076 INFO buffer_manager] Allocated weights buffer at (16536825856, 132120576) [2026-04-08 08:01:22.791078 INFO buffer_manager] 
Allocated weights buffer at (16668946432, 57344) [2026-04-08 08:01:22.791079 INFO buffer_manager] Allocated weights buffer at (16669003776, 132120576) [2026-04-08 08:01:22.791081 INFO buffer_manager] Allocated weights buffer at (16801124352, 57344) [2026-04-08 08:01:22.791082 INFO buffer_manager] Allocated weights buffer at (16801181696, 0) [2026-04-08 08:01:22.791084 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=44, cache_slot=44) planned desc only [2026-04-08 08:01:22.827104 INFO buffer_manager] Allocated weights buffer at (16801181696, 0) [2026-04-08 08:01:22.827117 INFO buffer_manager] Allocated weights buffer at (16801181696, 132120576) [2026-04-08 08:01:22.827119 INFO buffer_manager] Allocated weights buffer at (16933302272, 57344) [2026-04-08 08:01:22.827121 INFO buffer_manager] Allocated weights buffer at (16933359616, 132120576) [2026-04-08 08:01:22.827122 INFO buffer_manager] Allocated weights buffer at (17065480192, 57344) [2026-04-08 08:01:22.827124 INFO buffer_manager] Allocated weights buffer at (17065537536, 132120576) [2026-04-08 08:01:22.827125 INFO buffer_manager] Allocated weights buffer at (17197658112, 57344) [2026-04-08 08:01:22.827127 INFO buffer_manager] Allocated weights buffer at (17197715456, 0) [2026-04-08 08:01:22.827128 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=45, cache_slot=45) planned desc only [2026-04-08 08:01:22.863141 INFO buffer_manager] Allocated weights buffer at (17197715456, 0) [2026-04-08 08:01:22.863153 INFO buffer_manager] Allocated weights buffer at (17197715456, 132120576) [2026-04-08 08:01:22.863155 INFO buffer_manager] Allocated weights buffer at (17329836032, 57344) [2026-04-08 08:01:22.863157 INFO buffer_manager] Allocated weights buffer at (17329893376, 132120576) [2026-04-08 08:01:22.863158 INFO buffer_manager] Allocated weights buffer at (17462013952, 57344) [2026-04-08 08:01:22.863160 INFO buffer_manager] Allocated weights buffer at (17462071296, 132120576) [2026-04-08 
08:01:22.863161 INFO buffer_manager] Allocated weights buffer at (17594191872, 57344) [2026-04-08 08:01:22.863163 INFO buffer_manager] Allocated weights buffer at (17594249216, 0) [2026-04-08 08:01:22.863164 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=46, cache_slot=46) planned desc only [2026-04-08 08:01:22.899304 INFO buffer_manager] Allocated weights buffer at (17594249216, 0) [2026-04-08 08:01:22.899317 INFO buffer_manager] Allocated weights buffer at (17594249216, 132120576) [2026-04-08 08:01:22.899323 INFO buffer_manager] Allocated weights buffer at (17726369792, 57344) [2026-04-08 08:01:22.899324 INFO buffer_manager] Allocated weights buffer at (17726427136, 132120576) [2026-04-08 08:01:22.899326 INFO buffer_manager] Allocated weights buffer at (17858547712, 57344) [2026-04-08 08:01:22.899327 INFO buffer_manager] Allocated weights buffer at (17858605056, 132120576) [2026-04-08 08:01:22.899329 INFO buffer_manager] Allocated weights buffer at (17990725632, 57344) [2026-04-08 08:01:22.899330 INFO buffer_manager] Allocated weights buffer at (17990782976, 0) [2026-04-08 08:01:22.899332 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=47, cache_slot=47) planned desc only [2026-04-08 08:01:22.935374 INFO buffer_manager] Allocated weights buffer at (17990782976, 0) [2026-04-08 08:01:22.935387 INFO buffer_manager] Allocated weights buffer at (17990782976, 132120576) [2026-04-08 08:01:22.935389 INFO buffer_manager] Allocated weights buffer at (18122903552, 57344) [2026-04-08 08:01:22.935390 INFO buffer_manager] Allocated weights buffer at (18122960896, 132120576) [2026-04-08 08:01:22.935392 INFO buffer_manager] Allocated weights buffer at (18255081472, 57344) [2026-04-08 08:01:22.935393 INFO buffer_manager] Allocated weights buffer at (18255138816, 132120576) [2026-04-08 08:01:22.935395 INFO buffer_manager] Allocated weights buffer at (18387259392, 57344) [2026-04-08 08:01:22.935396 INFO buffer_manager] Allocated weights buffer at 
(18387316736, 0) [2026-04-08 08:01:22.935398 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=48, cache_slot=48) planned desc only [2026-04-08 08:01:22.971436 INFO buffer_manager] Allocated weights buffer at (18387316736, 0) [2026-04-08 08:01:22.971449 INFO buffer_manager] Allocated weights buffer at (18387316736, 132120576) [2026-04-08 08:01:22.971451 INFO buffer_manager] Allocated weights buffer at (18519437312, 57344) [2026-04-08 08:01:22.971453 INFO buffer_manager] Allocated weights buffer at (18519494656, 132120576) [2026-04-08 08:01:22.971454 INFO buffer_manager] Allocated weights buffer at (18651615232, 57344) [2026-04-08 08:01:22.971456 INFO buffer_manager] Allocated weights buffer at (18651672576, 132120576) [2026-04-08 08:01:22.971457 INFO buffer_manager] Allocated weights buffer at (18783793152, 57344) [2026-04-08 08:01:22.971459 INFO buffer_manager] Allocated weights buffer at (18783850496, 0) [2026-04-08 08:01:22.971460 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=49, cache_slot=49) planned desc only [2026-04-08 08:01:23.007479 INFO buffer_manager] Allocated weights buffer at (18783850496, 0) [2026-04-08 08:01:23.007491 INFO buffer_manager] Allocated weights buffer at (18783850496, 132120576) [2026-04-08 08:01:23.007493 INFO buffer_manager] Allocated weights buffer at (18915971072, 57344) [2026-04-08 08:01:23.007495 INFO buffer_manager] Allocated weights buffer at (18916028416, 132120576) [2026-04-08 08:01:23.007497 INFO buffer_manager] Allocated weights buffer at (19048148992, 57344) [2026-04-08 08:01:23.007498 INFO buffer_manager] Allocated weights buffer at (19048206336, 132120576) [2026-04-08 08:01:23.007500 INFO buffer_manager] Allocated weights buffer at (19180326912, 57344) [2026-04-08 08:01:23.007501 INFO buffer_manager] Allocated weights buffer at (19180384256, 0) [2026-04-08 08:01:23.007503 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=50, cache_slot=50) planned desc only [2026-04-08 08:01:23.043553 
INFO buffer_manager] Allocated weights buffer at (19180384256, 0) [2026-04-08 08:01:23.043565 INFO buffer_manager] Allocated weights buffer at (19180384256, 132120576) [2026-04-08 08:01:23.043567 INFO buffer_manager] Allocated weights buffer at (19312504832, 57344) [2026-04-08 08:01:23.043569 INFO buffer_manager] Allocated weights buffer at (19312562176, 132120576) [2026-04-08 08:01:23.043570 INFO buffer_manager] Allocated weights buffer at (19444682752, 57344) [2026-04-08 08:01:23.043572 INFO buffer_manager] Allocated weights buffer at (19444740096, 132120576) [2026-04-08 08:01:23.043578 INFO buffer_manager] Allocated weights buffer at (19576860672, 57344) [2026-04-08 08:01:23.043579 INFO buffer_manager] Allocated weights buffer at (19576918016, 0) [2026-04-08 08:01:23.043581 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=51, cache_slot=51) planned desc only [2026-04-08 08:01:23.079658 INFO buffer_manager] Allocated weights buffer at (19576918016, 0) [2026-04-08 08:01:23.079671 INFO buffer_manager] Allocated weights buffer at (19576918016, 132120576) [2026-04-08 08:01:23.079673 INFO buffer_manager] Allocated weights buffer at (19709038592, 57344) [2026-04-08 08:01:23.079674 INFO buffer_manager] Allocated weights buffer at (19709095936, 132120576) [2026-04-08 08:01:23.079676 INFO buffer_manager] Allocated weights buffer at (19841216512, 57344) [2026-04-08 08:01:23.079677 INFO buffer_manager] Allocated weights buffer at (19841273856, 132120576) [2026-04-08 08:01:23.079679 INFO buffer_manager] Allocated weights buffer at (19973394432, 57344) [2026-04-08 08:01:23.079680 INFO buffer_manager] Allocated weights buffer at (19973451776, 0) [2026-04-08 08:01:23.079682 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=52, cache_slot=52) planned desc only [2026-04-08 08:01:23.115964 INFO buffer_manager] Allocated weights buffer at (19973451776, 0) [2026-04-08 08:01:23.115976 INFO buffer_manager] Allocated weights buffer at (19973451776, 132120576) 
[2026-04-08 08:01:23.115978 INFO buffer_manager] Allocated weights buffer at (20105572352, 57344) [2026-04-08 08:01:23.115980 INFO buffer_manager] Allocated weights buffer at (20105629696, 132120576) [2026-04-08 08:01:23.115982 INFO buffer_manager] Allocated weights buffer at (20237750272, 57344) [2026-04-08 08:01:23.115983 INFO buffer_manager] Allocated weights buffer at (20237807616, 132120576) [2026-04-08 08:01:23.115985 INFO buffer_manager] Allocated weights buffer at (20369928192, 57344) [2026-04-08 08:01:23.115986 INFO buffer_manager] Allocated weights buffer at (20369985536, 0) [2026-04-08 08:01:23.115988 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=53, cache_slot=53) planned desc only [2026-04-08 08:01:23.152191 INFO buffer_manager] Allocated weights buffer at (20369985536, 0) [2026-04-08 08:01:23.152204 INFO buffer_manager] Allocated weights buffer at (20369985536, 132120576) [2026-04-08 08:01:23.152206 INFO buffer_manager] Allocated weights buffer at (20502106112, 57344) [2026-04-08 08:01:23.152208 INFO buffer_manager] Allocated weights buffer at (20502163456, 132120576) [2026-04-08 08:01:23.152209 INFO buffer_manager] Allocated weights buffer at (20634284032, 57344) [2026-04-08 08:01:23.152211 INFO buffer_manager] Allocated weights buffer at (20634341376, 132120576) [2026-04-08 08:01:23.152212 INFO buffer_manager] Allocated weights buffer at (20766461952, 57344) [2026-04-08 08:01:23.152214 INFO buffer_manager] Allocated weights buffer at (20766519296, 0) [2026-04-08 08:01:23.152215 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=54, cache_slot=54) planned desc only [2026-04-08 08:01:23.188682 INFO buffer_manager] Allocated weights buffer at (20766519296, 0) [2026-04-08 08:01:23.188695 INFO buffer_manager] Allocated weights buffer at (20766519296, 132120576) [2026-04-08 08:01:23.188697 INFO buffer_manager] Allocated weights buffer at (20898639872, 57344) [2026-04-08 08:01:23.188699 INFO buffer_manager] Allocated weights buffer 
at (20898697216, 132120576) [2026-04-08 08:01:23.188700 INFO buffer_manager] Allocated weights buffer at (21030817792, 57344) [2026-04-08 08:01:23.188702 INFO buffer_manager] Allocated weights buffer at (21030875136, 132120576) [2026-04-08 08:01:23.188703 INFO buffer_manager] Allocated weights buffer at (21162995712, 57344) [2026-04-08 08:01:23.188705 INFO buffer_manager] Allocated weights buffer at (21163053056, 0) [2026-04-08 08:01:23.188706 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=55, cache_slot=55) planned desc only [2026-04-08 08:01:23.225080 INFO buffer_manager] Allocated weights buffer at (21163053056, 0) [2026-04-08 08:01:23.225098 INFO buffer_manager] Allocated weights buffer at (21163053056, 132120576) [2026-04-08 08:01:23.225100 INFO buffer_manager] Allocated weights buffer at (21295173632, 57344) [2026-04-08 08:01:23.225102 INFO buffer_manager] Allocated weights buffer at (21295230976, 132120576) [2026-04-08 08:01:23.225104 INFO buffer_manager] Allocated weights buffer at (21427351552, 57344) [2026-04-08 08:01:23.225106 INFO buffer_manager] Allocated weights buffer at (21427408896, 132120576) [2026-04-08 08:01:23.225108 INFO buffer_manager] Allocated weights buffer at (21559529472, 57344) [2026-04-08 08:01:23.225110 INFO buffer_manager] Allocated weights buffer at (21559586816, 0) [2026-04-08 08:01:23.225112 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=56, cache_slot=56) planned desc only [2026-04-08 08:01:23.261402 INFO buffer_manager] Allocated weights buffer at (21559586816, 0) [2026-04-08 08:01:23.261418 INFO buffer_manager] Allocated weights buffer at (21559586816, 132120576) [2026-04-08 08:01:23.261420 INFO buffer_manager] Allocated weights buffer at (21691707392, 57344) [2026-04-08 08:01:23.261422 INFO buffer_manager] Allocated weights buffer at (21691764736, 132120576) [2026-04-08 08:01:23.261423 INFO buffer_manager] Allocated weights buffer at (21823885312, 57344) [2026-04-08 08:01:23.261425 INFO 
buffer_manager] Allocated weights buffer at (21823942656, 132120576) [2026-04-08 08:01:23.261426 INFO buffer_manager] Allocated weights buffer at (21956063232, 57344) [2026-04-08 08:01:23.261428 INFO buffer_manager] Allocated weights buffer at (21956120576, 0) [2026-04-08 08:01:23.261430 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=57, cache_slot=57) planned desc only [2026-04-08 08:01:23.297638 INFO buffer_manager] Allocated weights buffer at (21956120576, 0) [2026-04-08 08:01:23.297651 INFO buffer_manager] Allocated weights buffer at (21956120576, 132120576) [2026-04-08 08:01:23.297653 INFO buffer_manager] Allocated weights buffer at (22088241152, 57344) [2026-04-08 08:01:23.297655 INFO buffer_manager] Allocated weights buffer at (22088298496, 132120576) [2026-04-08 08:01:23.297656 INFO buffer_manager] Allocated weights buffer at (22220419072, 57344) [2026-04-08 08:01:23.297658 INFO buffer_manager] Allocated weights buffer at (22220476416, 132120576) [2026-04-08 08:01:23.297659 INFO buffer_manager] Allocated weights buffer at (22352596992, 57344) [2026-04-08 08:01:23.297661 INFO buffer_manager] Allocated weights buffer at (22352654336, 0) [2026-04-08 08:01:23.297662 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=58, cache_slot=58) planned desc only [2026-04-08 08:01:23.333899 INFO buffer_manager] Allocated weights buffer at (22352654336, 0) [2026-04-08 08:01:23.333912 INFO buffer_manager] Allocated weights buffer at (22352654336, 132120576) [2026-04-08 08:01:23.333914 INFO buffer_manager] Allocated weights buffer at (22484774912, 57344) [2026-04-08 08:01:23.333915 INFO buffer_manager] Allocated weights buffer at (22484832256, 132120576) [2026-04-08 08:01:23.333917 INFO buffer_manager] Allocated weights buffer at (22616952832, 57344) [2026-04-08 08:01:23.333918 INFO buffer_manager] Allocated weights buffer at (22617010176, 132120576) [2026-04-08 08:01:23.333920 INFO buffer_manager] Allocated weights buffer at (22749130752, 57344) 
[2026-04-08 08:01:23.333921 INFO buffer_manager] Allocated weights buffer at (22749188096, 0) [2026-04-08 08:01:23.333923 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=59, cache_slot=59) planned desc only [2026-04-08 08:01:23.370174 INFO buffer_manager] Allocated weights buffer at (22749188096, 0) [2026-04-08 08:01:23.370187 INFO buffer_manager] Allocated weights buffer at (22749188096, 132120576) [2026-04-08 08:01:23.370189 INFO buffer_manager] Allocated weights buffer at (22881308672, 57344) [2026-04-08 08:01:23.370191 INFO buffer_manager] Allocated weights buffer at (22881366016, 132120576) [2026-04-08 08:01:23.370192 INFO buffer_manager] Allocated weights buffer at (23013486592, 57344) [2026-04-08 08:01:23.370194 INFO buffer_manager] Allocated weights buffer at (23013543936, 132120576) [2026-04-08 08:01:23.370200 INFO buffer_manager] Allocated weights buffer at (23145664512, 57344) [2026-04-08 08:01:23.370201 INFO buffer_manager] Allocated weights buffer at (23145721856, 0) [2026-04-08 08:01:23.370203 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=60, cache_slot=60) planned desc only [2026-04-08 08:01:23.733075 INFO buffer_manager] Allocated weights buffer at (23145721856, 0) [2026-04-08 08:01:23.733098 INFO buffer_manager] Allocated weights buffer at (23145721856, 132120576) [2026-04-08 08:01:23.733100 INFO buffer_manager] Allocated weights buffer at (23277842432, 57344) [2026-04-08 08:01:23.733102 INFO buffer_manager] Allocated weights buffer at (23277899776, 132120576) [2026-04-08 08:01:23.733103 INFO buffer_manager] Allocated weights buffer at (23410020352, 57344) [2026-04-08 08:01:23.733105 INFO buffer_manager] Allocated weights buffer at (23410077696, 132120576) [2026-04-08 08:01:23.733106 INFO buffer_manager] Allocated weights buffer at (23542198272, 57344) [2026-04-08 08:01:23.733108 INFO buffer_manager] Allocated weights buffer at (23542255616, 0) [2026-04-08 08:01:23.733110 INFO fp8_moe_dpdk] fp8_moe_dpdk: 
init_layer_cached(layer_idx=61, cache_slot=61) planned desc only [2026-04-08 08:02:18.090839 INFO fp8_dpdk_common] fp9 fast path forced on by default in the current kernel build [2026-04-08 08:02:18.129870 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=152, expert_tiles=157, avg_tile_batch=1.78, prepare=594.144µs, send=4.650784ms, judge_wait=29.309555ms, fetch=2.840433ms, reduce=130ns; duck time-ns stats: p50=29.024691ms, p90=29.0819ms, max=29.103776ms; kernel_model: matmul=0.770703 GFLOP (26.481 GFLOP/s @ duck_max), param_stream=0.216072G (7.424 Gparam/s @ duck_max), weight_stream=231.921 MiB (8.356 GB/s @ duck_max) [2026-04-08 08:02:18.165473 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=153, expert_tiles=157, avg_tile_batch=1.78, prepare=333.203µs, send=1.870724ms, judge_wait=28.173084ms, fetch=2.163438ms, reduce=151ns; duck time-ns stats: p50=27.872574ms, p90=27.952262ms, max=28.014012ms; kernel_model: matmul=0.770703 GFLOP (27.511 GFLOP/s @ duck_max), param_stream=0.216072G (7.713 Gparam/s @ duck_max), weight_stream=231.921 MiB (8.681 GB/s @ duck_max) [2026-04-08 08:02:18.201258 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=156, expert_tiles=161, avg_tile_batch=1.74, prepare=198.24µs, send=1.869768ms, judge_wait=28.832162ms, fetch=1.974994ms, reduce=144ns; duck time-ns stats: p50=28.59523ms, p90=28.64043ms, max=28.668988ms; kernel_model: matmul=0.770703 GFLOP (26.883 GFLOP/s @ duck_max), param_stream=0.221577G (7.729 Gparam/s @ duck_max), weight_stream=237.830 MiB (8.699 GB/s @ duck_max) [2026-04-08 08:02:18.237316 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=137, expert_tiles=147, avg_tile_batch=1.90, prepare=55.961µs, send=1.872788ms, judge_wait=29.254603ms, fetch=1.973969ms, reduce=100ns; duck time-ns stats: p50=29.027842ms, p90=29.070385ms, 
max=29.087595ms; kernel_model: matmul=0.770703 GFLOP (26.496 GFLOP/s @ duck_max), param_stream=0.202310G (6.955 Gparam/s @ duck_max), weight_stream=217.149 MiB (7.828 GB/s @ duck_max) [2026-04-08 08:02:18.273008 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=153, expert_tiles=162, avg_tile_batch=1.73, prepare=53.49µs, send=1.870038ms, judge_wait=28.913238ms, fetch=1.989391ms, reduce=100ns; duck time-ns stats: p50=28.674402ms, p90=28.704611ms, max=28.747588ms; kernel_model: matmul=0.770703 GFLOP (26.809 GFLOP/s @ duck_max), param_stream=0.222953G (7.756 Gparam/s @ duck_max), weight_stream=239.307 MiB (8.729 GB/s @ duck_max) [2026-04-08 08:02:18.308651 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=124, expert_tiles=142, avg_tile_batch=1.97, prepare=59.597µs, send=1.871434ms, judge_wait=28.809965ms, fetch=2.011019ms, reduce=15ns; duck time-ns stats: p50=28.526048ms, p90=28.622317ms, max=28.650212ms; kernel_model: matmul=0.770703 GFLOP (26.900 GFLOP/s @ duck_max), param_stream=0.195428G (6.821 Gparam/s @ duck_max), weight_stream=209.763 MiB (7.677 GB/s @ duck_max) [2026-04-08 08:02:18.345090 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=134, expert_tiles=150, avg_tile_batch=1.87, prepare=57.771µs, send=1.870394ms, judge_wait=29.470275ms, fetch=1.9817ms, reduce=112ns; duck time-ns stats: p50=29.256249ms, p90=29.297299ms, max=29.302893ms; kernel_model: matmul=0.770703 GFLOP (26.301 GFLOP/s @ duck_max), param_stream=0.206438G (7.045 Gparam/s @ duck_max), weight_stream=221.581 MiB (7.929 GB/s @ duck_max) [2026-04-08 08:02:18.379273 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=130, expert_tiles=145, avg_tile_batch=1.93, prepare=62.4µs, send=1.870247ms, judge_wait=27.310491ms, fetch=1.972978ms, reduce=104ns; duck time-ns stats: p50=27.002204ms, p90=27.113083ms, 
max=27.148724ms; kernel_model: matmul=0.770703 GFLOP (28.388 GFLOP/s @ duck_max), param_stream=0.199557G (7.351 Gparam/s @ duck_max), weight_stream=214.194 MiB (8.273 GB/s @ duck_max) [2026-04-08 08:02:18.413448 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=122, expert_tiles=138, avg_tile_batch=2.03, prepare=206.261µs, send=1.868977ms, judge_wait=27.214207ms, fetch=1.984148ms, reduce=153ns; duck time-ns stats: p50=26.948072ms, p90=27.001672ms, max=27.040986ms; kernel_model: matmul=0.770703 GFLOP (28.501 GFLOP/s @ duck_max), param_stream=0.189923G (7.024 Gparam/s @ duck_max), weight_stream=203.854 MiB (7.905 GB/s @ duck_max) [2026-04-08 08:02:18.446595 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=128, expert_tiles=141, avg_tile_batch=1.99, prepare=59.84µs, send=1.869893ms, judge_wait=26.228627ms, fetch=1.972332ms, reduce=133ns; duck time-ns stats: p50=25.991913ms, p90=26.03255ms, max=26.07556ms; kernel_model: matmul=0.770703 GFLOP (29.557 GFLOP/s @ duck_max), param_stream=0.194052G (7.442 Gparam/s @ duck_max), weight_stream=208.286 MiB (8.376 GB/s @ duck_max) [2026-04-08 08:02:18.480277 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=105, expert_tiles=127, avg_tile_batch=2.20, prepare=55.31µs, send=1.875661ms, judge_wait=25.977539ms, fetch=1.978288ms, reduce=104ns; duck time-ns stats: p50=25.74388ms, p90=25.790007ms, max=25.816689ms; kernel_model: matmul=0.770703 GFLOP (29.853 GFLOP/s @ duck_max), param_stream=0.174785G (6.770 Gparam/s @ duck_max), weight_stream=187.605 MiB (7.620 GB/s @ duck_max) [2026-04-08 08:02:18.517077 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=113, expert_tiles=138, avg_tile_batch=2.03, prepare=95.166µs, send=1.873136ms, judge_wait=27.248636ms, fetch=1.972146ms, reduce=101ns; duck time-ns stats: p50=26.946304ms, p90=27.008901ms, 
max=27.087252ms; kernel_model: matmul=0.770703 GFLOP (28.453 GFLOP/s @ duck_max), param_stream=0.189923G (7.012 Gparam/s @ duck_max), weight_stream=203.854 MiB (7.891 GB/s @ duck_max) [2026-04-08 08:02:18.551250 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=88, expert_tiles=115, avg_tile_batch=2.43, prepare=118.374µs, send=1.871965ms, judge_wait=24.789108ms, fetch=1.981028ms, reduce=102ns; duck time-ns stats: p50=24.56387ms, p90=24.611271ms, max=24.63367ms; kernel_model: matmul=0.770703 GFLOP (31.287 GFLOP/s @ duck_max), param_stream=0.158269G (6.425 Gparam/s @ duck_max), weight_stream=169.878 MiB (7.231 GB/s @ duck_max) [2026-04-08 08:02:18.590731 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=94, expert_tiles=119, avg_tile_batch=2.35, prepare=106.843µs, send=1.871862ms, judge_wait=30.14512ms, fetch=1.982865ms, reduce=133ns; duck time-ns stats: p50=29.904735ms, p90=29.929773ms, max=29.963477ms; kernel_model: matmul=0.770703 GFLOP (25.721 GFLOP/s @ duck_max), param_stream=0.163774G (5.466 Gparam/s @ duck_max), weight_stream=175.787 MiB (6.152 GB/s @ duck_max) [2026-04-08 08:02:18.626888 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=119, expert_tiles=136, avg_tile_batch=2.06, prepare=86.825µs, send=1.874471ms, judge_wait=26.896016ms, fetch=1.972299ms, reduce=21ns; duck time-ns stats: p50=26.616183ms, p90=26.675265ms, max=26.722277ms; kernel_model: matmul=0.770703 GFLOP (28.841 GFLOP/s @ duck_max), param_stream=0.187171G (7.004 Gparam/s @ duck_max), weight_stream=200.900 MiB (7.883 GB/s @ duck_max) [2026-04-08 08:02:18.663311 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=124, expert_tiles=142, avg_tile_batch=1.97, prepare=85.157µs, send=1.87355ms, judge_wait=27.125883ms, fetch=1.980846ms, reduce=115ns; duck time-ns stats: p50=26.894858ms, p90=26.932982ms, 
max=26.95436ms; kernel_model: matmul=0.770703 GFLOP (28.593 GFLOP/s @ duck_max), param_stream=0.195428G (7.250 Gparam/s @ duck_max), weight_stream=209.763 MiB (8.160 GB/s @ duck_max) [2026-04-08 08:02:18.701161 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=99, expert_tiles=128, avg_tile_batch=2.19, prepare=100.671µs, send=1.871606ms, judge_wait=28.511059ms, fetch=1.981781ms, reduce=104ns; duck time-ns stats: p50=28.253736ms, p90=28.30461ms, max=28.319009ms; kernel_model: matmul=0.770703 GFLOP (27.215 GFLOP/s @ duck_max), param_stream=0.176161G (6.221 Gparam/s @ duck_max), weight_stream=189.082 MiB (7.001 GB/s @ duck_max) [2026-04-08 08:02:18.737899 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=111, expert_tiles=134, avg_tile_batch=2.09, prepare=85.983µs, send=1.873577ms, judge_wait=27.475647ms, fetch=1.97133ms, reduce=102ns; duck time-ns stats: p50=27.246688ms, p90=27.281629ms, max=27.294693ms; kernel_model: matmul=0.770703 GFLOP (28.236 GFLOP/s @ duck_max), param_stream=0.184418G (6.757 Gparam/s @ duck_max), weight_stream=197.945 MiB (7.604 GB/s @ duck_max) [2026-04-08 08:02:18.776693 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=102, expert_tiles=129, avg_tile_batch=2.17, prepare=86.462µs, send=1.872466ms, judge_wait=29.533465ms, fetch=1.97937ms, reduce=101ns; duck time-ns stats: p50=29.277437ms, p90=29.32594ms, max=29.341718ms; kernel_model: matmul=0.770703 GFLOP (26.266 GFLOP/s @ duck_max), param_stream=0.177537G (6.051 Gparam/s @ duck_max), weight_stream=190.559 MiB (6.810 GB/s @ duck_max) [2026-04-08 08:02:18.811818 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=109, expert_tiles=132, avg_tile_batch=2.12, prepare=89.486µs, send=1.871536ms, judge_wait=25.854343ms, fetch=1.972989ms, reduce=132ns; duck time-ns stats: p50=25.568092ms, p90=25.660786ms, 
max=25.668512ms; kernel_model: matmul=0.770703 GFLOP (30.025 GFLOP/s @ duck_max), param_stream=0.181666G (7.077 Gparam/s @ duck_max), weight_stream=194.991 MiB (7.966 GB/s @ duck_max) [2026-04-08 08:02:18.846729 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=86, expert_tiles=119, avg_tile_batch=2.35, prepare=104.524µs, send=1.87255ms, judge_wait=25.614673ms, fetch=1.98473ms, reduce=107ns; duck time-ns stats: p50=25.420535ms, p90=25.447432ms, max=25.454767ms; kernel_model: matmul=0.770703 GFLOP (30.277 GFLOP/s @ duck_max), param_stream=0.163774G (6.434 Gparam/s @ duck_max), weight_stream=175.787 MiB (7.241 GB/s @ duck_max) [2026-04-08 08:02:18.883229 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=82, expert_tiles=115, avg_tile_batch=2.43, prepare=102.894µs, send=1.873104ms, judge_wait=27.283626ms, fetch=1.959857ms, reduce=20ns; duck time-ns stats: p50=27.065096ms, p90=27.093046ms, max=27.115317ms; kernel_model: matmul=0.770703 GFLOP (28.423 GFLOP/s @ duck_max), param_stream=0.158269G (5.837 Gparam/s @ duck_max), weight_stream=169.878 MiB (6.569 GB/s @ duck_max) [2026-04-08 08:02:18.917507 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=97, expert_tiles=122, avg_tile_batch=2.30, prepare=88.118µs, send=1.871314ms, judge_wait=24.98497ms, fetch=1.975596ms, reduce=104ns; duck time-ns stats: p50=24.763659ms, p90=24.794058ms, max=24.819325ms; kernel_model: matmul=0.770703 GFLOP (31.053 GFLOP/s @ duck_max), param_stream=0.167903G (6.765 Gparam/s @ duck_max), weight_stream=180.219 MiB (7.614 GB/s @ duck_max) [2026-04-08 08:02:18.951372 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=100, expert_tiles=121, avg_tile_batch=2.31, prepare=88.469µs, send=1.875368ms, judge_wait=24.516092ms, fetch=1.976096ms, reduce=102ns; duck time-ns stats: p50=24.276048ms, p90=24.318977ms, 
max=24.362182ms; kernel_model: matmul=0.770703 GFLOP (31.635 GFLOP/s @ duck_max), param_stream=0.166527G (6.835 Gparam/s @ duck_max), weight_stream=178.742 MiB (7.693 GB/s @ duck_max) [2026-04-08 08:02:18.985962 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=92, expert_tiles=120, avg_tile_batch=2.33, prepare=87.229µs, send=1.875398ms, judge_wait=25.352545ms, fetch=1.978216ms, reduce=101ns; duck time-ns stats: p50=25.137482ms, p90=25.179841ms, max=25.191804ms; kernel_model: matmul=0.770703 GFLOP (30.593 GFLOP/s @ duck_max), param_stream=0.165151G (6.556 Gparam/s @ duck_max), weight_stream=177.264 MiB (7.378 GB/s @ duck_max) [2026-04-08 08:02:19.022067 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=102, expert_tiles=124, avg_tile_batch=2.26, prepare=101.679µs, send=1.87258ms, judge_wait=26.901507ms, fetch=1.964553ms, reduce=21ns; duck time-ns stats: p50=26.690827ms, p90=26.73184ms, max=26.742275ms; kernel_model: matmul=0.770703 GFLOP (28.820 GFLOP/s @ duck_max), param_stream=0.170656G (6.381 Gparam/s @ duck_max), weight_stream=183.173 MiB (7.182 GB/s @ duck_max) [2026-04-08 08:02:19.057006 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=86, expert_tiles=116, avg_tile_batch=2.41, prepare=92.959µs, send=1.875623ms, judge_wait=25.5094ms, fetch=2.119858ms, reduce=20ns; duck time-ns stats: p50=25.266121ms, p90=25.298622ms, max=25.329608ms; kernel_model: matmul=0.770703 GFLOP (30.427 GFLOP/s @ duck_max), param_stream=0.159646G (6.303 Gparam/s @ duck_max), weight_stream=171.356 MiB (7.094 GB/s @ duck_max) [2026-04-08 08:02:19.092365 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=100, expert_tiles=130, avg_tile_batch=2.15, prepare=97.071µs, send=1.872989ms, judge_wait=25.853071ms, fetch=1.971882ms, reduce=145ns; duck time-ns stats: p50=25.627538ms, p90=25.686724ms, 
max=25.695424ms; kernel_model: matmul=0.770703 GFLOP (29.994 GFLOP/s @ duck_max), param_stream=0.178913G (6.963 Gparam/s @ duck_max), weight_stream=192.036 MiB (7.837 GB/s @ duck_max) [2026-04-08 08:02:19.127829 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=103, expert_tiles=123, avg_tile_batch=2.28, prepare=52.445µs, send=1.870714ms, judge_wait=27.141359ms, fetch=1.97368ms, reduce=150ns; duck time-ns stats: p50=26.920921ms, p90=26.947491ms, max=26.968346ms; kernel_model: matmul=0.770703 GFLOP (28.578 GFLOP/s @ duck_max), param_stream=0.169279G (6.277 Gparam/s @ duck_max), weight_stream=181.696 MiB (7.065 GB/s @ duck_max) [2026-04-08 08:02:19.161566 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=91, expert_tiles=122, avg_tile_batch=2.30, prepare=60.587µs, send=1.874029ms, judge_wait=26.004805ms, fetch=1.991309ms, reduce=104ns; duck time-ns stats: p50=25.773557ms, p90=25.800399ms, max=25.831089ms; kernel_model: matmul=0.770703 GFLOP (29.836 GFLOP/s @ duck_max), param_stream=0.167903G (6.500 Gparam/s @ duck_max), weight_stream=180.219 MiB (7.316 GB/s @ duck_max) [2026-04-08 08:02:19.194825 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=88, expert_tiles=119, avg_tile_batch=2.35, prepare=70.989µs, send=1.890373ms, judge_wait=25.576886ms, fetch=1.979151ms, reduce=153ns; duck time-ns stats: p50=25.315571ms, p90=25.36605ms, max=25.384641ms; kernel_model: matmul=0.770703 GFLOP (30.361 GFLOP/s @ duck_max), param_stream=0.163774G (6.452 Gparam/s @ duck_max), weight_stream=175.787 MiB (7.261 GB/s @ duck_max) [2026-04-08 08:02:19.228043 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=98, expert_tiles=121, avg_tile_batch=2.31, prepare=184.003µs, send=1.876748ms, judge_wait=26.165854ms, fetch=1.973864ms, reduce=101ns; duck time-ns stats: p50=25.834711ms, p90=25.910962ms, 
max=25.954666ms; kernel_model: matmul=0.770703 GFLOP (29.694 GFLOP/s @ duck_max), param_stream=0.166527G (6.416 Gparam/s @ duck_max), weight_stream=178.742 MiB (7.221 GB/s @ duck_max) [2026-04-08 08:02:19.262861 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=97, expert_tiles=127, avg_tile_batch=2.20, prepare=51.204µs, send=1.893355ms, judge_wait=27.850507ms, fetch=1.974733ms, reduce=151ns; duck time-ns stats: p50=27.551262ms, p90=27.605111ms, max=27.63761ms; kernel_model: matmul=0.770703 GFLOP (27.886 GFLOP/s @ duck_max), param_stream=0.174785G (6.324 Gparam/s @ duck_max), weight_stream=187.605 MiB (7.118 GB/s @ duck_max) [2026-04-08 08:02:19.296827 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=110, expert_tiles=134, avg_tile_batch=2.09, prepare=53.258µs, send=1.893634ms, judge_wait=27.176267ms, fetch=1.974429ms, reduce=133ns; duck time-ns stats: p50=26.834326ms, p90=26.937995ms, max=26.964505ms; kernel_model: matmul=0.770703 GFLOP (28.582 GFLOP/s @ duck_max), param_stream=0.184418G (6.839 Gparam/s @ duck_max), weight_stream=197.945 MiB (7.698 GB/s @ duck_max) [2026-04-08 08:02:19.329415 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=102, expert_tiles=128, avg_tile_batch=2.19, prepare=54.119µs, send=1.871141ms, judge_wait=25.785545ms, fetch=1.968352ms, reduce=134ns; duck time-ns stats: p50=25.426422ms, p90=25.452045ms, max=25.473178ms; kernel_model: matmul=0.770703 GFLOP (30.255 GFLOP/s @ duck_max), param_stream=0.176161G (6.916 Gparam/s @ duck_max), weight_stream=189.082 MiB (7.783 GB/s @ duck_max) [2026-04-08 08:02:19.362896 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=93, expert_tiles=123, avg_tile_batch=2.28, prepare=53.46µs, send=1.894032ms, judge_wait=26.608032ms, fetch=1.987565ms, reduce=133ns; duck time-ns stats: p50=26.348541ms, p90=26.390582ms, 
max=26.403948ms; kernel_model: matmul=0.770703 GFLOP (29.189 GFLOP/s @ duck_max), param_stream=0.169279G (6.411 Gparam/s @ duck_max), weight_stream=181.696 MiB (7.216 GB/s @ duck_max) [2026-04-08 08:02:19.396498 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=98, expert_tiles=122, avg_tile_batch=2.30, prepare=60.323µs, send=1.888449ms, judge_wait=26.76223ms, fetch=1.967003ms, reduce=22ns; duck time-ns stats: p50=26.34508ms, p90=26.373152ms, max=26.396451ms; kernel_model: matmul=0.770703 GFLOP (29.197 GFLOP/s @ duck_max), param_stream=0.167903G (6.361 Gparam/s @ duck_max), weight_stream=180.219 MiB (7.159 GB/s @ duck_max) [2026-04-08 08:02:19.429280 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=102, expert_tiles=129, avg_tile_batch=2.17, prepare=64.18µs, send=1.892991ms, judge_wait=25.898368ms, fetch=1.96955ms, reduce=20ns; duck time-ns stats: p50=25.619227ms, p90=25.675414ms, max=25.690596ms; kernel_model: matmul=0.770703 GFLOP (29.999 GFLOP/s @ duck_max), param_stream=0.177537G (6.911 Gparam/s @ duck_max), weight_stream=190.559 MiB (7.778 GB/s @ duck_max) [2026-04-08 08:02:19.463688 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=103, expert_tiles=123, avg_tile_batch=2.28, prepare=53.953µs, send=1.892971ms, judge_wait=27.585786ms, fetch=1.977349ms, reduce=135ns; duck time-ns stats: p50=26.562844ms, p90=26.596848ms, max=26.647146ms; kernel_model: matmul=0.770703 GFLOP (28.923 GFLOP/s @ duck_max), param_stream=0.169279G (6.353 Gparam/s @ duck_max), weight_stream=181.696 MiB (7.150 GB/s @ duck_max) [2026-04-08 08:02:19.495707 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=106, expert_tiles=127, avg_tile_batch=2.20, prepare=57.248µs, send=1.89462ms, judge_wait=25.202623ms, fetch=1.96964ms, reduce=133ns; duck time-ns stats: p50=24.974493ms, p90=24.993927ms, max=25.013347ms; 
kernel_model: matmul=0.770703 GFLOP (30.812 GFLOP/s @ duck_max), param_stream=0.174785G (6.988 Gparam/s @ duck_max), weight_stream=187.605 MiB (7.865 GB/s @ duck_max) [2026-04-08 08:02:19.528822 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=102, expert_tiles=125, avg_tile_batch=2.24, prepare=50.692µs, send=1.894583ms, judge_wait=26.340382ms, fetch=1.995479ms, reduce=133ns; duck time-ns stats: p50=25.410612ms, p90=25.467294ms, max=25.477303ms; kernel_model: matmul=0.770703 GFLOP (30.251 GFLOP/s @ duck_max), param_stream=0.172032G (6.752 Gparam/s @ duck_max), weight_stream=184.650 MiB (7.600 GB/s @ duck_max) [2026-04-08 08:02:19.563673 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=108, expert_tiles=129, avg_tile_batch=2.17, prepare=52.367µs, send=1.893864ms, judge_wait=27.997076ms, fetch=2.031094ms, reduce=132ns; duck time-ns stats: p50=26.712808ms, p90=26.741356ms, max=26.757731ms; kernel_model: matmul=0.770703 GFLOP (28.803 GFLOP/s @ duck_max), param_stream=0.177537G (6.635 Gparam/s @ duck_max), weight_stream=190.559 MiB (7.468 GB/s @ duck_max) [2026-04-08 08:02:19.597316 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=118, expert_tiles=139, avg_tile_batch=2.01, prepare=51.884µs, send=1.892635ms, judge_wait=26.864222ms, fetch=1.970345ms, reduce=138ns; duck time-ns stats: p50=26.394218ms, p90=26.503644ms, max=26.515908ms; kernel_model: matmul=0.770703 GFLOP (29.066 GFLOP/s @ duck_max), param_stream=0.191300G (7.215 Gparam/s @ duck_max), weight_stream=205.331 MiB (8.120 GB/s @ duck_max) [2026-04-08 08:02:19.632572 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=101, expert_tiles=126, avg_tile_batch=2.22, prepare=53.036µs, send=1.874238ms, judge_wait=27.662414ms, fetch=1.983987ms, reduce=104ns; duck time-ns stats: p50=26.458896ms, p90=26.48385ms, max=26.541369ms; 
kernel_model: matmul=0.770703 GFLOP (29.038 GFLOP/s @ duck_max), param_stream=0.173408G (6.534 Gparam/s @ duck_max), weight_stream=186.128 MiB (7.353 GB/s @ duck_max) [2026-04-08 08:02:19.667081 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=98, expert_tiles=123, avg_tile_batch=2.28, prepare=97.669µs, send=1.876285ms, judge_wait=24.93856ms, fetch=1.991752ms, reduce=102ns; duck time-ns stats: p50=24.728324ms, p90=24.748417ms, max=24.775563ms; kernel_model: matmul=0.770703 GFLOP (31.107 GFLOP/s @ duck_max), param_stream=0.169279G (6.833 Gparam/s @ duck_max), weight_stream=181.696 MiB (7.690 GB/s @ duck_max) [2026-04-08 08:02:19.703160 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=94, expert_tiles=122, avg_tile_batch=2.30, prepare=107.782µs, send=1.873982ms, judge_wait=26.643339ms, fetch=1.969059ms, reduce=28ns; duck time-ns stats: p50=26.377006ms, p90=26.419301ms, max=26.458402ms; kernel_model: matmul=0.770703 GFLOP (29.129 GFLOP/s @ duck_max), param_stream=0.167903G (6.346 Gparam/s @ duck_max), weight_stream=180.219 MiB (7.142 GB/s @ duck_max) [2026-04-08 08:02:19.737179 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=94, expert_tiles=121, avg_tile_batch=2.31, prepare=86.439µs, send=1.874859ms, judge_wait=24.806704ms, fetch=1.979515ms, reduce=104ns; duck time-ns stats: p50=24.576296ms, p90=24.593346ms, max=24.627575ms; kernel_model: matmul=0.770703 GFLOP (31.294 GFLOP/s @ duck_max), param_stream=0.166527G (6.762 Gparam/s @ duck_max), weight_stream=178.742 MiB (7.610 GB/s @ duck_max) [2026-04-08 08:02:19.770873 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=109, expert_tiles=130, avg_tile_batch=2.15, prepare=49.539µs, send=1.870268ms, judge_wait=26.723719ms, fetch=2.182794ms, reduce=21ns; duck time-ns stats: p50=25.744343ms, p90=25.791144ms, max=25.841557ms; 
kernel_model: matmul=0.770703 GFLOP (29.824 GFLOP/s @ duck_max), param_stream=0.178913G (6.923 Gparam/s @ duck_max), weight_stream=192.036 MiB (7.792 GB/s @ duck_max) [2026-04-08 08:02:19.806461 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=95, expert_tiles=124, avg_tile_batch=2.26, prepare=51.421µs, send=1.871346ms, judge_wait=28.822829ms, fetch=1.973151ms, reduce=154ns; duck time-ns stats: p50=28.557378ms, p90=28.59874ms, max=28.618995ms; kernel_model: matmul=0.770703 GFLOP (26.930 GFLOP/s @ duck_max), param_stream=0.170656G (5.963 Gparam/s @ duck_max), weight_stream=183.173 MiB (6.711 GB/s @ duck_max) [2026-04-08 08:02:19.843064 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=94, expert_tiles=121, avg_tile_batch=2.31, prepare=52µs, send=1.871094ms, judge_wait=29.876957ms, fetch=1.967739ms, reduce=155ns; duck time-ns stats: p50=29.587977ms, p90=29.628775ms, max=29.632224ms; kernel_model: matmul=0.770703 GFLOP (26.009 GFLOP/s @ duck_max), param_stream=0.166527G (5.620 Gparam/s @ duck_max), weight_stream=178.742 MiB (6.325 GB/s @ duck_max) [2026-04-08 08:02:19.877605 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=98, expert_tiles=120, avg_tile_batch=2.33, prepare=52.634µs, send=1.872957ms, judge_wait=27.778857ms, fetch=1.975889ms, reduce=150ns; duck time-ns stats: p50=26.823495ms, p90=26.860528ms, max=26.891678ms; kernel_model: matmul=0.770703 GFLOP (28.660 GFLOP/s @ duck_max), param_stream=0.165151G (6.141 Gparam/s @ duck_max), weight_stream=177.264 MiB (6.912 GB/s @ duck_max) [2026-04-08 08:02:19.911769 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=94, expert_tiles=122, avg_tile_batch=2.30, prepare=62.013µs, send=1.87214ms, judge_wait=27.400928ms, fetch=1.980648ms, reduce=133ns; duck time-ns stats: p50=27.028586ms, p90=27.055507ms, max=27.075665ms; kernel_model: 
matmul=0.770703 GFLOP (28.465 GFLOP/s @ duck_max), param_stream=0.167903G (6.201 Gparam/s @ duck_max), weight_stream=180.219 MiB (6.979 GB/s @ duck_max) [2026-04-08 08:02:19.946035 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=105, expert_tiles=128, avg_tile_batch=2.19, prepare=57.094µs, send=1.871943ms, judge_wait=27.386606ms, fetch=1.974394ms, reduce=153ns; duck time-ns stats: p50=27.181921ms, p90=27.208385ms, max=27.220073ms; kernel_model: matmul=0.770703 GFLOP (28.314 GFLOP/s @ duck_max), param_stream=0.176161G (6.472 Gparam/s @ duck_max), weight_stream=189.082 MiB (7.284 GB/s @ duck_max) [2026-04-08 08:02:19.980204 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=108, expert_tiles=131, avg_tile_batch=2.14, prepare=53.339µs, send=1.873936ms, judge_wait=27.373769ms, fetch=1.971465ms, reduce=155ns; duck time-ns stats: p50=26.985529ms, p90=27.030091ms, max=27.051647ms; kernel_model: matmul=0.770703 GFLOP (28.490 GFLOP/s @ duck_max), param_stream=0.180290G (6.665 Gparam/s @ duck_max), weight_stream=193.514 MiB (7.501 GB/s @ duck_max) [2026-04-08 08:02:20.016437 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=111, expert_tiles=134, avg_tile_batch=2.09, prepare=58.453µs, send=1.871305ms, judge_wait=29.45496ms, fetch=1.973889ms, reduce=151ns; duck time-ns stats: p50=28.97182ms, p90=29.009415ms, max=29.019941ms; kernel_model: matmul=0.770703 GFLOP (26.558 GFLOP/s @ duck_max), param_stream=0.184418G (6.355 Gparam/s @ duck_max), weight_stream=197.945 MiB (7.152 GB/s @ duck_max) [2026-04-08 08:02:20.048273 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=100, expert_tiles=125, avg_tile_batch=2.24, prepare=52.867µs, send=1.872701ms, judge_wait=25.050368ms, fetch=1.973623ms, reduce=154ns; duck time-ns stats: p50=24.854252ms, p90=24.882126ms, max=24.896381ms; kernel_model: 
matmul=0.770703 GFLOP (30.956 GFLOP/s @ duck_max), param_stream=0.172032G (6.910 Gparam/s @ duck_max), weight_stream=184.650 MiB (7.777 GB/s @ duck_max) [2026-04-08 08:02:20.082914 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=103, expert_tiles=128, avg_tile_batch=2.19, prepare=56.985µs, send=1.872973ms, judge_wait=27.779929ms, fetch=1.995159ms, reduce=149ns; duck time-ns stats: p50=26.492879ms, p90=26.522169ms, max=26.542925ms; kernel_model: matmul=0.770703 GFLOP (29.036 GFLOP/s @ duck_max), param_stream=0.176161G (6.637 Gparam/s @ duck_max), weight_stream=189.082 MiB (7.470 GB/s @ duck_max) [2026-04-08 08:02:20.115163 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=35, top_k=8, tasks=280, unique_experts=99, expert_tiles=124, avg_tile_batch=2.26, prepare=53.23µs, send=1.87161ms, judge_wait=25.3364ms, fetch=1.99871ms, reduce=143ns; duck time-ns stats: p50=25.096023ms, p90=25.146679ms, max=25.173682ms; kernel_model: matmul=0.770703 GFLOP (30.615 GFLOP/s @ duck_max), param_stream=0.170656G (6.779 Gparam/s @ duck_max), weight_stream=183.173 MiB (7.630 GB/s @ duck_max) [2026-04-08 08:02:20.181705 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=34, top_k=8, tasks=272, unique_experts=126, expert_tiles=138, avg_tile_batch=1.97, prepare=385.969µs, send=3.628537ms, judge_wait=28.039658ms, fetch=2.104105ms, reduce=135ns; duck time-ns stats: p50=27.786894ms, p90=27.834461ms, max=27.861285ms; kernel_model: matmul=0.748683 GFLOP (26.872 GFLOP/s @ duck_max), param_stream=0.189923G (6.817 Gparam/s @ duck_max), weight_stream=203.854 MiB (7.672 GB/s @ duck_max) [2026-04-08 08:02:20.190008 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.047392ms; phases: prepare=5.575µs, send=63.311µs, judge_wait=845.789µs, fetch=94.734µs, reduce=20ns, writeback=425ns; duck time-ns stats: p50=759.518µs, p90=763.91µs, max=766.796µs; effective_read: activated_experts=8, params=0.011010G (14.359 Gparam/s @ duck_max), memory=11.818 
MiB (16.160 GB/s @ duck_max), judge_gap=78.993µs, judge_ratio=1.103x [2026-04-08 08:02:20.903017 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.356526ms; phases: prepare=5.536µs, send=428.028µs, judge_wait=786.727µs, fetch=98.356µs, reduce=22ns, writeback=406ns; duck time-ns stats: p50=701.744µs, p90=706.771µs, max=707.975µs; effective_read: activated_experts=8, params=0.011010G (15.551 Gparam/s @ duck_max), memory=11.818 MiB (17.503 GB/s @ duck_max), judge_gap=78.752µs, judge_ratio=1.111x Token # 1: 761.095ms; value: next_token_ids=tensor([10051], device='cuda:0') mtp accept=0 prop=7157 top1=10051 accp=0.084 next=draft=3115 prop=3115 olap pair=682.8ms serial=1257.7ms gain=574.9ms ratio=0.46 s0=596.0ms s1=661.6ms wait=0.2/43.1ms pred gate=device [2026-04-08 08:02:21.016072 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 983.526µs; phases: prepare=3.404µs, send=64.529µs, judge_wait=785.912µs, fetch=92.384µs, reduce=20ns, writeback=563ns; duck time-ns stats: p50=700.557µs, p90=705.65µs, max=711.072µs; effective_read: activated_experts=8, params=0.011010G (15.484 Gparam/s @ duck_max), memory=11.818 MiB (17.427 GB/s @ duck_max), judge_gap=74.84µs, judge_ratio=1.105x Token # 2: 113.008ms; value: next_token_ids=tensor([3115], device='cuda:0') mtp accept=1 prop=3115 top1=3115 accp=0.622 next=draft=445 prop=445 olap pair=107.7ms serial=189.8ms gain=82.2ms ratio=0.43 s0=4.4ms s1=185.5ms wait=0.1/50.7ms pred gate=device [2026-04-08 08:02:21.019964 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 981.404µs; phases: prepare=3.268µs, send=62.12µs, judge_wait=785.603µs, fetch=93.232µs, reduce=20ns, writeback=418ns; duck time-ns stats: p50=700.988µs, p90=706.918µs, max=710.174µs; effective_read: activated_experts=8, params=0.011010G (15.503 Gparam/s @ duck_max), memory=11.818 MiB (17.449 GB/s @ duck_max), judge_gap=75.429µs, judge_ratio=1.106x Token # 3: 3.796ms; value: next_token_ids=tensor([35151], device='cuda:0') mtp accept=0 prop=445 top1=445 accp=0.787 next=pair 
draft=111551 prop=7831 pred gate=device [2026-04-08 08:02:21.132788 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 978.352µs; phases: prepare=3.388µs, send=63.469µs, judge_wait=779.544µs, fetch=91.904µs, reduce=21ns, writeback=487ns; duck time-ns stats: p50=693.769µs, p90=700.873µs, max=702.72µs; effective_read: activated_experts=8, params=0.011010G (15.668 Gparam/s @ duck_max), memory=11.818 MiB (17.634 GB/s @ duck_max), judge_gap=76.824µs, judge_ratio=1.109x Token # 4: 112.978ms; value: next_token_ids=tensor([7831], device='cuda:0') mtp accept=1 prop=7831 top1=7831 accp=0.228 next=draft=8842 prop=8842 olap pair=107.6ms serial=190.3ms gain=82.7ms ratio=0.43 s0=4.2ms s1=186.0ms wait=0.1/50.8ms pred gate=device [2026-04-08 08:02:21.136700 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 982.866µs; phases: prepare=3.256µs, send=63.778µs, judge_wait=782.079µs, fetch=96.548µs, reduce=20ns, writeback=430ns; duck time-ns stats: p50=699.767µs, p90=703.804µs, max=705.815µs; effective_read: activated_experts=8, params=0.011010G (15.599 Gparam/s @ duck_max), memory=11.818 MiB (17.557 GB/s @ duck_max), judge_gap=76.264µs, judge_ratio=1.108x Token # 5: 3.803ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=1.000 next=pair draft=1237 prop=1237 pred gate=device [2026-04-08 08:02:21.250003 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 983.101µs; phases: prepare=3.796µs, send=63.199µs, judge_wait=786.029µs, fetch=92.954µs, reduce=21ns, writeback=417ns; duck time-ns stats: p50=697.574µs, p90=703.06µs, max=707.915µs; effective_read: activated_experts=8, params=0.011010G (15.553 Gparam/s @ duck_max), memory=11.818 MiB (17.504 GB/s @ duck_max), judge_gap=78.114µs, judge_ratio=1.110x Token # 6: 113.403ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=draft=3699 prop=3699 olap pair=108.1ms serial=190.5ms gain=82.4ms ratio=0.43 s0=5.5ms s1=185.0ms wait=0.2/49.3ms pred gate=device 
[2026-04-08 08:02:21.253944 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 982.852µs; phases: prepare=3.087µs, send=63.253µs, judge_wait=783.562µs, fetch=95.688µs, reduce=20ns, writeback=498ns; duck time-ns stats: p50=694.493µs, p90=702.804µs, max=706.242µs; effective_read: activated_experts=8, params=0.011010G (15.590 Gparam/s @ duck_max), memory=11.818 MiB (17.546 GB/s @ duck_max), judge_gap=77.32µs, judge_ratio=1.109x Token # 7: 3.869ms; value: next_token_ids=tensor([3699], device='cuda:0') mtp accept=1 prop=3699 top1=3699 accp=1.000 next=pair draft=47 prop=47 pred gate=device [2026-04-08 08:02:21.366863 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 976.905µs; phases: prepare=3.577µs, send=62.492µs, judge_wait=780.437µs, fetch=93.343µs, reduce=21ns, writeback=469ns; duck time-ns stats: p50=695.983µs, p90=701.426µs, max=704.167µs; effective_read: activated_experts=8, params=0.011010G (15.636 Gparam/s @ duck_max), memory=11.818 MiB (17.598 GB/s @ duck_max), judge_gap=76.27µs, judge_ratio=1.108x Token # 8: 112.997ms; value: next_token_ids=tensor([47], device='cuda:0') mtp accept=1 prop=47 top1=47 accp=1.000 next=draft=1227 prop=1227 olap pair=107.7ms serial=190.9ms gain=83.2ms ratio=0.44 s0=3.8ms s1=187.1ms wait=0.1/51.7ms pred gate=device [2026-04-08 08:02:21.370805 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.016163ms; phases: prepare=3.415µs, send=61.262µs, judge_wait=778.302µs, fetch=100.808µs, reduce=88ns, writeback=491ns; duck time-ns stats: p50=696.89µs, p90=701.603µs, max=703.63µs; effective_read: activated_experts=8, params=0.011010G (15.647 Gparam/s @ duck_max), memory=11.818 MiB (17.611 GB/s @ duck_max), judge_gap=74.672µs, judge_ratio=1.106x Token # 9: 3.822ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=1.000 next=pair draft=36101 prop=36101 pred gate=device Token # 10: 112.890ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=0.608 next=draft=2833 
prop=2833 olap pair=107.7ms serial=190.7ms gain=83.1ms ratio=0.44 s0=3.8ms s1=187.0ms wait=0.1/51.7ms pred gate=device Token # 11: 3.759ms; value: next_token_ids=tensor([2833], device='cuda:0') mtp accept=1 prop=2833 top1=2833 accp=0.873 next=pair draft=2543 prop=2543 pred gate=device Token # 12: 112.954ms; value: next_token_ids=tensor([15133], device='cuda:0') mtp accept=0 prop=2543 top1=15133 accp=0.260 next=draft=2543 prop=320 olap pair=107.6ms serial=190.5ms gain=82.9ms ratio=0.44 s0=4.0ms s1=186.5ms wait=0.1/51.2ms pred gate=device Token # 13: 113.125ms; value: next_token_ids=tensor([11753], device='cuda:0') mtp accept=0 prop=320 top1=2543 accp=0.865 next=draft=66518 prop=66518 olap pair=107.8ms serial=190.7ms gain=82.9ms ratio=0.43 s0=4.2ms s1=186.5ms wait=0.1/50.9ms pred gate=device Token # 14: 113.567ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=0.974 next=draft=13349 prop=13349 olap pair=108.1ms serial=191.5ms gain=83.4ms ratio=0.44 s0=3.7ms s1=187.8ms wait=0.1/51.6ms pred gate=device Token # 15: 3.731ms; value: next_token_ids=tensor([13349], device='cuda:0') mtp accept=1 prop=13349 top1=13349 accp=0.999 next=pair draft=320 prop=320 pred gate=device Token # 16: 115.758ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.999 next=draft=445 prop=445 olap pair=107.9ms serial=191.2ms gain=83.3ms ratio=0.44 s0=3.8ms s1=187.4ms wait=0.1/51.6ms pred gate=device Token # 17: 3.737ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=1 prop=445 top1=445 accp=0.650 next=pair draft=66518 prop=63812 pred gate=device Token # 18: 113.117ms; value: next_token_ids=tensor([63812], device='cuda:0') mtp accept=1 prop=63812 top1=63812 accp=0.262 next=draft=10172 prop=4339 olap pair=107.9ms serial=190.8ms gain=83.0ms ratio=0.43 s0=4.2ms s1=186.6ms wait=0.1/50.9ms pred gate=device Token # 19: 3.760ms; value: next_token_ids=tensor([10172], device='cuda:0') mtp accept=0 
prop=4339 top1=10172 accp=0.531 next=pair draft=525 prop=525 pred gate=device Token # 20: 113.377ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=0 prop=525 top1=548 accp=0.000 next=draft=36101 prop=36101 olap pair=108.0ms serial=191.2ms gain=83.2ms ratio=0.43 s0=4.2ms s1=187.0ms wait=0.1/51.0ms pred gate=device Token # 21: 113.438ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=1.000 next=draft=525 prop=525 olap pair=108.0ms serial=191.0ms gain=83.1ms ratio=0.43 s0=4.4ms s1=186.6ms wait=0.1/50.5ms pred gate=device Token # 22: 3.778ms; value: next_token_ids=tensor([525], device='cuda:0') mtp accept=1 prop=525 top1=525 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 23: 113.668ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=25873 prop=25873 olap pair=108.3ms serial=191.8ms gain=83.5ms ratio=0.44 s0=3.9ms s1=187.9ms wait=0.1/51.7ms pred gate=device Token # 24: 3.729ms; value: next_token_ids=tensor([25873], device='cuda:0') mtp accept=1 prop=25873 top1=25873 accp=0.990 next=pair draft=66518 prop=66518 pred gate=device Token # 25: 113.225ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=draft=13349 prop=13349 olap pair=107.9ms serial=191.2ms gain=83.3ms ratio=0.44 s0=3.7ms s1=187.5ms wait=0.1/51.7ms pred gate=device Token # 26: 3.793ms; value: next_token_ids=tensor([3606], device='cuda:0') mtp accept=0 prop=13349 top1=3606 accp=0.006 next=pair draft=450 prop=450 pred gate=device Token # 27: 114.154ms; value: next_token_ids=tensor([450], device='cuda:0') mtp accept=1 prop=450 top1=450 accp=1.000 next=draft=3374 prop=3374 olap pair=108.0ms serial=190.5ms gain=82.6ms ratio=0.43 s0=6.6ms s1=184.0ms wait=0.2/48.4ms pred gate=device Token # 28: 4.773ms; value: next_token_ids=tensor([3374], device='cuda:0') mtp accept=1 prop=3374 top1=3374 accp=0.999 next=pair draft=66518 
prop=66518 pred gate=device Token # 29: 113.348ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=draft=1237 prop=1237 olap pair=107.8ms serial=190.8ms gain=83.0ms ratio=0.43 s0=4.6ms s1=186.2ms wait=0.1/50.5ms pred gate=device Token # 30: 3.813ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=4532 prop=4532 pred gate=device Token # 31: 113.314ms; value: next_token_ids=tensor([4532], device='cuda:0') mtp accept=1 prop=4532 top1=4532 accp=1.000 next=draft=50294 prop=50294 olap pair=108.0ms serial=191.4ms gain=83.4ms ratio=0.44 s0=3.9ms s1=187.6ms wait=0.1/51.5ms pred gate=device Token # 32: 3.761ms; value: next_token_ids=tensor([50294], device='cuda:0') mtp accept=1 prop=50294 top1=50294 accp=1.000 next=pair draft=1478 prop=1478 pred gate=device Token # 33: 113.106ms; value: next_token_ids=tensor([1478], device='cuda:0') mtp accept=1 prop=1478 top1=1478 accp=1.000 next=draft=14 prop=14 olap pair=107.8ms serial=191.0ms gain=83.3ms ratio=0.44 s0=4.0ms s1=187.0ms wait=0.1/51.1ms pred gate=device Token # 34: 3.824ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=0.999 next=pair draft=43151 prop=43151 pred gate=device Token # 35: 113.510ms; value: next_token_ids=tensor([43151], device='cuda:0') mtp accept=1 prop=43151 top1=43151 accp=0.999 next=draft=13968 prop=13968 olap pair=108.2ms serial=191.7ms gain=83.5ms ratio=0.44 s0=4.2ms s1=187.5ms wait=0.1/50.7ms pred gate=device Token # 36: 3.763ms; value: next_token_ids=tensor([13968], device='cuda:0') mtp accept=1 prop=13968 top1=13968 accp=1.000 next=pair draft=2353 prop=2353 pred gate=device Token # 37: 113.487ms; value: next_token_ids=tensor([2353], device='cuda:0') mtp accept=1 prop=2353 top1=2353 accp=0.974 next=draft=1121 prop=1121 olap pair=108.2ms serial=191.7ms gain=83.4ms ratio=0.44 s0=4.5ms s1=187.1ms wait=0.1/50.5ms pred gate=device Token # 38: 
3.777ms; value: next_token_ids=tensor([1121], device='cuda:0') mtp accept=1 prop=1121 top1=1121 accp=1.000 next=pair draft=66518 prop=66518 pred gate=device Token # 39: 113.327ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=draft=1237 prop=1237 olap pair=108.0ms serial=191.2ms gain=83.2ms ratio=0.44 s0=4.8ms s1=186.4ms wait=0.1/50.0ms pred gate=device Token # 40: 3.808ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=95975 prop=95975 pred gate=device Token # 41: 113.708ms; value: next_token_ids=tensor([95975], device='cuda:0') mtp accept=1 prop=95975 top1=95975 accp=1.000 next=draft=50294 prop=50294 olap pair=108.4ms serial=192.1ms gain=83.7ms ratio=0.44 s0=4.3ms s1=187.7ms wait=0.1/50.8ms pred gate=device Token # 42: 3.793ms; value: next_token_ids=tensor([50294], device='cuda:0') mtp accept=1 prop=50294 top1=50294 accp=1.000 next=pair draft=1478 prop=1478 pred gate=device Token # 43: 113.330ms; value: next_token_ids=tensor([1478], device='cuda:0') mtp accept=1 prop=1478 top1=1478 accp=1.000 next=draft=14 prop=14 olap pair=108.1ms serial=191.5ms gain=83.4ms ratio=0.44 s0=4.3ms s1=187.2ms wait=0.1/50.7ms pred gate=device Token # 44: 3.848ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=1.000 next=pair draft=48159 prop=48159 pred gate=device Token # 45: 113.523ms; value: next_token_ids=tensor([48159], device='cuda:0') mtp accept=1 prop=48159 top1=48159 accp=1.000 next=draft=13968 prop=13968 olap pair=108.3ms serial=191.9ms gain=83.6ms ratio=0.44 s0=4.3ms s1=187.7ms wait=0.1/50.8ms pred gate=device Token # 46: 3.804ms; value: next_token_ids=tensor([13968], device='cuda:0') mtp accept=1 prop=13968 top1=13968 accp=0.999 next=pair draft=34408 prop=34408 pred gate=device Token # 47: 113.637ms; value: next_token_ids=tensor([34408], device='cuda:0') mtp accept=1 prop=34408 top1=34408 accp=1.000 
next=draft=1728 prop=1728 olap pair=108.4ms serial=192.0ms gain=83.6ms ratio=0.44 s0=4.3ms s1=187.7ms wait=0.1/50.6ms pred gate=device Token # 48: 3.863ms; value: next_token_ids=tensor([1728], device='cuda:0') mtp accept=1 prop=1728 top1=1728 accp=0.956 next=pair draft=66518 prop=66518 pred gate=device Token # 49: 113.364ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=draft=1237 prop=1237 olap pair=108.1ms serial=191.3ms gain=83.2ms ratio=0.44 s0=4.3ms s1=187.0ms wait=0.1/50.6ms pred gate=device Token # 50: 3.822ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=122492 prop=122492 pred gate=device Token # 51: 113.555ms; value: next_token_ids=tensor([122492], device='cuda:0') mtp accept=1 prop=122492 top1=122492 accp=1.000 next=draft=50294 prop=50294 olap pair=108.3ms serial=191.8ms gain=83.6ms ratio=0.44 s0=4.3ms s1=187.5ms wait=0.1/50.6ms pred gate=device Token # 52: 3.819ms; value: next_token_ids=tensor([50294], device='cuda:0') mtp accept=1 prop=50294 top1=50294 accp=1.000 next=pair draft=1478 prop=1478 pred gate=device Token # 53: 113.232ms; value: next_token_ids=tensor([1478], device='cuda:0') mtp accept=1 prop=1478 top1=1478 accp=1.000 next=draft=14 prop=14 olap pair=108.0ms serial=191.3ms gain=83.3ms ratio=0.44 s0=4.2ms s1=187.0ms wait=0.1/50.6ms pred gate=device Token # 54: 3.869ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=1.000 next=pair draft=32907 prop=32907 pred gate=device Token # 55: 113.373ms; value: next_token_ids=tensor([32907], device='cuda:0') mtp accept=1 prop=32907 top1=32907 accp=1.000 next=draft=1227 prop=1227 olap pair=108.1ms serial=191.6ms gain=83.4ms ratio=0.44 s0=4.2ms s1=187.3ms wait=0.1/50.7ms pred gate=device Token # 56: 3.815ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=0.999 next=pair draft=548 prop=548 pred 
gate=device Token # 57: 114.058ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=1 prop=548 top1=548 accp=0.986 next=draft=22089 prop=22089 olap pair=108.8ms serial=191.0ms gain=82.1ms ratio=0.43 s0=4.5ms s1=186.4ms wait=0.1/50.3ms pred gate=device Token # 58: 3.762ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=0 prop=22089 top1=10756 accp=0.267 next=pair draft=66518 prop=66518 pred gate=device Token # 59: 113.755ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=0.999 next=draft=1237 prop=1237 olap pair=108.4ms serial=191.7ms gain=83.3ms ratio=0.43 s0=4.2ms s1=187.5ms wait=0.1/50.8ms pred gate=device Token # 60: 3.811ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=45324 prop=45324 pred gate=device Token # 61: 117.021ms; value: next_token_ids=tensor([45324], device='cuda:0') mtp accept=1 prop=45324 top1=45324 accp=1.000 next=draft=50294 prop=50294 olap pair=111.7ms serial=197.6ms gain=85.9ms ratio=0.43 s0=4.9ms s1=192.8ms wait=0.2/49.5ms pred gate=device Token # 62: 3.844ms; value: next_token_ids=tensor([50294], device='cuda:0') mtp accept=1 prop=50294 top1=50294 accp=1.000 next=pair draft=1478 prop=1478 pred gate=device Token # 63: 113.396ms; value: next_token_ids=tensor([1478], device='cuda:0') mtp accept=1 prop=1478 top1=1478 accp=1.000 next=draft=14 prop=14 olap pair=108.1ms serial=191.7ms gain=83.6ms ratio=0.44 s0=4.3ms s1=187.4ms wait=0.1/51.0ms pred gate=device Token # 64: 3.875ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=1.000 next=pair draft=21425 prop=21425 pred gate=device Token # 65: 113.782ms; value: next_token_ids=tensor([21425], device='cuda:0') mtp accept=1 prop=21425 top1=21425 accp=1.000 next=draft=14164 prop=14164 olap pair=108.5ms serial=192.3ms gain=83.8ms ratio=0.44 s0=4.2ms s1=188.1ms wait=0.1/50.9ms pred gate=device Token # 66: 3.803ms; 
value: next_token_ids=tensor([14164], device='cuda:0') mtp accept=1 prop=14164 top1=14164 accp=1.000 next=pair draft=2920 prop=2920 pred gate=device Token # 67: 113.623ms; value: next_token_ids=tensor([10352], device='cuda:0') mtp accept=0 prop=2920 top1=10352 accp=0.213 next=draft=47507 prop=47507 olap pair=108.3ms serial=192.0ms gain=83.7ms ratio=0.44 s0=4.3ms s1=187.7ms wait=0.1/50.8ms pred gate=device Token # 68: 113.887ms; value: next_token_ids=tensor([47507], device='cuda:0') mtp accept=1 prop=47507 top1=47507 accp=0.854 next=draft=12201 prop=12201 olap pair=108.5ms serial=192.1ms gain=83.6ms ratio=0.44 s0=4.3ms s1=187.8ms wait=0.1/50.5ms pred gate=device Token # 69: 3.843ms; value: next_token_ids=tensor([12201], device='cuda:0') mtp accept=1 prop=12201 top1=12201 accp=0.999 next=pair draft=9068 prop=9068 pred gate=device Token # 70: 113.522ms; value: next_token_ids=tensor([9068], device='cuda:0') mtp accept=1 prop=9068 top1=9068 accp=0.994 next=draft=75310 prop=75310 olap pair=108.1ms serial=191.6ms gain=83.4ms ratio=0.44 s0=4.2ms s1=187.3ms wait=0.1/50.5ms pred gate=device Token # 71: 3.793ms; value: next_token_ids=tensor([2920], device='cuda:0') mtp accept=0 prop=75310 top1=2920 accp=0.057 next=pair draft=9574 prop=9574 pred gate=device Token # 72: 113.961ms; value: next_token_ids=tensor([9574], device='cuda:0') mtp accept=1 prop=9574 top1=9574 accp=0.743 next=draft=303 prop=303 olap pair=108.6ms serial=192.4ms gain=83.8ms ratio=0.44 s0=4.2ms s1=188.1ms wait=0.1/50.4ms pred gate=device Token # 73: 3.820ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=1380 prop=1380 pred gate=device Token # 74: 113.623ms; value: next_token_ids=tensor([1380], device='cuda:0') mtp accept=1 prop=1380 top1=1380 accp=1.000 next=draft=10051 prop=10051 olap pair=108.3ms serial=191.9ms gain=83.6ms ratio=0.44 s0=4.2ms s1=187.6ms wait=0.1/50.7ms pred gate=device Token # 75: 3.800ms; value: 
next_token_ids=tensor([10051], device='cuda:0') mtp accept=1 prop=10051 top1=10051 accp=0.986 next=pair draft=445 prop=445 pred gate=device Token # 76: 113.816ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=1 prop=445 top1=445 accp=0.987 next=draft=81143 prop=81143 olap pair=108.5ms serial=192.2ms gain=83.7ms ratio=0.44 s0=4.2ms s1=188.0ms wait=0.1/50.6ms pred gate=device Token # 77: 3.761ms; value: next_token_ids=tensor([4383], device='cuda:0') mtp accept=0 prop=81143 top1=4383 accp=0.395 next=pair draft=4398 prop=114710 pred gate=device Token # 78: 113.720ms; value: next_token_ids=tensor([12052], device='cuda:0') mtp accept=0 prop=114710 top1=4398 accp=0.530 next=draft=1237 prop=1237 olap pair=108.4ms serial=192.3ms gain=83.9ms ratio=0.44 s0=3.7ms s1=188.7ms wait=0.1/51.9ms pred gate=device Token # 79: 114.218ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=draft=81143 prop=81143 olap pair=108.8ms serial=193.1ms gain=84.3ms ratio=0.44 s0=3.6ms s1=189.5ms wait=0.1/51.8ms pred gate=device Token # 80: 3.808ms; value: next_token_ids=tensor([81143], device='cuda:0') mtp accept=1 prop=81143 top1=81143 accp=0.957 next=pair draft=3701 prop=3701 pred gate=device Token # 81: 113.608ms; value: next_token_ids=tensor([3701], device='cuda:0') mtp accept=1 prop=3701 top1=3701 accp=1.000 next=draft=1227 prop=1227 olap pair=108.3ms serial=192.1ms gain=83.8ms ratio=0.44 s0=3.6ms s1=188.5ms wait=0.1/51.8ms pred gate=device Token # 82: 3.856ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=0 prop=1227 top1=1227 accp=0.538 next=pair draft=291 prop=291 pred gate=device Token # 83: 113.480ms; value: next_token_ids=tensor([291], device='cuda:0') mtp accept=1 prop=291 top1=291 accp=1.000 next=draft=11528 prop=11528 olap pair=108.1ms serial=191.7ms gain=83.6ms ratio=0.44 s0=4.0ms s1=187.7ms wait=0.1/51.2ms pred gate=device Token # 84: 3.763ms; value: next_token_ids=tensor([11528], 
device='cuda:0') mtp accept=1 prop=11528 top1=11528 accp=1.000 next=pair draft=1227 prop=1227 pred gate=device Token # 85: 113.677ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=1.000 next=draft=545 prop=545 olap pair=108.3ms serial=192.1ms gain=83.7ms ratio=0.44 s0=3.9ms s1=188.2ms wait=0.1/51.3ms pred gate=device Token # 86: 3.834ms; value: next_token_ids=tensor([545], device='cuda:0') mtp accept=1 prop=545 top1=545 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 87: 114.707ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=15121 prop=15121 olap pair=108.5ms serial=191.7ms gain=83.2ms ratio=0.43 s0=4.6ms s1=187.1ms wait=0.1/50.8ms pred gate=device Token # 88: 4.791ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=0 prop=15121 top1=301 accp=0.300 next=pair draft=36101 prop=36101 pred gate=device Token # 89: 114.369ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=0.971 next=draft=2833 prop=2833 olap pair=108.8ms serial=193.0ms gain=84.1ms ratio=0.44 s0=3.9ms s1=189.0ms wait=0.1/51.6ms pred gate=device Token # 90: 3.794ms; value: next_token_ids=tensor([17520], device='cuda:0') mtp accept=0 prop=2833 top1=17520 accp=0.142 next=pair draft=713 prop=713 pred gate=device Token # 91: 113.690ms; value: next_token_ids=tensor([713], device='cuda:0') mtp accept=1 prop=713 top1=713 accp=0.987 next=draft=5402 prop=5402 olap pair=108.3ms serial=191.8ms gain=83.5ms ratio=0.44 s0=4.1ms s1=187.7ms wait=0.1/50.9ms pred gate=device Token # 92: 3.821ms; value: next_token_ids=tensor([28310], device='cuda:0') mtp accept=0 prop=5402 top1=303 accp=0.276 next=pair draft=18580 prop=18580 pred gate=device Token # 93: 114.084ms; value: next_token_ids=tensor([18580], device='cuda:0') mtp accept=1 prop=18580 top1=18580 accp=1.000 next=draft=11846 prop=11846 olap pair=108.7ms serial=192.7ms gain=84.0ms 
ratio=0.44 s0=4.3ms s1=188.4ms wait=0.1/50.5ms pred gate=device Token # 94: 3.838ms; value: next_token_ids=tensor([946], device='cuda:0') mtp accept=0 prop=11846 top1=946 accp=0.304 next=pair draft=478 prop=478 pred gate=device Token # 95: 113.935ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=draft=7346 prop=7346 olap pair=108.6ms serial=192.5ms gain=83.9ms ratio=0.44 s0=3.7ms s1=188.9ms wait=0.1/51.8ms pred gate=device Token # 96: 3.762ms; value: next_token_ids=tensor([372], device='cuda:0') mtp accept=0 prop=7346 top1=372 accp=0.048 next=pair draft=223 prop=223 pred gate=device Token # 97: 113.533ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=19 prop=19 olap pair=108.2ms serial=191.8ms gain=83.6ms ratio=0.44 s0=3.6ms s1=188.2ms wait=0.1/51.8ms pred gate=device Token # 98: 3.807ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=16 prop=16 pred gate=device Token # 99: 115.776ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=223 prop=223 olap pair=108.2ms serial=191.6ms gain=83.4ms ratio=0.44 s0=3.9ms s1=187.7ms wait=0.1/51.5ms pred gate=device Token # 100: 3.801ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=3374 prop=3374 pred gate=device Token # 101: 114.018ms; value: next_token_ids=tensor([3374], device='cuda:0') mtp accept=1 prop=3374 top1=3374 accp=0.609 next=draft=66518 prop=66518 olap pair=108.6ms serial=192.5ms gain=83.9ms ratio=0.44 s0=4.3ms s1=188.2ms wait=0.1/50.7ms pred gate=device Token # 102: 3.801ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=pair draft=1237 prop=1237 pred gate=device Token # 103: 114.378ms; value: next_token_ids=tensor([343], device='cuda:0') mtp accept=0 prop=1237 
top1=1237 accp=0.987 next=draft=9422 prop=9422 olap pair=109.0ms serial=193.4ms gain=84.3ms ratio=0.44 s0=4.1ms s1=189.3ms wait=0.1/51.0ms pred gate=device Token # 104: 114.280ms; value: next_token_ids=tensor([9422], device='cuda:0') mtp accept=1 prop=9422 top1=9422 accp=0.584 next=draft=14 prop=14 olap pair=108.9ms serial=192.8ms gain=84.0ms ratio=0.44 s0=4.4ms s1=188.4ms wait=0.1/50.3ms pred gate=device Token # 105: 3.790ms; value: next_token_ids=tensor([682], device='cuda:0') mtp accept=0 prop=14 top1=682 accp=0.233 next=pair draft=15 prop=15 pred gate=device Token # 106: 113.269ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=0.997 next=draft=2619 prop=2619 olap pair=108.0ms serial=191.2ms gain=83.2ms ratio=0.44 s0=4.8ms s1=186.4ms wait=0.1/47.6ms pred gate=device Token # 107: 3.728ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.970 next=pair draft=14087 prop=14087 pred gate=device Token # 108: 113.200ms; value: next_token_ids=tensor([14087], device='cuda:0') mtp accept=1 prop=14087 top1=14087 accp=0.999 next=draft=666 prop=666 olap pair=107.9ms serial=191.3ms gain=83.4ms ratio=0.44 s0=4.7ms s1=186.6ms wait=0.2/44.7ms pred gate=device Token # 109: 3.781ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=pair draft=768 prop=768 pred gate=device Token # 110: 113.465ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=1275 prop=1275 olap pair=108.3ms serial=192.0ms gain=83.7ms ratio=0.44 s0=4.5ms s1=187.4ms wait=0.1/44.8ms pred gate=device Token # 111: 3.723ms; value: next_token_ids=tensor([1275], device='cuda:0') mtp accept=1 prop=1275 top1=1275 accp=1.000 next=pair draft=3374 prop=8842 pred gate=device Token # 112: 113.299ms; value: next_token_ids=tensor([10172], device='cuda:0') mtp accept=0 prop=8842 top1=23787 accp=0.009 next=draft=3374 prop=3374 olap 
pair=108.1ms serial=191.8ms gain=83.6ms ratio=0.44 s0=4.5ms s1=187.3ms wait=0.1/45.0ms pred gate=device Token # 113: 113.443ms; value: next_token_ids=tensor([3374], device='cuda:0') mtp accept=1 prop=3374 top1=3374 accp=1.000 next=draft=32041 prop=32041 olap pair=108.1ms serial=191.8ms gain=83.7ms ratio=0.44 s0=4.2ms s1=187.6ms wait=0.1/45.0ms pred gate=device Token # 114: 3.686ms; value: next_token_ids=tensor([31446], device='cuda:0') mtp accept=0 prop=32041 top1=24495 accp=0.111 next=pair draft=621 prop=743 pred gate=device Token # 115: 113.101ms; value: next_token_ids=tensor([743], device='cuda:0') mtp accept=1 prop=743 top1=743 accp=0.477 next=draft=13097 prop=13097 olap pair=107.8ms serial=191.1ms gain=83.3ms ratio=0.44 s0=4.2ms s1=186.9ms wait=0.1/45.3ms pred gate=device Token # 116: 3.730ms; value: next_token_ids=tensor([13097], device='cuda:0') mtp accept=1 prop=13097 top1=13097 accp=0.999 next=pair draft=844 prop=3343 pred gate=device Token # 117: 113.115ms; value: next_token_ids=tensor([760], device='cuda:0') mtp accept=0 prop=3343 top1=844 accp=0.488 next=draft=2089 prop=2089 olap pair=107.9ms serial=191.4ms gain=83.5ms ratio=0.44 s0=4.2ms s1=187.2ms wait=0.1/45.3ms pred gate=device Token # 118: 113.913ms; value: next_token_ids=tensor([2089], device='cuda:0') mtp accept=1 prop=2089 top1=2089 accp=0.991 next=draft=303 prop=303 olap pair=108.6ms serial=192.7ms gain=84.1ms ratio=0.44 s0=4.4ms s1=188.4ms wait=0.1/45.0ms pred gate=device Token # 119: 3.725ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.998 next=pair draft=7849 prop=7849 pred gate=device Token # 120: 113.566ms; value: next_token_ids=tensor([7849], device='cuda:0') mtp accept=1 prop=7849 top1=7849 accp=1.000 next=draft=760 prop=760 olap pair=108.4ms serial=192.2ms gain=83.8ms ratio=0.44 s0=4.7ms s1=187.5ms wait=0.1/44.6ms pred gate=device Token # 121: 3.671ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=0 prop=760 top1=6034 
accp=0.003 next=pair draft=572 prop=1237 pred gate=device Token # 122: 113.700ms; value: next_token_ids=tensor([572], device='cuda:0') mtp accept=0 prop=1237 top1=572 accp=0.859 next=draft=7712 prop=11513 olap pair=108.4ms serial=192.3ms gain=83.9ms ratio=0.44 s0=4.4ms s1=188.0ms wait=0.1/44.8ms pred gate=device Token # 123: 113.550ms; value: next_token_ids=tensor([11513], device='cuda:0') mtp accept=1 prop=11513 top1=7712 accp=0.607 next=draft=30869 prop=30869 olap pair=108.2ms serial=191.9ms gain=83.7ms ratio=0.44 s0=4.7ms s1=187.2ms wait=0.1/44.3ms pred gate=device Token # 124: 3.782ms; value: next_token_ids=tensor([30869], device='cuda:0') mtp accept=1 prop=30869 top1=30869 accp=0.989 next=pair draft=8842 prop=8842 pred gate=device Token # 125: 113.514ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=1.000 next=draft=52600 prop=52600 olap pair=108.2ms serial=192.1ms gain=83.8ms ratio=0.44 s0=4.2ms s1=187.8ms wait=0.1/45.4ms pred gate=device Token # 126: 3.783ms; value: next_token_ids=tensor([52600], device='cuda:0') mtp accept=1 prop=52600 top1=52600 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 127: 113.254ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=9968 prop=9968 olap pair=108.0ms serial=191.4ms gain=83.4ms ratio=0.44 s0=4.7ms s1=186.7ms wait=0.1/44.3ms pred gate=device Token # 128: 3.711ms; value: next_token_ids=tensor([17349], device='cuda:0') mtp accept=0 prop=9968 top1=17349 accp=0.043 next=pair draft=4398 prop=4398 pred gate=device Token # 129: 113.785ms; value: next_token_ids=tensor([4398], device='cuda:0') mtp accept=1 prop=4398 top1=4398 accp=0.996 next=draft=7557 prop=1057 olap pair=108.5ms serial=192.4ms gain=83.9ms ratio=0.44 s0=4.8ms s1=187.7ms wait=0.1/44.1ms pred gate=device Token # 130: 3.697ms; value: next_token_ids=tensor([1057], device='cuda:0') mtp accept=1 prop=1057 top1=23945 accp=0.413 next=pair draft=760 
prop=760 pred gate=device Token # 131: 113.882ms; value: next_token_ids=tensor([760], device='cuda:0') mtp accept=1 prop=760 top1=760 accp=0.987 next=draft=2089 prop=2089 olap pair=108.6ms serial=192.5ms gain=83.9ms ratio=0.44 s0=4.7ms s1=187.7ms wait=0.1/44.1ms pred gate=device Token # 132: 3.752ms; value: next_token_ids=tensor([2089], device='cuda:0') mtp accept=1 prop=2089 top1=2089 accp=0.999 next=pair draft=19661 prop=19661 pred gate=device Token # 133: 113.865ms; value: next_token_ids=tensor([19661], device='cuda:0') mtp accept=1 prop=19661 top1=19661 accp=0.993 next=draft=303 prop=303 olap pair=108.6ms serial=192.6ms gain=84.0ms ratio=0.44 s0=4.8ms s1=187.9ms wait=0.1/44.3ms pred gate=device Token # 134: 3.729ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=4272 prop=4272 pred gate=device Token # 135: 113.331ms; value: next_token_ids=tensor([4272], device='cuda:0') mtp accept=1 prop=4272 top1=4272 accp=1.000 next=draft=2490 prop=4339 olap pair=108.1ms serial=191.7ms gain=83.6ms ratio=0.44 s0=4.7ms s1=187.0ms wait=0.1/44.6ms pred gate=device Token # 136: 3.746ms; value: next_token_ids=tensor([22710], device='cuda:0') mtp accept=0 prop=4339 top1=22710 accp=0.278 next=pair draft=59563 prop=59563 pred gate=device Token # 137: 113.580ms; value: next_token_ids=tensor([59563], device='cuda:0') mtp accept=1 prop=59563 top1=59563 accp=1.000 next=draft=876 prop=876 olap pair=108.3ms serial=192.1ms gain=83.7ms ratio=0.44 s0=4.7ms s1=187.4ms wait=0.1/44.2ms pred gate=device Token # 138: 3.666ms; value: next_token_ids=tensor([876], device='cuda:0') mtp accept=1 prop=876 top1=876 accp=0.909 next=pair draft=15 prop=15 pred gate=device Token # 139: 113.272ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=2619 prop=2619 olap pair=108.1ms serial=191.7ms gain=83.6ms ratio=0.44 s0=4.2ms s1=187.5ms wait=0.1/45.2ms pred gate=device Token # 140: 3.711ms; 
value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.966 next=pair draft=36101 prop=36101 pred gate=device Token # 141: 113.699ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=1.000 next=draft=666 prop=666 olap pair=108.5ms serial=192.5ms gain=84.0ms ratio=0.44 s0=4.3ms s1=188.2ms wait=0.1/45.0ms pred gate=device Token # 142: 3.736ms; value: next_token_ids=tensor([17520], device='cuda:0') mtp accept=0 prop=666 top1=17520 accp=0.023 next=pair draft=666 prop=666 pred gate=device Token # 143: 113.580ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=768 prop=768 olap pair=108.3ms serial=192.0ms gain=83.7ms ratio=0.44 s0=4.2ms s1=187.8ms wait=0.1/45.1ms pred gate=device Token # 144: 3.683ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=445 prop=445 pred gate=device Token # 145: 113.546ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=1 prop=445 top1=445 accp=1.000 next=draft=36101 prop=36101 olap pair=108.2ms serial=192.0ms gain=83.8ms ratio=0.44 s0=4.2ms s1=187.8ms wait=0.1/45.2ms pred gate=device Token # 146: 3.702ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=1.000 next=pair draft=625 prop=625 pred gate=device Token # 147: 113.370ms; value: next_token_ids=tensor([625], device='cuda:0') mtp accept=1 prop=625 top1=625 accp=0.927 next=draft=303 prop=303 olap pair=108.0ms serial=191.5ms gain=83.5ms ratio=0.44 s0=4.5ms s1=187.0ms wait=0.1/44.5ms pred gate=device Token # 148: 3.719ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=2204 prop=2204 pred gate=device Token # 149: 113.413ms; value: next_token_ids=tensor([2204], device='cuda:0') mtp accept=1 prop=2204 top1=2204 accp=0.988 next=draft=2382 prop=2541 olap pair=108.3ms 
serial=191.9ms gain=83.7ms ratio=0.44 s0=4.7ms s1=187.2ms wait=0.1/44.3ms pred gate=device Token # 150: 3.756ms; value: next_token_ids=tensor([4383], device='cuda:0') mtp accept=0 prop=2541 top1=4383 accp=0.004 next=pair draft=12052 prop=12052 pred gate=device Token # 151: 113.935ms; value: next_token_ids=tensor([12052], device='cuda:0') mtp accept=1 prop=12052 top1=12052 accp=1.000 next=draft=16913 prop=16913 olap pair=108.6ms serial=192.3ms gain=83.7ms ratio=0.44 s0=4.0ms s1=188.4ms wait=0.1/46.2ms pred gate=device Token # 152: 3.739ms; value: next_token_ids=tensor([16913], device='cuda:0') mtp accept=1 prop=16913 top1=16913 accp=0.994 next=pair draft=19 prop=19 pred gate=device Token # 153: 113.982ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=303 prop=303 olap pair=108.8ms serial=193.1ms gain=84.3ms ratio=0.44 s0=3.8ms s1=189.3ms wait=0.1/46.0ms pred gate=device Token # 154: 3.783ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.990 next=pair draft=41540 prop=41540 pred gate=device Token # 155: 113.786ms; value: next_token_ids=tensor([41540], device='cuda:0') mtp accept=1 prop=41540 top1=41540 accp=0.895 next=draft=3374 prop=3374 olap pair=108.6ms serial=192.6ms gain=84.0ms ratio=0.44 s0=3.9ms s1=188.7ms wait=0.1/46.2ms pred gate=device Token # 156: 3.714ms; value: next_token_ids=tensor([7557], device='cuda:0') mtp accept=0 prop=3374 top1=7557 accp=0.016 next=pair draft=8979 prop=8979 pred gate=device Token # 157: 113.732ms; value: next_token_ids=tensor([3374], device='cuda:0') mtp accept=0 prop=8979 top1=3374 accp=0.066 next=draft=25024 prop=25024 olap pair=108.5ms serial=192.4ms gain=83.9ms ratio=0.44 s0=3.9ms s1=188.5ms wait=0.1/46.0ms pred gate=device Token # 158: 114.197ms; value: next_token_ids=tensor([25024], device='cuda:0') mtp accept=1 prop=25024 top1=25024 accp=1.000 next=draft=119545 prop=119545 olap pair=108.9ms serial=193.4ms gain=84.4ms 
ratio=0.44 s0=3.9ms s1=189.4ms wait=0.1/46.0ms pred gate=device Token # 159: 3.729ms; value: next_token_ids=tensor([91013], device='cuda:0') mtp accept=0 prop=119545 top1=91013 accp=0.276 next=pair draft=621 prop=621 pred gate=device Token # 160: 114.006ms; value: next_token_ids=tensor([621], device='cuda:0') mtp accept=1 prop=621 top1=621 accp=1.000 next=draft=7557 prop=7557 olap pair=108.6ms serial=192.5ms gain=83.9ms ratio=0.44 s0=4.2ms s1=188.3ms wait=0.1/45.2ms pred gate=device Token # 161: 3.746ms; value: next_token_ids=tensor([3007], device='cuda:0') mtp accept=0 prop=7557 top1=7557 accp=0.774 next=pair draft=6034 prop=6034 pred gate=device Token # 162: 113.773ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=draft=572 prop=572 olap pair=108.5ms serial=192.3ms gain=83.8ms ratio=0.44 s0=4.1ms s1=188.2ms wait=0.1/45.3ms pred gate=device Token # 163: 3.682ms; value: next_token_ids=tensor([572], device='cuda:0') mtp accept=1 prop=572 top1=572 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 164: 113.754ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.938 next=draft=7849 prop=7849 olap pair=108.5ms serial=192.7ms gain=84.2ms ratio=0.44 s0=3.6ms s1=189.1ms wait=0.1/46.5ms pred gate=device Token # 165: 3.771ms; value: next_token_ids=tensor([7849], device='cuda:0') mtp accept=1 prop=7849 top1=7849 accp=0.999 next=pair draft=6034 prop=6034 pred gate=device Token # 166: 113.685ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=draft=9968 prop=9968 olap pair=108.4ms serial=192.1ms gain=83.7ms ratio=0.44 s0=4.1ms s1=188.0ms wait=0.1/46.0ms pred gate=device Token # 167: 4.361ms; value: next_token_ids=tensor([9968], device='cuda:0') mtp accept=1 prop=9968 top1=9968 accp=0.926 next=pair draft=4339 prop=4339 pred gate=device Token # 168: 114.084ms; value: next_token_ids=tensor([1824], device='cuda:0') 
mtp accept=0 prop=4339 top1=1824 accp=0.007 next=draft=974 prop=974 olap pair=108.6ms serial=192.4ms gain=83.8ms ratio=0.44 s0=4.9ms s1=187.5ms wait=0.1/45.3ms pred gate=device Token # 169: 113.969ms; value: next_token_ids=tensor([974], device='cuda:0') mtp accept=1 prop=974 top1=974 accp=1.000 next=draft=1427 prop=1427 olap pair=108.6ms serial=192.8ms gain=84.2ms ratio=0.44 s0=3.7ms s1=189.1ms wait=0.1/46.3ms pred gate=device Token # 170: 3.731ms; value: next_token_ids=tensor([1427], device='cuda:0') mtp accept=1 prop=1427 top1=1427 accp=1.000 next=pair draft=13062 prop=13062 pred gate=device Token # 171: 113.959ms; value: next_token_ids=tensor([13062], device='cuda:0') mtp accept=1 prop=13062 top1=13062 accp=1.000 next=draft=303 prop=303 olap pair=108.7ms serial=193.0ms gain=84.3ms ratio=0.44 s0=3.9ms s1=189.0ms wait=0.1/45.9ms pred gate=device Token # 172: 3.728ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=0 prop=303 top1=320 accp=0.037 next=pair draft=2803 prop=2803 pred gate=device Token # 173: 113.736ms; value: next_token_ids=tensor([2803], device='cuda:0') mtp accept=1 prop=2803 top1=2803 accp=0.948 next=draft=303 prop=303 olap pair=108.5ms serial=192.5ms gain=84.0ms ratio=0.44 s0=4.3ms s1=188.2ms wait=0.1/45.2ms pred gate=device Token # 174: 3.684ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=1140 prop=1140 pred gate=device Token # 175: 113.913ms; value: next_token_ids=tensor([1140], device='cuda:0') mtp accept=1 prop=1140 top1=1140 accp=0.993 next=draft=2382 prop=2382 olap pair=108.6ms serial=192.3ms gain=83.6ms ratio=0.43 s0=6.8ms s1=185.5ms wait=0.2/42.8ms pred gate=device Token # 176: 3.709ms; value: next_token_ids=tensor([4383], device='cuda:0') mtp accept=0 prop=2382 top1=4383 accp=0.513 next=pair draft=12052 prop=12052 pred gate=device Token # 177: 113.571ms; value: next_token_ids=tensor([114710], device='cuda:0') mtp accept=0 prop=12052 top1=114710 accp=0.078 
next=draft=19 prop=19 olap pair=108.2ms serial=192.1ms gain=83.9ms ratio=0.44 s0=3.7ms s1=188.3ms wait=0.1/46.3ms pred gate=device Token # 178: 114.411ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=625 prop=625 olap pair=109.1ms serial=193.6ms gain=84.5ms ratio=0.44 s0=4.1ms s1=189.4ms wait=0.1/45.4ms pred gate=device Token # 179: 3.763ms; value: next_token_ids=tensor([625], device='cuda:0') mtp accept=1 prop=625 top1=625 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 180: 114.130ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.995 next=draft=3374 prop=3374 olap pair=108.9ms serial=193.3ms gain=84.4ms ratio=0.44 s0=4.1ms s1=189.1ms wait=0.1/45.5ms pred gate=device Token # 181: 3.793ms; value: next_token_ids=tensor([3374], device='cuda:0') mtp accept=1 prop=3374 top1=3374 accp=0.654 next=pair draft=66518 prop=66518 pred gate=device Token # 182: 114.228ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=draft=6525 prop=6525 olap pair=109.0ms serial=193.4ms gain=84.5ms ratio=0.44 s0=3.7ms s1=189.7ms wait=0.1/46.4ms pred gate=device Token # 183: 3.735ms; value: next_token_ids=tensor([6525], device='cuda:0') mtp accept=1 prop=6525 top1=6525 accp=0.802 next=pair draft=2541 prop=8649 pred gate=device Token # 184: 114.402ms; value: next_token_ids=tensor([3796], device='cuda:0') mtp accept=0 prop=8649 top1=79008 accp=0.197 next=draft=9567 prop=9567 olap pair=109.2ms serial=194.1ms gain=84.8ms ratio=0.44 s0=3.6ms s1=190.4ms wait=0.1/46.5ms pred gate=device Token # 185: 114.667ms; value: next_token_ids=tensor([5293], device='cuda:0') mtp accept=0 prop=9567 top1=5293 accp=0.120 next=draft=18617 prop=18617 olap pair=109.3ms serial=194.3ms gain=85.0ms ratio=0.44 s0=3.8ms s1=190.5ms wait=0.1/46.3ms pred gate=device Token # 186: 114.436ms; value: next_token_ids=tensor([18617], device='cuda:0') mtp 
accept=1 prop=18617 top1=18617 accp=0.999 next=draft=303 prop=303 olap pair=109.1ms serial=193.3ms gain=84.3ms ratio=0.44 s0=5.8ms s1=187.5ms wait=0.2/43.9ms pred gate=device Token # 187: 3.755ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=2524 prop=2524 pred gate=device Token # 188: 115.013ms; value: next_token_ids=tensor([2524], device='cuda:0') mtp accept=1 prop=2524 top1=2524 accp=1.000 next=draft=40092 prop=40092 olap pair=109.7ms serial=195.0ms gain=85.3ms ratio=0.44 s0=3.7ms s1=191.2ms wait=0.1/46.2ms pred gate=device Token # 189: 3.875ms; value: next_token_ids=tensor([40092], device='cuda:0') mtp accept=1 prop=40092 top1=40092 accp=0.976 next=pair draft=3374 prop=3374 pred gate=device Token # 190: 114.900ms; value: next_token_ids=tensor([25024], device='cuda:0') mtp accept=0 prop=3374 top1=25024 accp=0.049 next=draft=2386 prop=2386 olap pair=109.6ms serial=194.8ms gain=85.2ms ratio=0.44 s0=4.2ms s1=190.6ms wait=0.1/45.2ms pred gate=device Token # 191: 115.033ms; value: next_token_ids=tensor([2386], device='cuda:0') mtp accept=1 prop=2386 top1=2386 accp=0.989 next=draft=1415 prop=1415 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.5ms s1=190.3ms wait=0.1/44.8ms pred gate=device Token # 192: 3.746ms; value: next_token_ids=tensor([4398], device='cuda:0') mtp accept=0 prop=1415 top1=4398 accp=0.028 next=pair draft=303 prop=303 pred gate=device Token # 193: 114.413ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.967 next=draft=6525 prop=6525 olap pair=109.1ms serial=193.7ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.5ms wait=0.1/45.3ms pred gate=device Token # 194: 3.694ms; value: next_token_ids=tensor([6525], device='cuda:0') mtp accept=1 prop=6525 top1=6525 accp=0.998 next=pair draft=3437 prop=116037 pred gate=device Token # 195: 114.561ms; value: next_token_ids=tensor([31446], device='cuda:0') mtp accept=0 prop=116037 top1=31446 
accp=0.037 next=draft=320 prop=320 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.0ms wait=0.1/45.2ms pred gate=device Token # 196: 114.287ms; value: next_token_ids=tensor([621], device='cuda:0') mtp accept=0 prop=320 top1=621 accp=0.280 next=draft=13097 prop=13097 olap pair=109.0ms serial=193.4ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.2ms wait=0.1/45.1ms pred gate=device Token # 197: 114.222ms; value: next_token_ids=tensor([13097], device='cuda:0') mtp accept=1 prop=13097 top1=13097 accp=0.818 next=draft=6034 prop=6034 olap pair=108.8ms serial=193.1ms gain=84.3ms ratio=0.44 s0=4.2ms s1=188.9ms wait=0.1/45.3ms pred gate=device Token # 198: 3.733ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=pair draft=572 prop=572 pred gate=device Token # 199: 114.188ms; value: next_token_ids=tensor([572], device='cuda:0') mtp accept=1 prop=572 top1=572 accp=1.000 next=draft=320 prop=320 olap pair=108.9ms serial=193.4ms gain=84.4ms ratio=0.44 s0=4.2ms s1=189.2ms wait=0.1/45.2ms pred gate=device Token # 200: 3.776ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.998 next=pair draft=4009 prop=4009 pred gate=device Token # 201: 113.979ms; value: next_token_ids=tensor([4009], device='cuda:0') mtp accept=1 prop=4009 top1=4009 accp=0.694 next=draft=303 prop=303 olap pair=108.7ms serial=193.1ms gain=84.4ms ratio=0.44 s0=3.7ms s1=189.4ms wait=0.1/46.4ms pred gate=device Token # 202: 3.832ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=666 prop=445 pred gate=device Token # 203: 114.649ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=1 prop=445 top1=666 accp=0.959 next=draft=2382 prop=2382 olap pair=109.4ms serial=194.1ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.8ms wait=0.1/45.1ms pred gate=device Token # 204: 3.717ms; value: next_token_ids=tensor([2382], device='cuda:0') mtp 
accept=1 prop=2382 top1=2382 accp=1.000 next=pair draft=92 prop=92 pred gate=device Token # 205: 114.068ms; value: next_token_ids=tensor([92], device='cuda:0') mtp accept=1 prop=92 top1=92 accp=1.000 next=draft=31 prop=31 olap pair=108.7ms serial=192.9ms gain=84.2ms ratio=0.44 s0=4.3ms s1=188.6ms wait=0.1/45.5ms pred gate=device Token # 206: 3.781ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 207: 113.662ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=15121 prop=15121 olap pair=108.4ms serial=192.3ms gain=83.9ms ratio=0.44 s0=4.8ms s1=187.5ms wait=0.1/44.3ms pred gate=device Token # 208: 3.811ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=0 prop=15121 top1=301 accp=0.229 next=pair draft=36101 prop=36101 pred gate=device Token # 209: 114.547ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=1.000 next=draft=525 prop=525 olap pair=109.2ms serial=193.9ms gain=84.7ms ratio=0.44 s0=4.7ms s1=189.2ms wait=0.1/44.5ms pred gate=device Token # 210: 3.783ms; value: next_token_ids=tensor([525], device='cuda:0') mtp accept=1 prop=525 top1=525 accp=0.940 next=pair draft=303 prop=303 pred gate=device Token # 211: 114.510ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=9422 prop=3374 olap pair=109.2ms serial=193.9ms gain=84.7ms ratio=0.44 s0=4.7ms s1=189.2ms wait=0.1/44.4ms pred gate=device Token # 212: 3.726ms; value: next_token_ids=tensor([3374], device='cuda:0') mtp accept=1 prop=3374 top1=9422 accp=0.733 next=pair draft=66518 prop=66518 pred gate=device Token # 213: 113.845ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=draft=422 prop=422 olap pair=108.6ms serial=192.6ms gain=84.0ms ratio=0.44 s0=4.4ms s1=188.1ms wait=0.1/45.0ms 
pred gate=device Token # 214: 3.729ms; value: next_token_ids=tensor([422], device='cuda:0') mtp accept=1 prop=422 top1=422 accp=0.950 next=pair draft=18580 prop=18580 pred gate=device Token # 215: 114.429ms; value: next_token_ids=tensor([18580], device='cuda:0') mtp accept=1 prop=18580 top1=18580 accp=1.000 next=draft=478 prop=478 olap pair=109.2ms serial=193.9ms gain=84.7ms ratio=0.44 s0=4.4ms s1=189.5ms wait=0.1/44.8ms pred gate=device Token # 216: 3.770ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=pair draft=372 prop=372 pred gate=device Token # 217: 113.833ms; value: next_token_ids=tensor([372], device='cuda:0') mtp accept=1 prop=372 top1=372 accp=1.000 next=draft=223 prop=223 olap pair=108.6ms serial=192.6ms gain=84.0ms ratio=0.44 s0=4.7ms s1=187.9ms wait=0.1/44.8ms pred gate=device Token # 218: 3.752ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 219: 114.675ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=16 prop=16 olap pair=109.5ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.6ms s1=189.8ms wait=0.1/45.0ms pred gate=device Token # 220: 3.715ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 221: 114.011ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=2353 prop=2353 olap pair=108.8ms serial=193.1ms gain=84.3ms ratio=0.44 s0=4.7ms s1=188.4ms wait=0.1/44.6ms pred gate=device Token # 222: 3.725ms; value: next_token_ids=tensor([2353], device='cuda:0') mtp accept=1 prop=2353 top1=2353 accp=1.000 next=pair draft=1121 prop=1121 pred gate=device Token # 223: 113.994ms; value: next_token_ids=tensor([1121], device='cuda:0') mtp accept=1 prop=1121 top1=1121 accp=1.000 next=draft=66518 prop=66518 
olap pair=108.8ms serial=193.0ms gain=84.2ms ratio=0.44 s0=4.2ms s1=188.8ms wait=0.1/45.1ms pred gate=device Token # 224: 3.769ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=pair draft=343 prop=343 pred gate=device Token # 225: 114.627ms; value: next_token_ids=tensor([343], device='cuda:0') mtp accept=1 prop=343 top1=343 accp=1.000 next=draft=8835 prop=8835 olap pair=109.3ms serial=193.0ms gain=83.7ms ratio=0.43 s0=4.1ms s1=188.9ms wait=0.1/45.9ms pred gate=device Token # 226: 3.703ms; value: next_token_ids=tensor([8835], device='cuda:0') mtp accept=1 prop=8835 top1=8835 accp=1.000 next=pair draft=682 prop=682 pred gate=device Token # 227: 114.330ms; value: next_token_ids=tensor([682], device='cuda:0') mtp accept=1 prop=682 top1=682 accp=1.000 next=draft=15 prop=15 olap pair=109.0ms serial=193.8ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.2ms wait=0.1/46.5ms pred gate=device Token # 228: 3.694ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=pair draft=2619 prop=2619 pred gate=device Token # 229: 114.008ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.999 next=draft=14087 prop=14087 olap pair=108.8ms serial=193.1ms gain=84.3ms ratio=0.44 s0=4.3ms s1=188.8ms wait=0.1/45.0ms pred gate=device Token # 230: 3.745ms; value: next_token_ids=tensor([14087], device='cuda:0') mtp accept=1 prop=14087 top1=14087 accp=1.000 next=pair draft=666 prop=666 pred gate=device Token # 231: 114.966ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=768 prop=768 olap pair=109.7ms serial=195.1ms gain=85.4ms ratio=0.44 s0=4.2ms s1=190.9ms wait=0.1/45.3ms pred gate=device Token # 232: 3.720ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=1275 prop=1275 pred gate=device Token # 233: 114.180ms; value: 
next_token_ids=tensor([1275], device='cuda:0') mtp accept=1 prop=1275 top1=1275 accp=1.000 next=draft=8842 prop=8842 olap pair=109.0ms serial=193.5ms gain=84.6ms ratio=0.44 s0=4.2ms s1=189.3ms wait=0.1/45.1ms pred gate=device Token # 234: 3.731ms; value: next_token_ids=tensor([52727], device='cuda:0') mtp accept=0 prop=8842 top1=52727 accp=0.035 next=pair draft=51259 prop=51259 pred gate=device Token # 235: 113.844ms; value: next_token_ids=tensor([2827], device='cuda:0') mtp accept=0 prop=51259 top1=45276 accp=0.185 next=draft=1237 prop=1237 olap pair=108.6ms serial=192.8ms gain=84.2ms ratio=0.44 s0=4.3ms s1=188.6ms wait=0.1/45.1ms pred gate=device Token # 236: 114.140ms; value: next_token_ids=tensor([10395], device='cuda:0') mtp accept=0 prop=1237 top1=10395 accp=0.000 next=draft=2353 prop=2353 olap pair=108.9ms serial=193.3ms gain=84.4ms ratio=0.44 s0=4.2ms s1=189.1ms wait=0.1/45.1ms pred gate=device Token # 237: 114.164ms; value: next_token_ids=tensor([51259], device='cuda:0') mtp accept=0 prop=2353 top1=51259 accp=0.223 next=draft=24479 prop=24479 olap pair=108.9ms serial=193.4ms gain=84.5ms ratio=0.44 s0=4.2ms s1=189.1ms wait=0.1/45.1ms pred gate=device Token # 238: 114.621ms; value: next_token_ids=tensor([24479], device='cuda:0') mtp accept=1 prop=24479 top1=24479 accp=0.979 next=draft=31446 prop=24495 olap pair=109.3ms serial=193.5ms gain=84.2ms ratio=0.44 s0=4.0ms s1=189.4ms wait=0.1/45.9ms pred gate=device Token # 239: 3.695ms; value: next_token_ids=tensor([1824], device='cuda:0') mtp accept=0 prop=24495 top1=1824 accp=0.499 next=pair draft=31446 prop=31446 pred gate=device Token # 240: 114.090ms; value: next_token_ids=tensor([31446], device='cuda:0') mtp accept=1 prop=31446 top1=31446 accp=0.723 next=draft=303 prop=303 olap pair=108.8ms serial=193.3ms gain=84.5ms ratio=0.44 s0=3.7ms s1=189.6ms wait=0.1/46.5ms pred gate=device Token # 241: 3.718ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair 
draft=25650 prop=119545 pred gate=device Token # 242: 115.038ms; value: next_token_ids=tensor([10780], device='cuda:0') mtp accept=0 prop=119545 top1=10780 accp=0.091 next=draft=621 prop=621 olap pair=109.6ms serial=194.6ms gain=84.9ms ratio=0.44 s0=3.8ms s1=190.7ms wait=0.1/46.1ms pred gate=device Token # 243: 114.475ms; value: next_token_ids=tensor([621], device='cuda:0') mtp accept=1 prop=621 top1=621 accp=1.000 next=draft=13097 prop=13097 olap pair=109.1ms serial=193.7ms gain=84.6ms ratio=0.44 s0=3.9ms s1=189.7ms wait=0.1/45.9ms pred gate=device Token # 244: 3.796ms; value: next_token_ids=tensor([13097], device='cuda:0') mtp accept=1 prop=13097 top1=13097 accp=1.000 next=pair draft=6034 prop=6034 pred gate=device Token # 245: 113.914ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=draft=572 prop=572 olap pair=108.6ms serial=192.8ms gain=84.2ms ratio=0.44 s0=3.7ms s1=189.1ms wait=0.1/46.2ms pred gate=device Token # 246: 3.754ms; value: next_token_ids=tensor([572], device='cuda:0') mtp accept=1 prop=572 top1=572 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 247: 114.220ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=7849 prop=7849 olap pair=109.0ms serial=193.4ms gain=84.4ms ratio=0.44 s0=3.7ms s1=189.7ms wait=0.1/46.4ms pred gate=device Token # 248: 3.688ms; value: next_token_ids=tensor([7849], device='cuda:0') mtp accept=1 prop=7849 top1=7849 accp=0.997 next=pair draft=6034 prop=6034 pred gate=device Token # 249: 114.114ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=draft=1299 prop=4398 olap pair=108.9ms serial=193.4ms gain=84.5ms ratio=0.44 s0=3.7ms s1=189.6ms wait=0.1/46.4ms pred gate=device Token # 250: 3.670ms; value: next_token_ids=tensor([1299], device='cuda:0') mtp accept=0 prop=4398 top1=1299 accp=0.495 next=pair draft=28963 prop=10176 pred gate=device Token 
# 251: 114.385ms; value: next_token_ids=tensor([10176], device='cuda:0') mtp accept=1 prop=10176 top1=28963 accp=0.565 next=draft=3343 prop=3343 olap pair=109.2ms serial=193.9ms gain=84.7ms ratio=0.44 s0=4.2ms s1=189.6ms wait=0.1/45.2ms pred gate=device Token # 252: 3.689ms; value: next_token_ids=tensor([51259], device='cuda:0') mtp accept=0 prop=3343 top1=51259 accp=0.000 next=pair draft=29457 prop=29457 pred gate=device Token # 253: 114.597ms; value: next_token_ids=tensor([29457], device='cuda:0') mtp accept=1 prop=29457 top1=29457 accp=0.999 next=draft=303 prop=303 olap pair=109.2ms serial=194.0ms gain=84.8ms ratio=0.44 s0=4.2ms s1=189.8ms wait=0.1/45.1ms pred gate=device Token # 254: 3.775ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=4339 prop=4272 pred gate=device Token # 255: 113.891ms; value: next_token_ids=tensor([2490], device='cuda:0') mtp accept=0 prop=4272 top1=2490 accp=0.379 next=draft=24479 prop=974 olap pair=108.7ms serial=193.0ms gain=84.3ms ratio=0.44 s0=4.2ms s1=188.8ms wait=0.1/45.2ms pred gate=device Token # 256: 114.816ms; value: next_token_ids=tensor([16303], device='cuda:0') mtp accept=0 prop=974 top1=16303 accp=0.001 next=draft=1237 prop=1237 olap pair=109.5ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.1ms wait=0.1/45.0ms pred gate=device Token # 257: 114.490ms; value: next_token_ids=tensor([637], device='cuda:0') mtp accept=0 prop=1237 top1=637 accp=0.013 next=draft=13276 prop=13276 olap pair=109.2ms serial=193.8ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.5ms wait=0.1/45.0ms pred gate=device Token # 258: 114.121ms; value: next_token_ids=tensor([13276], device='cuda:0') mtp accept=1 prop=13276 top1=13276 accp=0.879 next=draft=4339 prop=2920 olap pair=108.8ms serial=193.3ms gain=84.5ms ratio=0.44 s0=4.2ms s1=189.1ms wait=0.1/45.1ms pred gate=device Token # 259: 3.701ms; value: next_token_ids=tensor([4498], device='cuda:0') mtp accept=0 prop=2920 top1=654 accp=0.226 
next=pair draft=876 prop=876 pred gate=device Token # 260: 114.427ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=0 prop=876 top1=320 accp=0.283 next=draft=8040 prop=8040 olap pair=109.2ms serial=193.9ms gain=84.7ms ratio=0.44 s0=4.2ms s1=189.7ms wait=0.1/45.2ms pred gate=device Token # 261: 114.747ms; value: next_token_ids=tensor([8040], device='cuda:0') mtp accept=1 prop=8040 top1=8040 accp=0.993 next=draft=303 prop=303 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.0ms wait=0.1/45.2ms pred gate=device Token # 262: 3.698ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=445 prop=445 pred gate=device Token # 263: 114.313ms; value: next_token_ids=tensor([1057], device='cuda:0') mtp accept=0 prop=445 top1=1057 accp=0.118 next=draft=23775 prop=23775 olap pair=109.1ms serial=193.8ms gain=84.7ms ratio=0.44 s0=4.2ms s1=189.6ms wait=0.1/45.4ms pred gate=device Token # 264: 114.416ms; value: next_token_ids=tensor([23775], device='cuda:0') mtp accept=1 prop=23775 top1=23775 accp=0.608 next=draft=2827 prop=24070 olap pair=109.1ms serial=193.8ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.5ms wait=0.1/45.3ms pred gate=device Token # 265: 3.730ms; value: next_token_ids=tensor([2827], device='cuda:0') mtp accept=0 prop=24070 top1=2827 accp=0.955 next=pair draft=72647 prop=72647 pred gate=device Token # 266: 114.227ms; value: next_token_ids=tensor([1263], device='cuda:0') mtp accept=0 prop=72647 top1=1263 accp=0.397 next=draft=2635 prop=2635 olap pair=109.0ms serial=193.5ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.3ms wait=0.1/45.2ms pred gate=device Token # 267: 114.511ms; value: next_token_ids=tensor([2635], device='cuda:0') mtp accept=1 prop=2635 top1=2635 accp=0.977 next=draft=2712 prop=2712 olap pair=109.2ms serial=194.0ms gain=84.8ms ratio=0.44 s0=4.4ms s1=189.6ms wait=0.1/45.1ms pred gate=device Token # 268: 3.717ms; value: next_token_ids=tensor([2712], device='cuda:0') 
mtp accept=1 prop=2712 top1=2712 accp=0.980 next=pair draft=1316 prop=1316 pred gate=device Token # 269: 114.430ms; value: next_token_ids=tensor([31446], device='cuda:0') mtp accept=0 prop=1316 top1=31446 accp=0.017 next=draft=303 prop=303 olap pair=109.2ms serial=193.5ms gain=84.3ms ratio=0.44 s0=4.8ms s1=188.7ms wait=0.1/44.9ms pred gate=device Token # 270: 115.600ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.810 next=draft=7849 prop=7849 olap pair=110.3ms serial=193.9ms gain=83.6ms ratio=0.43 s0=4.6ms s1=189.3ms wait=0.1/44.9ms pred gate=device Token # 271: 3.827ms; value: next_token_ids=tensor([7849], device='cuda:0') mtp accept=1 prop=7849 top1=7849 accp=0.525 next=pair draft=6034 prop=6034 pred gate=device Token # 272: 114.789ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=draft=4339 prop=4339 olap pair=109.5ms serial=193.1ms gain=83.6ms ratio=0.43 s0=4.5ms s1=188.6ms wait=0.1/45.1ms pred gate=device Token # 273: 3.743ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=0.863 next=pair draft=23945 prop=3343 pred gate=device Token # 274: 114.279ms; value: next_token_ids=tensor([23945], device='cuda:0') mtp accept=0 prop=3343 top1=23945 accp=0.588 next=draft=10626 prop=10626 olap pair=109.1ms serial=193.8ms gain=84.7ms ratio=0.44 s0=4.2ms s1=189.6ms wait=0.1/45.3ms pred gate=device Token # 275: 114.696ms; value: next_token_ids=tensor([10626], device='cuda:0') mtp accept=1 prop=10626 top1=10626 accp=0.905 next=draft=303 prop=303 olap pair=109.3ms serial=194.2ms gain=84.9ms ratio=0.44 s0=4.3ms s1=189.9ms wait=0.1/45.1ms pred gate=device Token # 276: 3.759ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=4272 prop=4272 pred gate=device Token # 277: 114.014ms; value: next_token_ids=tensor([4272], device='cuda:0') mtp accept=1 prop=4272 top1=4272 
accp=0.962 next=draft=2490 prop=2490 olap pair=108.8ms serial=193.0ms gain=84.2ms ratio=0.44 s0=4.7ms s1=188.2ms wait=0.1/44.8ms pred gate=device Token # 278: 3.772ms; value: next_token_ids=tensor([2490], device='cuda:0') mtp accept=1 prop=2490 top1=2490 accp=0.410 next=pair draft=16303 prop=16303 pred gate=device Token # 279: 113.782ms; value: next_token_ids=tensor([5480], device='cuda:0') mtp accept=0 prop=16303 top1=5480 accp=0.220 next=draft=102407 prop=102407 olap pair=108.6ms serial=193.1ms gain=84.5ms ratio=0.44 s0=3.9ms s1=189.2ms wait=0.1/46.2ms pred gate=device Token # 280: 114.023ms; value: next_token_ids=tensor([102407], device='cuda:0') mtp accept=1 prop=102407 top1=8555 accp=0.381 next=draft=1316 prop=1316 olap pair=108.8ms serial=193.2ms gain=84.5ms ratio=0.44 s0=3.6ms s1=189.6ms wait=0.1/46.5ms pred gate=device Token # 281: 3.683ms; value: next_token_ids=tensor([1316], device='cuda:0') mtp accept=1 prop=1316 top1=1316 accp=0.995 next=pair draft=102407 prop=102407 pred gate=device Token # 282: 114.064ms; value: next_token_ids=tensor([5480], device='cuda:0') mtp accept=0 prop=102407 top1=5480 accp=0.019 next=draft=41 prop=41 olap pair=108.9ms serial=193.5ms gain=84.6ms ratio=0.44 s0=3.8ms s1=189.8ms wait=0.1/46.4ms pred gate=device Token # 283: 114.342ms; value: next_token_ids=tensor([41], device='cuda:0') mtp accept=1 prop=41 top1=41 accp=0.994 next=draft=1929 prop=1929 olap pair=109.0ms serial=193.6ms gain=84.7ms ratio=0.44 s0=3.6ms s1=190.0ms wait=0.1/46.5ms pred gate=device Token # 284: 3.781ms; value: next_token_ids=tensor([1929], device='cuda:0') mtp accept=1 prop=1929 top1=1929 accp=1.000 next=pair draft=955 prop=1824 pred gate=device Token # 285: 114.151ms; value: next_token_ids=tensor([955], device='cuda:0') mtp accept=0 prop=1824 top1=955 accp=0.749 next=draft=5667 prop=5667 olap pair=108.9ms serial=193.3ms gain=84.4ms ratio=0.44 s0=4.8ms s1=188.5ms wait=0.1/44.1ms pred gate=device Token # 286: 114.138ms; value: next_token_ids=tensor([5667], 
device='cuda:0') mtp accept=1 prop=5667 top1=5667 accp=0.999 next=draft=5289 prop=5289 olap pair=108.8ms serial=193.0ms gain=84.3ms ratio=0.44 s0=4.8ms s1=188.2ms wait=0.1/44.5ms pred gate=device Token # 287: 3.728ms; value: next_token_ids=tensor([21093], device='cuda:0') mtp accept=0 prop=5289 top1=21093 accp=0.262 next=pair draft=4498 prop=876 pred gate=device Token # 288: 114.624ms; value: next_token_ids=tensor([876], device='cuda:0') mtp accept=1 prop=876 top1=876 accp=0.029 next=draft=15 prop=15 olap pair=109.3ms serial=194.0ms gain=84.8ms ratio=0.44 s0=4.2ms s1=189.8ms wait=0.1/45.5ms pred gate=device Token # 289: 3.664ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=pair draft=2619 prop=2899 pred gate=device Token # 290: 114.717ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=0 prop=2899 top1=2619 accp=0.803 next=draft=36101 prop=36101 olap pair=109.4ms serial=194.4ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.7ms wait=0.1/46.5ms pred gate=device Token # 291: 114.697ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=1.000 next=draft=17520 prop=17520 olap pair=109.4ms serial=194.4ms gain=85.0ms ratio=0.44 s0=3.9ms s1=190.4ms wait=0.1/46.0ms pred gate=device Token # 292: 3.703ms; value: next_token_ids=tensor([17520], device='cuda:0') mtp accept=1 prop=17520 top1=17520 accp=1.000 next=pair draft=666 prop=666 pred gate=device Token # 293: 114.623ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=768 prop=768 olap pair=109.3ms serial=194.2ms gain=84.9ms ratio=0.44 s0=4.3ms s1=189.9ms wait=0.1/45.3ms pred gate=device Token # 294: 3.729ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=445 prop=445 pred gate=device Token # 295: 114.253ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=1 prop=445 top1=445 
accp=0.983 next=draft=2382 prop=2382 olap pair=109.1ms serial=193.7ms gain=84.7ms ratio=0.44 s0=4.2ms s1=189.6ms wait=0.1/45.3ms pred gate=device Token # 296: 3.762ms; value: next_token_ids=tensor([2382], device='cuda:0') mtp accept=1 prop=2382 top1=2382 accp=0.590 next=pair draft=92 prop=92 pred gate=device Token # 297: 113.843ms; value: next_token_ids=tensor([92], device='cuda:0') mtp accept=1 prop=92 top1=92 accp=1.000 next=draft=31 prop=31 olap pair=108.6ms serial=192.7ms gain=84.1ms ratio=0.44 s0=4.2ms s1=188.5ms wait=0.1/45.2ms pred gate=device Token # 298: 3.740ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 299: 114.306ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=301 prop=301 olap pair=109.2ms serial=193.9ms gain=84.7ms ratio=0.44 s0=4.2ms s1=189.6ms wait=0.1/45.2ms pred gate=device Token # 300: 3.802ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=0.923 next=pair draft=36101 prop=36101 pred gate=device Token # 301: 114.440ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=1.000 next=draft=525 prop=525 olap pair=109.2ms serial=194.0ms gain=84.8ms ratio=0.44 s0=4.1ms s1=190.0ms wait=0.1/45.8ms pred gate=device Token # 302: 3.767ms; value: next_token_ids=tensor([525], device='cuda:0') mtp accept=1 prop=525 top1=525 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 303: 114.099ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=2353 prop=2353 olap pair=108.9ms serial=193.3ms gain=84.5ms ratio=0.44 s0=3.7ms s1=189.7ms wait=0.1/46.5ms pred gate=device Token # 304: 3.716ms; value: next_token_ids=tensor([2353], device='cuda:0') mtp accept=1 prop=2353 top1=2353 accp=1.000 next=pair draft=1121 prop=1121 pred gate=device Token # 305: 114.695ms; 
value: next_token_ids=tensor([1121], device='cuda:0') mtp accept=1 prop=1121 top1=1121 accp=1.000 next=draft=66518 prop=66518 olap pair=109.5ms serial=194.8ms gain=85.3ms ratio=0.44 s0=3.7ms s1=191.1ms wait=0.1/46.4ms pred gate=device Token # 306: 3.714ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=pair draft=389 prop=389 pred gate=device Token # 307: 114.106ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 accp=0.972 next=draft=19911 prop=79273 olap pair=108.9ms serial=193.6ms gain=84.7ms ratio=0.44 s0=3.7ms s1=189.9ms wait=0.1/46.3ms pred gate=device Token # 308: 3.844ms; value: next_token_ids=tensor([79273], device='cuda:0') mtp accept=1 prop=79273 top1=79273 accp=0.388 next=pair draft=303 prop=303 pred gate=device Token # 309: 114.372ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=0 prop=303 top1=320 accp=0.062 next=draft=2524 prop=2524 olap pair=109.1ms serial=193.8ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.5ms wait=0.1/45.2ms pred gate=device Token # 310: 114.676ms; value: next_token_ids=tensor([2524], device='cuda:0') mtp accept=1 prop=2524 top1=2524 accp=0.998 next=draft=10802 prop=10802 olap pair=109.4ms serial=194.5ms gain=85.1ms ratio=0.44 s0=3.8ms s1=190.7ms wait=0.1/46.2ms pred gate=device Token # 311: 3.719ms; value: next_token_ids=tensor([19585], device='cuda:0') mtp accept=0 prop=10802 top1=10802 accp=0.771 next=pair draft=1275 prop=1275 pred gate=device Token # 312: 114.135ms; value: next_token_ids=tensor([1275], device='cuda:0') mtp accept=1 prop=1275 top1=1275 accp=0.999 next=draft=8842 prop=8842 olap pair=108.9ms serial=193.5ms gain=84.6ms ratio=0.44 s0=3.7ms s1=189.8ms wait=0.1/46.5ms pred gate=device Token # 313: 3.669ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=0.998 next=pair draft=45032 prop=45032 pred gate=device Token # 314: 114.093ms; value: 
next_token_ids=tensor([10638], device='cuda:0') mtp accept=0 prop=45032 top1=10638 accp=0.127 next=draft=45032 prop=45032 olap pair=108.9ms serial=192.9ms gain=84.0ms ratio=0.44 s0=4.0ms s1=188.9ms wait=0.1/46.1ms pred gate=device Token # 315: 114.891ms; value: next_token_ids=tensor([1824], device='cuda:0') mtp accept=0 prop=45032 top1=1824 accp=0.002 next=draft=31446 prop=31446 olap pair=109.6ms serial=194.8ms gain=85.2ms ratio=0.44 s0=3.7ms s1=191.1ms wait=0.1/46.4ms pred gate=device Token # 316: 114.790ms; value: next_token_ids=tensor([31446], device='cuda:0') mtp accept=1 prop=31446 top1=31446 accp=0.996 next=draft=303 prop=303 olap pair=109.4ms serial=194.4ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.7ms wait=0.1/46.4ms pred gate=device Token # 317: 3.706ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=8494 prop=8494 pred gate=device Token # 318: 113.927ms; value: next_token_ids=tensor([15495], device='cuda:0') mtp accept=0 prop=8494 top1=10802 accp=0.261 next=draft=1275 prop=1275 olap pair=108.7ms serial=193.3ms gain=84.5ms ratio=0.44 s0=3.7ms s1=189.5ms wait=0.1/46.4ms pred gate=device Token # 319: 114.120ms; value: next_token_ids=tensor([59250], device='cuda:0') mtp accept=0 prop=1275 top1=59250 accp=0.000 next=draft=3374 prop=3374 olap pair=108.8ms serial=193.3ms gain=84.5ms ratio=0.44 s0=3.6ms s1=189.7ms wait=0.1/46.4ms pred gate=device Token # 320: 114.349ms; value: next_token_ids=tensor([60540], device='cuda:0') mtp accept=0 prop=3374 top1=60540 accp=0.000 next=draft=12052 prop=12052 olap pair=109.1ms serial=193.8ms gain=84.8ms ratio=0.44 s0=3.6ms s1=190.2ms wait=0.1/46.6ms pred gate=device Token # 321: 114.578ms; value: next_token_ids=tensor([12052], device='cuda:0') mtp accept=1 prop=12052 top1=12052 accp=1.000 next=draft=320 prop=320 olap pair=109.2ms serial=194.0ms gain=84.8ms ratio=0.44 s0=3.6ms s1=190.4ms wait=0.1/46.5ms pred gate=device Token # 322: 3.740ms; value: 
next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=10802 prop=10802 pred gate=device Token # 323: 114.561ms; value: next_token_ids=tensor([2490], device='cuda:0') mtp accept=0 prop=10802 top1=2490 accp=0.005 next=draft=1275 prop=1275 olap pair=109.3ms serial=194.3ms gain=84.9ms ratio=0.44 s0=3.6ms s1=190.7ms wait=0.1/46.5ms pred gate=device Token # 324: 121.521ms; value: next_token_ids=tensor([1275], device='cuda:0') mtp accept=1 prop=1275 top1=1275 accp=0.997 next=draft=8842 prop=8842 olap pair=110.1ms serial=194.8ms gain=84.6ms ratio=0.43 s0=4.3ms s1=190.5ms wait=0.1/45.3ms pred gate=device Token # 325: 3.684ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=0.875 next=pair draft=10602 prop=10602 pred gate=device Token # 326: 114.314ms; value: next_token_ids=tensor([10602], device='cuda:0') mtp accept=1 prop=10602 top1=10602 accp=0.829 next=draft=68160 prop=68160 olap pair=109.1ms serial=193.6ms gain=84.5ms ratio=0.44 s0=4.2ms s1=189.3ms wait=0.1/45.2ms pred gate=device Token # 327: 3.795ms; value: next_token_ids=tensor([10780], device='cuda:0') mtp accept=0 prop=68160 top1=10780 accp=0.056 next=pair draft=621 prop=621 pred gate=device Token # 328: 113.999ms; value: next_token_ids=tensor([621], device='cuda:0') mtp accept=1 prop=621 top1=621 accp=1.000 next=draft=13097 prop=13097 olap pair=108.7ms serial=192.9ms gain=84.2ms ratio=0.44 s0=4.3ms s1=188.6ms wait=0.1/45.2ms pred gate=device Token # 329: 3.792ms; value: next_token_ids=tensor([13097], device='cuda:0') mtp accept=1 prop=13097 top1=13097 accp=1.000 next=pair draft=6034 prop=6034 pred gate=device Token # 330: 114.318ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=draft=572 prop=572 olap pair=109.1ms serial=193.6ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.3ms wait=0.1/45.1ms pred gate=device Token # 331: 3.832ms; value: next_token_ids=tensor([572], 
device='cuda:0') mtp accept=1 prop=572 top1=572 accp=0.998 next=pair draft=303 prop=303 pred gate=device Token # 332: 113.912ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=119221 prop=119221 olap pair=108.7ms serial=192.9ms gain=84.2ms ratio=0.44 s0=4.2ms s1=188.7ms wait=0.1/45.3ms pred gate=device Token # 333: 3.700ms; value: next_token_ids=tensor([119221], device='cuda:0') mtp accept=1 prop=119221 top1=119221 accp=0.856 next=pair draft=45276 prop=45276 pred gate=device Token # 334: 114.093ms; value: next_token_ids=tensor([7849], device='cuda:0') mtp accept=0 prop=45276 top1=7849 accp=0.333 next=draft=6034 prop=6034 olap pair=108.8ms serial=193.3ms gain=84.5ms ratio=0.44 s0=3.7ms s1=189.6ms wait=0.1/46.4ms pred gate=device Token # 335: 114.404ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=0.993 next=draft=124356 prop=124356 olap pair=109.1ms serial=193.8ms gain=84.7ms ratio=0.44 s0=3.6ms s1=190.2ms wait=0.1/46.5ms pred gate=device Token # 336: 3.671ms; value: next_token_ids=tensor([124356], device='cuda:0') mtp accept=1 prop=124356 top1=124356 accp=0.996 next=pair draft=42829 prop=42829 pred gate=device Token # 337: 114.757ms; value: next_token_ids=tensor([42829], device='cuda:0') mtp accept=1 prop=42829 top1=42829 accp=0.992 next=draft=303 prop=303 olap pair=109.4ms serial=194.4ms gain=85.0ms ratio=0.44 s0=3.6ms s1=190.8ms wait=0.1/46.5ms pred gate=device Token # 338: 3.713ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=1380 prop=1380 pred gate=device Token # 339: 114.828ms; value: next_token_ids=tensor([1380], device='cuda:0') mtp accept=1 prop=1380 top1=1380 accp=0.882 next=draft=13825 prop=13825 olap pair=109.6ms serial=194.8ms gain=85.2ms ratio=0.44 s0=3.6ms s1=191.2ms wait=0.1/46.4ms pred gate=device Token # 340: 3.691ms; value: next_token_ids=tensor([4953], device='cuda:0') mtp 
accept=0 prop=13825 top1=4953 accp=0.000 next=pair draft=4339 prop=4339 pred gate=device Token # 341: 114.373ms; value: next_token_ids=tensor([13097], device='cuda:0') mtp accept=0 prop=4339 top1=13097 accp=0.117 next=draft=6034 prop=6034 olap pair=109.1ms serial=194.0ms gain=84.9ms ratio=0.44 s0=3.6ms s1=190.4ms wait=0.1/46.4ms pred gate=device Token # 342: 114.352ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=0.654 next=draft=3437 prop=3437 olap pair=109.0ms serial=193.6ms gain=84.6ms ratio=0.44 s0=3.7ms s1=189.9ms wait=0.1/46.4ms pred gate=device Token # 343: 3.762ms; value: next_token_ids=tensor([3437], device='cuda:0') mtp accept=1 prop=3437 top1=7103 accp=0.112 next=pair draft=4339 prop=4339 pred gate=device Token # 344: 114.968ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=1.000 next=draft=1057 prop=1057 olap pair=109.7ms serial=195.0ms gain=85.3ms ratio=0.44 s0=3.9ms s1=191.1ms wait=0.1/46.0ms pred gate=device Token # 345: 3.705ms; value: next_token_ids=tensor([1057], device='cuda:0') mtp accept=1 prop=1057 top1=1057 accp=0.874 next=pair draft=2353 prop=2353 pred gate=device Token # 346: 114.851ms; value: next_token_ids=tensor([25024], device='cuda:0') mtp accept=0 prop=2353 top1=25024 accp=0.000 next=draft=14015 prop=14015 olap pair=109.7ms serial=195.0ms gain=85.4ms ratio=0.44 s0=3.6ms s1=191.4ms wait=0.1/46.4ms pred gate=device Token # 347: 114.826ms; value: next_token_ids=tensor([14015], device='cuda:0') mtp accept=1 prop=14015 top1=14015 accp=0.997 next=draft=1427 prop=1427 olap pair=109.5ms serial=194.5ms gain=85.1ms ratio=0.44 s0=3.6ms s1=190.9ms wait=0.1/46.4ms pred gate=device Token # 348: 3.703ms; value: next_token_ids=tensor([1427], device='cuda:0') mtp accept=1 prop=1427 top1=1427 accp=1.000 next=pair draft=13062 prop=13062 pred gate=device Token # 349: 114.794ms; value: next_token_ids=tensor([13062], device='cuda:0') mtp accept=1 prop=13062 
top1=13062 accp=1.000 next=draft=320 prop=320 olap pair=109.5ms serial=194.7ms gain=85.1ms ratio=0.44 s0=3.6ms s1=191.0ms wait=0.1/46.4ms pred gate=device Token # 350: 3.658ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.977 next=pair draft=2803 prop=2803 pred gate=device Token # 351: 114.267ms; value: next_token_ids=tensor([2803], device='cuda:0') mtp accept=1 prop=2803 top1=2803 accp=0.968 next=draft=303 prop=303 olap pair=109.1ms serial=193.8ms gain=84.7ms ratio=0.44 s0=3.6ms s1=190.1ms wait=0.1/46.4ms pred gate=device Token # 352: 3.736ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=2353 prop=2353 pred gate=device Token # 353: 114.589ms; value: next_token_ids=tensor([2353], device='cuda:0') mtp accept=1 prop=2353 top1=2353 accp=0.932 next=draft=1121 prop=1121 olap pair=109.4ms serial=194.4ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.7ms wait=0.1/46.5ms pred gate=device Token # 354: 3.744ms; value: next_token_ids=tensor([1121], device='cuda:0') mtp accept=1 prop=1121 top1=1121 accp=1.000 next=pair draft=66518 prop=66518 pred gate=device Token # 355: 113.836ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=draft=673 prop=673 olap pair=108.6ms serial=192.7ms gain=84.1ms ratio=0.44 s0=4.3ms s1=188.4ms wait=0.1/45.2ms pred gate=device Token # 356: 3.713ms; value: next_token_ids=tensor([673], device='cuda:0') mtp accept=1 prop=673 top1=673 accp=0.954 next=pair draft=25858 prop=25858 pred gate=device Token # 357: 114.077ms; value: next_token_ids=tensor([25858], device='cuda:0') mtp accept=1 prop=25858 top1=25858 accp=1.000 next=draft=77170 prop=77170 olap pair=108.9ms serial=193.4ms gain=84.6ms ratio=0.44 s0=3.9ms s1=189.5ms wait=0.1/46.0ms pred gate=device Token # 358: 3.733ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=0 prop=77170 top1=6034 accp=0.013 next=pair draft=16660 
prop=16660 pred gate=device Token # 359: 114.208ms; value: next_token_ids=tensor([16660], device='cuda:0') mtp accept=1 prop=16660 top1=16660 accp=0.719 next=draft=16303 prop=16303 olap pair=108.9ms serial=193.3ms gain=84.5ms ratio=0.44 s0=3.7ms s1=189.6ms wait=0.1/46.3ms pred gate=device Token # 360: 3.695ms; value: next_token_ids=tensor([16303], device='cuda:0') mtp accept=1 prop=16303 top1=16303 accp=1.000 next=pair draft=100642 prop=100642 pred gate=device Token # 361: 114.748ms; value: next_token_ids=tensor([100642], device='cuda:0') mtp accept=1 prop=100642 top1=100642 accp=0.916 next=draft=303 prop=303 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.1ms s1=190.4ms wait=0.1/45.6ms pred gate=device Token # 362: 3.713ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=4009 prop=4009 pred gate=device Token # 363: 114.739ms; value: next_token_ids=tensor([4009], device='cuda:0') mtp accept=1 prop=4009 top1=4009 accp=0.687 next=draft=2386 prop=2386 olap pair=109.6ms serial=194.6ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.3ms wait=0.1/45.2ms pred gate=device Token # 364: 3.715ms; value: next_token_ids=tensor([2386], device='cuda:0') mtp accept=1 prop=2386 top1=2386 accp=0.998 next=pair draft=89967 prop=89967 pred gate=device Token # 365: 114.434ms; value: next_token_ids=tensor([89967], device='cuda:0') mtp accept=1 prop=89967 top1=89967 accp=0.665 next=draft=8842 prop=8842 olap pair=109.2ms serial=194.0ms gain=84.8ms ratio=0.44 s0=4.2ms s1=189.8ms wait=0.1/45.4ms pred gate=device Token # 366: 3.676ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=0.952 next=pair draft=12052 prop=12052 pred gate=device Token # 367: 114.061ms; value: next_token_ids=tensor([12052], device='cuda:0') mtp accept=1 prop=12052 top1=12052 accp=0.728 next=draft=548 prop=548 olap pair=108.8ms serial=193.1ms gain=84.3ms ratio=0.44 s0=4.2ms s1=188.9ms wait=0.1/45.4ms pred 
gate=device Token # 368: 3.732ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=1 prop=548 top1=548 accp=0.759 next=pair draft=16303 prop=16303 pred gate=device Token # 369: 114.774ms; value: next_token_ids=tensor([16303], device='cuda:0') mtp accept=1 prop=16303 top1=16303 accp=0.991 next=draft=70359 prop=70359 olap pair=109.6ms serial=194.7ms gain=85.2ms ratio=0.44 s0=4.2ms s1=190.5ms wait=0.1/45.5ms pred gate=device Token # 370: 3.743ms; value: next_token_ids=tensor([45045], device='cuda:0') mtp accept=0 prop=70359 top1=45045 accp=0.025 next=pair draft=478 prop=478 pred gate=device Token # 371: 114.706ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.970 next=draft=372 prop=372 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/45.2ms pred gate=device Token # 372: 3.717ms; value: next_token_ids=tensor([372], device='cuda:0') mtp accept=1 prop=372 top1=372 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 373: 114.172ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=21 prop=21 olap pair=109.0ms serial=193.5ms gain=84.5ms ratio=0.44 s0=4.2ms s1=189.3ms wait=0.1/45.3ms pred gate=device Token # 374: 3.677ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=pair draft=16 prop=16 pred gate=device Token # 375: 115.146ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=223 prop=223 olap pair=109.9ms serial=195.4ms gain=85.5ms ratio=0.44 s0=4.2ms s1=191.2ms wait=0.1/45.2ms pred gate=device Token # 376: 3.723ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=34408 prop=34408 pred gate=device Token # 377: 114.662ms; value: next_token_ids=tensor([34408], device='cuda:0') mtp accept=1 prop=34408 top1=34408 accp=1.000 next=draft=1728 prop=1728 
olap pair=109.5ms serial=194.4ms gain=85.0ms ratio=0.44 s0=4.2ms s1=190.2ms wait=0.1/45.3ms pred gate=device Token # 378: 3.773ms; value: next_token_ids=tensor([1728], device='cuda:0') mtp accept=1 prop=1728 top1=1728 accp=1.000 next=pair draft=66518 prop=66518 pred gate=device Token # 379: 114.333ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=draft=343 prop=343 olap pair=109.1ms serial=193.9ms gain=84.8ms ratio=0.44 s0=4.2ms s1=189.7ms wait=0.1/45.3ms pred gate=device Token # 380: 3.759ms; value: next_token_ids=tensor([343], device='cuda:0') mtp accept=1 prop=343 top1=343 accp=1.000 next=pair draft=10124 prop=10124 pred gate=device Token # 381: 114.927ms; value: next_token_ids=tensor([10124], device='cuda:0') mtp accept=1 prop=10124 top1=10124 accp=1.000 next=draft=682 prop=682 olap pair=109.8ms serial=195.1ms gain=85.3ms ratio=0.44 s0=4.3ms s1=190.8ms wait=0.1/45.2ms pred gate=device Token # 382: 3.761ms; value: next_token_ids=tensor([682], device='cuda:0') mtp accept=1 prop=682 top1=682 accp=1.000 next=pair draft=15 prop=15 pred gate=device Token # 383: 114.422ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=2619 prop=406 olap pair=109.3ms serial=194.2ms gain=84.9ms ratio=0.44 s0=4.2ms s1=190.0ms wait=0.1/45.1ms pred gate=device Token # 384: 3.777ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=0 prop=406 top1=2619 accp=0.512 next=pair draft=14087 prop=14087 pred gate=device Token # 385: 114.529ms; value: next_token_ids=tensor([14087], device='cuda:0') mtp accept=1 prop=14087 top1=14087 accp=1.000 next=draft=666 prop=666 olap pair=109.3ms serial=194.0ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.7ms wait=0.1/45.3ms pred gate=device Token # 386: 3.765ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=pair draft=768 prop=768 pred gate=device Token # 387: 114.683ms; value: 
next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=1275 prop=1275 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.1ms wait=0.1/45.3ms pred gate=device Token # 388: 3.735ms; value: next_token_ids=tensor([1275], device='cuda:0') mtp accept=1 prop=1275 top1=1275 accp=1.000 next=pair draft=8842 prop=8842 pred gate=device Token # 389: 114.522ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=0.997 next=draft=27362 prop=27362 olap pair=109.2ms serial=193.9ms gain=84.7ms ratio=0.44 s0=4.2ms s1=189.6ms wait=0.1/45.2ms pred gate=device Token # 390: 3.852ms; value: next_token_ids=tensor([2635], device='cuda:0') mtp accept=0 prop=27362 top1=2635 accp=0.052 next=pair draft=2827 prop=2827 pred gate=device Token # 391: 115.105ms; value: next_token_ids=tensor([2827], device='cuda:0') mtp accept=1 prop=2827 top1=2827 accp=1.000 next=draft=31446 prop=31446 olap pair=109.8ms serial=194.7ms gain=84.9ms ratio=0.44 s0=3.8ms s1=190.9ms wait=0.1/46.2ms pred gate=device Token # 392: 3.733ms; value: next_token_ids=tensor([32041], device='cuda:0') mtp accept=0 prop=31446 top1=32041 accp=0.035 next=pair draft=13097 prop=13097 pred gate=device Token # 393: 114.335ms; value: next_token_ids=tensor([13097], device='cuda:0') mtp accept=1 prop=13097 top1=13097 accp=1.000 next=draft=7046 prop=7046 olap pair=109.0ms serial=193.6ms gain=84.6ms ratio=0.44 s0=3.7ms s1=189.9ms wait=0.1/46.4ms pred gate=device Token # 394: 3.691ms; value: next_token_ids=tensor([7046], device='cuda:0') mtp accept=1 prop=7046 top1=7046 accp=0.739 next=pair draft=303 prop=303 pred gate=device Token # 395: 114.282ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=7849 prop=7849 olap pair=108.9ms serial=193.4ms gain=84.5ms ratio=0.44 s0=3.7ms s1=189.8ms wait=0.1/46.3ms pred gate=device Token # 396: 3.724ms; value: next_token_ids=tensor([7849], 
device='cuda:0') mtp accept=1 prop=7849 top1=7849 accp=1.000 next=pair draft=7046 prop=7046 pred gate=device Token # 397: 114.629ms; value: next_token_ids=tensor([7046], device='cuda:0') mtp accept=1 prop=7046 top1=7046 accp=1.000 next=draft=9501 prop=9501 olap pair=109.3ms serial=194.2ms gain=84.9ms ratio=0.44 s0=3.6ms s1=190.5ms wait=0.1/46.5ms pred gate=device Token # 398: 3.759ms; value: next_token_ids=tensor([9501], device='cuda:0') mtp accept=1 prop=9501 top1=9501 accp=0.864 next=pair draft=7557 prop=7557 pred gate=device Token # 399: 114.947ms; value: next_token_ids=tensor([7557], device='cuda:0') mtp accept=1 prop=7557 top1=7557 accp=0.998 next=draft=6034 prop=6034 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=4.1ms s1=190.6ms wait=0.1/45.6ms pred gate=device Token # 400: 3.759ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=pair draft=572 prop=572 pred gate=device Token # 401: 115.084ms; value: next_token_ids=tensor([572], device='cuda:0') mtp accept=1 prop=572 top1=572 accp=1.000 next=draft=320 prop=320 olap pair=109.8ms serial=195.1ms gain=85.3ms ratio=0.44 s0=4.0ms s1=191.1ms wait=0.1/45.8ms pred gate=device Token # 402: 3.702ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=445 prop=445 pred gate=device Token # 403: 114.698ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=1 prop=445 top1=445 accp=0.974 next=draft=36101 prop=34408 olap pair=109.5ms serial=194.6ms gain=85.1ms ratio=0.44 s0=3.7ms s1=190.9ms wait=0.1/46.4ms pred gate=device Token # 404: 3.744ms; value: next_token_ids=tensor([10172], device='cuda:0') mtp accept=0 prop=34408 top1=10172 accp=0.045 next=pair draft=625 prop=625 pred gate=device Token # 405: 114.644ms; value: next_token_ids=tensor([625], device='cuda:0') mtp accept=1 prop=625 top1=625 accp=1.000 next=draft=303 prop=303 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 
s0=3.8ms s1=190.3ms wait=0.1/46.2ms pred gate=device Token # 406: 3.682ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=1275 prop=1275 pred gate=device Token # 407: 114.424ms; value: next_token_ids=tensor([2490], device='cuda:0') mtp accept=0 prop=1275 top1=2490 accp=0.325 next=draft=1275 prop=1275 olap pair=109.3ms serial=193.9ms gain=84.6ms ratio=0.44 s0=4.4ms s1=189.5ms wait=0.1/45.1ms pred gate=device Token # 408: 114.340ms; value: next_token_ids=tensor([2878], device='cuda:0') mtp accept=0 prop=1275 top1=2878 accp=0.080 next=draft=73466 prop=4383 olap pair=109.1ms serial=193.8ms gain=84.7ms ratio=0.44 s0=3.9ms s1=189.8ms wait=0.1/46.0ms pred gate=device Token # 409: 114.837ms; value: next_token_ids=tensor([73466], device='cuda:0') mtp accept=0 prop=4383 top1=73466 accp=0.650 next=draft=1237 prop=1237 olap pair=109.5ms serial=194.6ms gain=85.1ms ratio=0.44 s0=4.0ms s1=190.7ms wait=0.1/46.0ms pred gate=device Token # 410: 116.508ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=draft=82 prop=82 olap pair=111.2ms serial=197.6ms gain=86.4ms ratio=0.44 s0=4.4ms s1=193.1ms wait=0.1/44.8ms pred gate=device Token # 411: 3.827ms; value: next_token_ids=tensor([56560], device='cuda:0') mtp accept=0 prop=82 top1=56560 accp=0.001 next=pair draft=1761 prop=1761 pred gate=device Token # 412: 114.735ms; value: next_token_ids=tensor([1761], device='cuda:0') mtp accept=1 prop=1761 top1=1761 accp=0.998 next=draft=2471 prop=2471 olap pair=109.4ms serial=193.9ms gain=84.5ms ratio=0.44 s0=4.7ms s1=189.1ms wait=0.1/44.6ms pred gate=device Token # 413: 3.807ms; value: next_token_ids=tensor([2471], device='cuda:0') mtp accept=1 prop=2471 top1=59700 accp=0.271 next=pair draft=1227 prop=1227 pred gate=device Token # 414: 114.271ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=1.000 next=draft=637 prop=637 olap 
pair=109.1ms serial=193.7ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.4ms wait=0.1/45.4ms pred gate=device Token # 415: 3.704ms; value: next_token_ids=tensor([637], device='cuda:0') mtp accept=1 prop=637 top1=637 accp=0.988 next=pair draft=38186 prop=6710 pred gate=device Token # 416: 114.458ms; value: next_token_ids=tensor([6710], device='cuda:0') mtp accept=1 prop=6710 top1=38186 accp=0.530 next=draft=34408 prop=34408 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.7ms wait=0.1/45.2ms pred gate=device Token # 417: 3.712ms; value: next_token_ids=tensor([34408], device='cuda:0') mtp accept=1 prop=34408 top1=34408 accp=0.985 next=pair draft=1728 prop=1728 pred gate=device Token # 418: 114.085ms; value: next_token_ids=tensor([15038], device='cuda:0') mtp accept=0 prop=1728 top1=15038 accp=0.317 next=draft=38385 prop=38385 olap pair=108.8ms serial=193.2ms gain=84.4ms ratio=0.44 s0=4.3ms s1=188.9ms wait=0.1/45.4ms pred gate=device Token # 419: 115.019ms; value: next_token_ids=tensor([37227], device='cuda:0') mtp accept=0 prop=38385 top1=37227 accp=0.100 next=draft=320 prop=320 olap pair=109.8ms serial=195.0ms gain=85.3ms ratio=0.44 s0=4.2ms s1=190.8ms wait=0.1/45.3ms pred gate=device Token # 420: 114.670ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.979 next=draft=36101 prop=36101 olap pair=109.2ms serial=194.0ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.7ms wait=0.1/45.4ms pred gate=device Token # 421: 3.821ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=0 prop=36101 top1=445 accp=0.002 next=pair draft=36101 prop=36101 pred gate=device Token # 422: 114.239ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=1.000 next=draft=625 prop=625 olap pair=108.9ms serial=193.4ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.1ms wait=0.1/45.2ms pred gate=device Token # 423: 3.822ms; value: next_token_ids=tensor([625], device='cuda:0') mtp accept=1 prop=625 
top1=625 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 424: 114.670ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=9308 prop=1263 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.2ms s1=190.2ms wait=0.1/45.4ms pred gate=device Token # 425: 3.771ms; value: next_token_ids=tensor([3660], device='cuda:0') mtp accept=0 prop=1263 top1=3660 accp=0.180 next=pair draft=45276 prop=45276 pred gate=device Token # 426: 114.481ms; value: next_token_ids=tensor([45276], device='cuda:0') mtp accept=1 prop=45276 top1=45276 accp=0.878 next=draft=25024 prop=25024 olap pair=109.2ms serial=194.0ms gain=84.8ms ratio=0.44 s0=3.8ms s1=190.2ms wait=0.1/46.3ms pred gate=device Token # 427: 3.755ms; value: next_token_ids=tensor([25024], device='cuda:0') mtp accept=1 prop=25024 top1=25024 accp=0.993 next=pair draft=303 prop=303 pred gate=device Token # 428: 114.395ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=9308 prop=9308 olap pair=109.1ms serial=193.8ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.6ms wait=0.1/45.1ms pred gate=device Token # 429: 3.722ms; value: next_token_ids=tensor([34408], device='cuda:0') mtp accept=0 prop=9308 top1=34408 accp=0.329 next=pair draft=1728 prop=1728 pred gate=device Token # 430: 114.715ms; value: next_token_ids=tensor([1728], device='cuda:0') mtp accept=1 prop=1728 top1=1728 accp=1.000 next=draft=66518 prop=66518 olap pair=109.4ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.1ms wait=0.1/45.2ms pred gate=device Token # 431: 3.862ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=0.968 next=pair draft=673 prop=673 pred gate=device Token # 432: 115.189ms; value: next_token_ids=tensor([78816], device='cuda:0') mtp accept=0 prop=673 top1=673 accp=0.509 next=draft=1057 prop=1057 olap pair=110.0ms serial=195.6ms gain=85.6ms ratio=0.44 s0=4.2ms s1=191.3ms 
wait=0.1/45.2ms pred gate=device Token # 433: 114.834ms; value: next_token_ids=tensor([1959], device='cuda:0') mtp accept=0 prop=1057 top1=1959 accp=0.001 next=draft=25024 prop=25024 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/45.3ms pred gate=device Token # 434: 114.874ms; value: next_token_ids=tensor([25024], device='cuda:0') mtp accept=1 prop=25024 top1=25024 accp=1.000 next=draft=27362 prop=27362 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=4.2ms s1=190.5ms wait=0.1/45.3ms pred gate=device Token # 435: 3.728ms; value: next_token_ids=tensor([29697], device='cuda:0') mtp accept=0 prop=27362 top1=29697 accp=0.034 next=pair draft=2490 prop=2490 pred gate=device Token # 436: 115.161ms; value: next_token_ids=tensor([7206], device='cuda:0') mtp accept=0 prop=2490 top1=7206 accp=0.107 next=draft=7849 prop=7849 olap pair=109.9ms serial=195.3ms gain=85.4ms ratio=0.44 s0=3.7ms s1=191.6ms wait=0.1/46.3ms pred gate=device Token # 437: 114.492ms; value: next_token_ids=tensor([13110], device='cuda:0') mtp accept=0 prop=7849 top1=13110 accp=0.246 next=draft=6034 prop=6034 olap pair=109.1ms serial=193.8ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.2ms wait=0.1/46.4ms pred gate=device Token # 438: 114.847ms; value: next_token_ids=tensor([7046], device='cuda:0') mtp accept=0 prop=6034 top1=7046 accp=0.054 next=draft=303 prop=303 olap pair=109.5ms serial=194.5ms gain=85.1ms ratio=0.44 s0=3.7ms s1=190.8ms wait=0.1/46.4ms pred gate=device Token # 439: 115.086ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.990 next=draft=7849 prop=7849 olap pair=109.8ms serial=195.2ms gain=85.4ms ratio=0.44 s0=3.7ms s1=191.5ms wait=0.1/46.4ms pred gate=device Token # 440: 3.715ms; value: next_token_ids=tensor([3007], device='cuda:0') mtp accept=0 prop=7849 top1=3007 accp=0.220 next=pair draft=7046 prop=7046 pred gate=device Token # 441: 114.430ms; value: next_token_ids=tensor([7046], device='cuda:0') mtp 
accept=1 prop=7046 top1=7046 accp=0.726 next=draft=1263 prop=1263 olap pair=109.2ms serial=194.0ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.3ms wait=0.1/46.4ms pred gate=device Token # 442: 3.738ms; value: next_token_ids=tensor([52951], device='cuda:0') mtp accept=0 prop=1263 top1=52951 accp=0.006 next=pair draft=6034 prop=6034 pred gate=device Token # 443: 114.895ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=draft=572 prop=572 olap pair=109.5ms serial=194.7ms gain=85.2ms ratio=0.44 s0=3.7ms s1=191.0ms wait=0.1/46.5ms pred gate=device Token # 444: 3.697ms; value: next_token_ids=tensor([572], device='cuda:0') mtp accept=1 prop=572 top1=572 accp=1.000 next=pair draft=3437 prop=3437 pred gate=device Token # 445: 114.373ms; value: next_token_ids=tensor([3437], device='cuda:0') mtp accept=1 prop=3437 top1=3437 accp=0.951 next=draft=4339 prop=4339 olap pair=109.1ms serial=193.7ms gain=84.7ms ratio=0.44 s0=3.8ms s1=189.9ms wait=0.1/46.2ms pred gate=device Token # 446: 3.756ms; value: next_token_ids=tensor([4398], device='cuda:0') mtp accept=0 prop=4339 top1=4398 accp=0.212 next=pair draft=3007 prop=7557 pred gate=device Token # 447: 114.486ms; value: next_token_ids=tensor([7557], device='cuda:0') mtp accept=1 prop=7557 top1=7557 accp=0.562 next=draft=25024 prop=25024 olap pair=109.2ms serial=193.9ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.6ms wait=0.1/45.2ms pred gate=device Token # 448: 3.791ms; value: next_token_ids=tensor([25024], device='cuda:0') mtp accept=1 prop=25024 top1=25024 accp=0.998 next=pair draft=1237 prop=1237 pred gate=device Token # 449: 114.847ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=draft=445 prop=445 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.1ms s1=190.4ms wait=0.1/45.5ms pred gate=device Token # 450: 3.724ms; value: next_token_ids=tensor([2204], device='cuda:0') mtp accept=0 prop=445 top1=2204 accp=0.014 
next=pair draft=4162 prop=4162 pred gate=device Token # 451: 114.343ms; value: next_token_ids=tensor([60540], device='cuda:0') mtp accept=0 prop=4162 top1=60540 accp=0.088 next=draft=16913 prop=16913 olap pair=109.1ms serial=193.6ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.3ms wait=0.1/45.2ms pred gate=device Token # 452: 114.866ms; value: next_token_ids=tensor([16913], device='cuda:0') mtp accept=1 prop=16913 top1=16913 accp=0.894 next=draft=19 prop=19 olap pair=109.5ms serial=194.5ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/45.2ms pred gate=device Token # 453: 3.737ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=1227 prop=1227 pred gate=device Token # 454: 114.155ms; value: next_token_ids=tensor([14164], device='cuda:0') mtp accept=0 prop=1227 top1=14164 accp=0.411 next=draft=18332 prop=18332 olap pair=108.9ms serial=193.4ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.1ms wait=0.1/45.2ms pred gate=device Token # 455: 114.627ms; value: next_token_ids=tensor([2803], device='cuda:0') mtp accept=0 prop=18332 top1=2803 accp=0.185 next=draft=303 prop=303 olap pair=109.3ms serial=194.2ms gain=84.9ms ratio=0.44 s0=4.2ms s1=190.0ms wait=0.1/45.2ms pred gate=device Token # 456: 114.354ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.951 next=draft=1140 prop=3660 olap pair=108.9ms serial=193.4ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.2ms wait=0.1/45.3ms pred gate=device Token # 457: 3.708ms; value: next_token_ids=tensor([1140], device='cuda:0') mtp accept=0 prop=3660 top1=1140 accp=0.576 next=pair draft=2382 prop=2382 pred gate=device Token # 458: 114.162ms; value: next_token_ids=tensor([2382], device='cuda:0') mtp accept=1 prop=2382 top1=2382 accp=0.997 next=draft=92 prop=92 olap pair=108.8ms serial=193.1ms gain=84.3ms ratio=0.44 s0=4.3ms s1=188.8ms wait=0.1/45.3ms pred gate=device Token # 459: 3.784ms; value: next_token_ids=tensor([92], device='cuda:0') mtp 
accept=1 prop=92 top1=92 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 460: 114.544ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=19 prop=19 olap pair=109.2ms serial=194.0ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.7ms wait=0.1/45.2ms pred gate=device Token # 461: 3.710ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=625 prop=625 pred gate=device Token # 462: 114.582ms; value: next_token_ids=tensor([625], device='cuda:0') mtp accept=1 prop=625 top1=625 accp=1.000 next=draft=303 prop=303 olap pair=109.1ms serial=193.6ms gain=84.4ms ratio=0.44 s0=4.4ms s1=189.2ms wait=0.2/45.1ms pred gate=device Token # 463: 3.794ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=34408 prop=34408 pred gate=device Token # 464: 114.612ms; value: next_token_ids=tensor([34408], device='cuda:0') mtp accept=1 prop=34408 top1=34408 accp=0.894 next=draft=1728 prop=1728 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.0ms wait=0.1/45.2ms pred gate=device Token # 465: 3.760ms; value: next_token_ids=tensor([1728], device='cuda:0') mtp accept=1 prop=1728 top1=1728 accp=1.000 next=pair draft=66518 prop=66518 pred gate=device Token # 466: 114.371ms; value: next_token_ids=tensor([1380], device='cuda:0') mtp accept=0 prop=66518 top1=1380 accp=0.307 next=draft=11035 prop=11035 olap pair=109.2ms serial=193.9ms gain=84.7ms ratio=0.44 s0=4.2ms s1=189.7ms wait=0.1/45.3ms pred gate=device Token # 467: 114.223ms; value: next_token_ids=tensor([11035], device='cuda:0') mtp accept=1 prop=11035 top1=11035 accp=1.000 next=draft=10730 prop=6190 olap pair=108.9ms serial=193.4ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.1ms wait=0.1/45.2ms pred gate=device Token # 468: 3.737ms; value: next_token_ids=tensor([10730], device='cuda:0') mtp accept=0 prop=6190 top1=10730 accp=0.684 next=pair draft=673 
prop=673 pred gate=device Token # 469: 114.516ms; value: next_token_ids=tensor([673], device='cuda:0') mtp accept=1 prop=673 top1=673 accp=0.636 next=draft=60555 prop=60555 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.8ms wait=0.1/45.3ms pred gate=device Token # 470: 3.744ms; value: next_token_ids=tensor([60555], device='cuda:0') mtp accept=1 prop=60555 top1=60555 accp=0.968 next=pair draft=303 prop=303 pred gate=device Token # 471: 114.588ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.999 next=draft=2524 prop=2524 olap pair=109.3ms serial=193.9ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.6ms wait=0.1/45.2ms pred gate=device Token # 472: 3.701ms; value: next_token_ids=tensor([2524], device='cuda:0') mtp accept=1 prop=2524 top1=2524 accp=1.000 next=pair draft=40092 prop=40092 pred gate=device Token # 473: 114.690ms; value: next_token_ids=tensor([40092], device='cuda:0') mtp accept=1 prop=40092 top1=40092 accp=0.981 next=draft=25024 prop=6034 olap pair=109.5ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.2ms s1=190.1ms wait=0.1/45.4ms pred gate=device Token # 474: 3.771ms; value: next_token_ids=tensor([25024], device='cuda:0') mtp accept=0 prop=6034 top1=25024 accp=0.659 next=pair draft=303 prop=303 pred gate=device Token # 475: 114.603ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=0 prop=303 top1=445 accp=0.152 next=draft=34408 prop=34408 olap pair=109.3ms serial=194.2ms gain=84.9ms ratio=0.44 s0=3.7ms s1=190.5ms wait=0.1/46.4ms pred gate=device Token # 476: 114.887ms; value: next_token_ids=tensor([34408], device='cuda:0') mtp accept=1 prop=34408 top1=34408 accp=0.917 next=draft=1728 prop=1728 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.2ms s1=190.3ms wait=0.1/45.3ms pred gate=device Token # 477: 9.901ms; value: next_token_ids=tensor([16533], device='cuda:0') mtp accept=0 prop=1728 top1=16533 accp=0.101 next=pair draft=14643 prop=14643 pred gate=device Token # 
478: 114.341ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=14643 top1=303 accp=0.106 next=draft=6525 prop=6525 olap pair=109.1ms serial=193.5ms gain=84.4ms ratio=0.44 s0=4.3ms s1=189.2ms wait=0.1/45.1ms pred gate=device Token # 479: 114.382ms; value: next_token_ids=tensor([6525], device='cuda:0') mtp accept=1 prop=6525 top1=6525 accp=0.917 next=draft=3437 prop=3437 olap pair=109.1ms serial=193.8ms gain=84.7ms ratio=0.44 s0=4.2ms s1=189.6ms wait=0.1/45.4ms pred gate=device Token # 480: 3.721ms; value: next_token_ids=tensor([38922], device='cuda:0') mtp accept=0 prop=3437 top1=38186 accp=0.002 next=pair draft=13097 prop=13097 pred gate=device Token # 481: 114.255ms; value: next_token_ids=tensor([13097], device='cuda:0') mtp accept=1 prop=13097 top1=13097 accp=0.939 next=draft=6034 prop=6034 olap pair=108.9ms serial=193.4ms gain=84.5ms ratio=0.44 s0=4.0ms s1=189.4ms wait=0.1/45.7ms pred gate=device Token # 482: 3.792ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=0.882 next=pair draft=3437 prop=3437 pred gate=device Token # 483: 114.155ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=0 prop=3437 top1=1237 accp=0.005 next=draft=7849 prop=7849 olap pair=108.9ms serial=193.5ms gain=84.5ms ratio=0.44 s0=3.9ms s1=189.5ms wait=0.1/45.9ms pred gate=device Token # 484: 114.416ms; value: next_token_ids=tensor([40092], device='cuda:0') mtp accept=0 prop=7849 top1=40092 accp=0.017 next=draft=6034 prop=6034 olap pair=109.2ms serial=193.8ms gain=84.7ms ratio=0.44 s0=4.2ms s1=189.6ms wait=0.1/45.2ms pred gate=device Token # 485: 114.780ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=0.774 next=draft=55197 prop=55197 olap pair=109.3ms serial=194.2ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.0ms wait=0.1/45.3ms pred gate=device Token # 486: 3.735ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=0 prop=55197 top1=445 accp=0.492 
next=pair draft=4339 prop=4339 pred gate=device Token # 487: 114.966ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=0.985 next=draft=303 prop=303 olap pair=109.7ms serial=195.0ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.7ms wait=0.1/45.3ms pred gate=device Token # 488: 3.759ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=3442 prop=3442 pred gate=device Token # 489: 114.875ms; value: next_token_ids=tensor([3442], device='cuda:0') mtp accept=1 prop=3442 top1=3442 accp=0.793 next=draft=6034 prop=6034 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=4.2ms s1=190.6ms wait=0.1/45.3ms pred gate=device Token # 490: 3.849ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=pair draft=445 prop=445 pred gate=device Token # 491: 114.852ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=1 prop=445 top1=445 accp=0.957 next=draft=15900 prop=15900 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.3ms wait=0.1/45.2ms pred gate=device Token # 492: 3.775ms; value: next_token_ids=tensor([15900], device='cuda:0') mtp accept=1 prop=15900 top1=15900 accp=0.971 next=pair draft=14164 prop=30904 pred gate=device Token # 493: 115.070ms; value: next_token_ids=tensor([30904], device='cuda:0') mtp accept=1 prop=30904 top1=30904 accp=0.364 next=draft=15 prop=15 olap pair=109.8ms serial=195.2ms gain=85.4ms ratio=0.44 s0=3.8ms s1=191.4ms wait=0.1/46.3ms pred gate=device Token # 494: 3.679ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=pair draft=5949 prop=5949 pred gate=device Token # 495: 114.845ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=0 prop=5949 top1=2619 accp=0.000 next=draft=36101 prop=36101 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.2ms s1=190.4ms wait=0.1/45.2ms pred gate=device 
Token # 496: 114.724ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=1.000 next=draft=17520 prop=17520 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.2ms s1=190.2ms wait=0.1/45.3ms pred gate=device Token # 497: 3.650ms; value: next_token_ids=tensor([17520], device='cuda:0') mtp accept=1 prop=17520 top1=17520 accp=1.000 next=pair draft=666 prop=666 pred gate=device Token # 498: 115.181ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=768 prop=768 olap pair=109.9ms serial=195.2ms gain=85.3ms ratio=0.44 s0=4.1ms s1=191.1ms wait=0.1/45.7ms pred gate=device Token # 499: 3.695ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=445 prop=445 pred gate=device Token # 500: 114.914ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=1 prop=445 top1=445 accp=0.996 next=draft=2382 prop=2382 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=4.2ms s1=190.5ms wait=0.1/45.4ms pred gate=device Token # 501: 3.768ms; value: next_token_ids=tensor([2382], device='cuda:0') mtp accept=1 prop=2382 top1=2382 accp=1.000 next=pair draft=92 prop=92 pred gate=device Token # 502: 115.254ms; value: next_token_ids=tensor([92], device='cuda:0') mtp accept=1 prop=92 top1=92 accp=1.000 next=draft=31 prop=31 olap pair=110.1ms serial=195.6ms gain=85.5ms ratio=0.44 s0=4.3ms s1=191.3ms wait=0.1/45.3ms pred gate=device Token # 503: 3.801ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 504: 115.379ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=301 prop=301 olap pair=110.2ms serial=195.9ms gain=85.7ms ratio=0.44 s0=3.8ms s1=192.1ms wait=0.1/46.3ms pred gate=device Token # 505: 3.758ms; value: next_token_ids=tensor([301], device='cuda:0') mtp 
accept=1 prop=301 top1=301 accp=0.996 next=pair draft=36101 prop=36101 pred gate=device Token # 506: 114.899ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=1.000 next=draft=525 prop=525 olap pair=109.6ms serial=194.8ms gain=85.1ms ratio=0.44 s0=3.7ms s1=191.0ms wait=0.1/46.5ms pred gate=device Token # 507: 3.792ms; value: next_token_ids=tensor([525], device='cuda:0') mtp accept=1 prop=525 top1=525 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 508: 115.302ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=34408 prop=34408 olap pair=110.0ms serial=195.6ms gain=85.6ms ratio=0.44 s0=3.6ms s1=191.9ms wait=0.1/46.6ms pred gate=device Token # 509: 3.720ms; value: next_token_ids=tensor([34408], device='cuda:0') mtp accept=1 prop=34408 top1=34408 accp=1.000 next=pair draft=1728 prop=1728 pred gate=device Token # 510: 115.714ms; value: next_token_ids=tensor([1728], device='cuda:0') mtp accept=1 prop=1728 top1=1728 accp=1.000 next=draft=66518 prop=66518 olap pair=110.5ms serial=195.8ms gain=85.3ms ratio=0.44 s0=3.8ms s1=191.9ms wait=0.1/46.3ms pred gate=device Token # 511: 3.719ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=0.997 next=pair draft=7730 prop=7730 pred gate=device Token # 512: 115.131ms; value: next_token_ids=tensor([422], device='cuda:0') mtp accept=0 prop=7730 top1=422 accp=0.275 next=draft=18580 prop=18580 olap pair=109.9ms serial=195.4ms gain=85.5ms ratio=0.44 s0=3.6ms s1=191.7ms wait=0.1/46.5ms pred gate=device Token # 513: 115.669ms; value: next_token_ids=tensor([18580], device='cuda:0') mtp accept=1 prop=18580 top1=18580 accp=1.000 next=draft=303 prop=303 olap pair=110.4ms serial=196.2ms gain=85.8ms ratio=0.44 s0=3.6ms s1=192.5ms wait=0.1/46.5ms pred gate=device Token # 514: 3.736ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.928 
next=pair draft=2524 prop=2524 pred gate=device Token # 515: 114.825ms; value: next_token_ids=tensor([45104], device='cuda:0') mtp accept=0 prop=2524 top1=45104 accp=0.245 next=draft=40401 prop=40401 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=3.8ms s1=190.9ms wait=0.1/46.4ms pred gate=device Token # 516: 115.504ms; value: next_token_ids=tensor([6525], device='cuda:0') mtp accept=0 prop=40401 top1=6525 accp=0.168 next=draft=5660 prop=4182 olap pair=110.3ms serial=195.7ms gain=85.5ms ratio=0.44 s0=4.0ms s1=191.7ms wait=0.1/45.9ms pred gate=device Token # 517: 115.461ms; value: next_token_ids=tensor([2490], device='cuda:0') mtp accept=0 prop=4182 top1=4354 accp=0.198 next=draft=34408 prop=34408 olap pair=110.1ms serial=195.2ms gain=85.1ms ratio=0.44 s0=4.2ms s1=191.0ms wait=0.1/45.5ms pred gate=device Token # 518: 114.934ms; value: next_token_ids=tensor([13097], device='cuda:0') mtp accept=0 prop=34408 top1=13097 accp=0.041 next=draft=25024 prop=25024 olap pair=109.6ms serial=194.8ms gain=85.2ms ratio=0.44 s0=3.8ms s1=191.0ms wait=0.1/46.3ms pred gate=device Token # 519: 116.219ms; value: next_token_ids=tensor([1295], device='cuda:0') mtp accept=0 prop=25024 top1=1295 accp=0.013 next=draft=14454 prop=14454 olap pair=110.3ms serial=194.8ms gain=84.5ms ratio=0.43 s0=8.0ms s1=186.8ms wait=0.2/41.6ms pred gate=device Token # 520: 115.399ms; value: next_token_ids=tensor([14454], device='cuda:0') mtp accept=1 prop=14454 top1=14454 accp=1.000 next=draft=38186 prop=38186 olap pair=110.1ms serial=195.6ms gain=85.5ms ratio=0.44 s0=3.9ms s1=191.8ms wait=0.1/46.1ms pred gate=device Token # 521: 3.765ms; value: next_token_ids=tensor([38186], device='cuda:0') mtp accept=1 prop=38186 top1=38186 accp=0.670 next=pair draft=34408 prop=34408 pred gate=device Token # 522: 115.619ms; value: next_token_ids=tensor([34408], device='cuda:0') mtp accept=1 prop=34408 top1=34408 accp=1.000 next=draft=1728 prop=1728 olap pair=110.3ms serial=196.1ms gain=85.8ms ratio=0.44 s0=3.8ms 
s1=192.3ms wait=0.1/46.4ms pred gate=device Token # 523: 3.732ms; value: next_token_ids=tensor([1728], device='cuda:0') mtp accept=1 prop=1728 top1=1728 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 524: 114.941ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.999 next=draft=6911 prop=6911 olap pair=109.7ms serial=195.0ms gain=85.3ms ratio=0.44 s0=3.6ms s1=191.4ms wait=0.1/46.5ms pred gate=device Token # 525: 3.676ms; value: next_token_ids=tensor([6911], device='cuda:0') mtp accept=1 prop=6911 top1=8494 accp=0.283 next=pair draft=6034 prop=6034 pred gate=device Token # 526: 115.275ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=draft=63239 prop=63239 olap pair=110.1ms serial=195.6ms gain=85.6ms ratio=0.44 s0=3.6ms s1=192.0ms wait=0.1/46.6ms pred gate=device Token # 527: 3.687ms; value: next_token_ids=tensor([63239], device='cuda:0') mtp accept=1 prop=63239 top1=63239 accp=0.966 next=pair draft=2467 prop=2467 pred gate=device Token # 528: 114.995ms; value: next_token_ids=tensor([2467], device='cuda:0') mtp accept=1 prop=2467 top1=2467 accp=0.733 next=draft=478 prop=303 olap pair=109.7ms serial=195.0ms gain=85.3ms ratio=0.44 s0=3.6ms s1=191.4ms wait=0.1/46.5ms pred gate=device Token # 529: 3.711ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.336 next=pair draft=16303 prop=16303 pred gate=device Token # 530: 115.170ms; value: next_token_ids=tensor([45045], device='cuda:0') mtp accept=0 prop=16303 top1=45045 accp=0.177 next=draft=5133 prop=5133 olap pair=110.0ms serial=195.1ms gain=85.2ms ratio=0.44 s0=4.2ms s1=191.0ms wait=0.1/45.4ms pred gate=device Token # 531: 115.229ms; value: next_token_ids=tensor([2431], device='cuda:0') mtp accept=0 prop=5133 top1=2431 accp=0.296 next=draft=30999 prop=30999 olap pair=109.9ms serial=195.5ms gain=85.6ms ratio=0.44 s0=3.7ms s1=191.8ms wait=0.1/46.5ms pred 
gate=device Token # 532: 115.852ms; value: next_token_ids=tensor([17541], device='cuda:0') mtp accept=0 prop=30999 top1=17545 accp=0.021 next=draft=5133 prop=5133 olap pair=110.5ms serial=196.4ms gain=85.9ms ratio=0.44 s0=3.9ms s1=192.5ms wait=0.1/46.1ms pred gate=device Token # 533: 116.040ms; value: next_token_ids=tensor([5133], device='cuda:0') mtp accept=1 prop=5133 top1=5133 accp=0.963 next=draft=478 prop=478 olap pair=110.6ms serial=196.4ms gain=85.8ms ratio=0.44 s0=4.0ms s1=192.4ms wait=0.1/46.1ms pred gate=device Token # 534: 3.782ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=0 prop=478 top1=1237 accp=0.505 next=pair draft=2524 prop=2524 pred gate=device Token # 535: 114.570ms; value: next_token_ids=tensor([4389], device='cuda:0') mtp accept=0 prop=2524 top1=4389 accp=0.508 next=draft=16303 prop=16303 olap pair=109.2ms serial=193.8ms gain=84.6ms ratio=0.44 s0=3.7ms s1=190.1ms wait=0.1/46.4ms pred gate=device Token # 536: 114.961ms; value: next_token_ids=tensor([16303], device='cuda:0') mtp accept=1 prop=16303 top1=16303 accp=0.999 next=draft=100642 prop=100642 olap pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=3.7ms s1=191.3ms wait=0.1/46.5ms pred gate=device Token # 537: 3.695ms; value: next_token_ids=tensor([100642], device='cuda:0') mtp accept=1 prop=100642 top1=100642 accp=1.000 next=pair draft=24605 prop=24605 pred gate=device Token # 538: 115.680ms; value: next_token_ids=tensor([24605], device='cuda:0') mtp accept=1 prop=24605 top1=24605 accp=0.919 next=draft=372 prop=372 olap pair=110.5ms serial=196.4ms gain=85.9ms ratio=0.44 s0=3.6ms s1=192.8ms wait=0.1/46.5ms pred gate=device Token # 539: 3.771ms; value: next_token_ids=tensor([372], device='cuda:0') mtp accept=1 prop=372 top1=372 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 540: 114.838ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=22 prop=22 olap pair=109.7ms serial=194.9ms gain=85.2ms 
ratio=0.44 s0=3.7ms s1=191.2ms wait=0.1/46.6ms pred gate=device Token # 541: 3.690ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=pair draft=16 prop=16 pred gate=device Token # 542: 114.587ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=223 prop=223 olap pair=109.2ms serial=194.0ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.3ms wait=0.1/46.4ms pred gate=device Token # 543: 3.773ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=10756 prop=10756 pred gate=device Token # 544: 115.810ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=1 prop=10756 top1=10756 accp=1.000 next=draft=66518 prop=66518 olap pair=110.6ms serial=196.5ms gain=85.9ms ratio=0.44 s0=3.8ms s1=192.7ms wait=0.1/46.3ms pred gate=device Token # 545: 3.680ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=pair draft=343 prop=343 pred gate=device Token # 546: 114.904ms; value: next_token_ids=tensor([343], device='cuda:0') mtp accept=1 prop=343 top1=343 accp=1.000 next=draft=7989 prop=7989 olap pair=109.7ms serial=194.7ms gain=85.0ms ratio=0.44 s0=4.1ms s1=190.6ms wait=0.1/45.7ms pred gate=device Token # 547: 3.707ms; value: next_token_ids=tensor([7989], device='cuda:0') mtp accept=1 prop=7989 top1=7989 accp=1.000 next=pair draft=682 prop=682 pred gate=device Token # 548: 115.199ms; value: next_token_ids=tensor([682], device='cuda:0') mtp accept=1 prop=682 top1=682 accp=1.000 next=draft=15 prop=15 olap pair=109.9ms serial=194.8ms gain=84.9ms ratio=0.44 s0=4.0ms s1=190.8ms wait=0.1/46.0ms pred gate=device Token # 549: 9.963ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=pair draft=437 prop=437 pred gate=device Token # 550: 114.631ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=0 prop=437 
top1=2619 accp=0.001 next=draft=14087 prop=14087 olap pair=109.4ms serial=194.4ms gain=84.9ms ratio=0.44 s0=3.8ms s1=190.5ms wait=0.1/46.2ms pred gate=device Token # 551: 114.942ms; value: next_token_ids=tensor([14087], device='cuda:0') mtp accept=1 prop=14087 top1=14087 accp=0.626 next=draft=666 prop=666 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=3.7ms s1=191.0ms wait=0.1/46.6ms pred gate=device Token # 552: 3.759ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=pair draft=768 prop=768 pred gate=device Token # 553: 115.119ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=53825 prop=53825 olap pair=109.9ms serial=195.4ms gain=85.5ms ratio=0.44 s0=3.6ms s1=191.7ms wait=0.1/46.5ms pred gate=device Token # 554: 3.680ms; value: next_token_ids=tensor([53825], device='cuda:0') mtp accept=1 prop=53825 top1=53825 accp=0.922 next=pair draft=14769 prop=14769 pred gate=device Token # 555: 114.734ms; value: next_token_ids=tensor([14769], device='cuda:0') mtp accept=1 prop=14769 top1=14769 accp=0.977 next=draft=10756 prop=10756 olap pair=109.5ms serial=194.6ms gain=85.1ms ratio=0.44 s0=3.6ms s1=191.0ms wait=0.1/46.5ms pred gate=device Token # 556: 3.708ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=1 prop=10756 top1=10756 accp=1.000 next=pair draft=8842 prop=8842 pred gate=device Token # 557: 114.958ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=1.000 next=draft=1237 prop=1237 olap pair=109.6ms serial=194.8ms gain=85.2ms ratio=0.44 s0=3.6ms s1=191.1ms wait=0.1/46.6ms pred gate=device Token # 558: 3.737ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=48076 prop=48076 pred gate=device Token # 559: 115.143ms; value: next_token_ids=tensor([48076], device='cuda:0') mtp accept=1 prop=48076 top1=48076 accp=0.958 
next=draft=10672 prop=10672 olap pair=109.8ms serial=195.2ms gain=85.4ms ratio=0.44 s0=3.8ms s1=191.4ms wait=0.1/46.4ms pred gate=device Token # 560: 3.710ms; value: next_token_ids=tensor([10672], device='cuda:0') mtp accept=1 prop=10672 top1=10672 accp=1.000 next=pair draft=294 prop=294 pred gate=device Token # 561: 114.480ms; value: next_token_ids=tensor([294], device='cuda:0') mtp accept=1 prop=294 top1=294 accp=1.000 next=draft=58124 prop=58124 olap pair=109.2ms serial=193.8ms gain=84.6ms ratio=0.44 s0=4.0ms s1=189.7ms wait=0.1/45.7ms pred gate=device Token # 562: 3.698ms; value: next_token_ids=tensor([58124], device='cuda:0') mtp accept=1 prop=58124 top1=58124 accp=1.000 next=pair draft=303 prop=14 pred gate=device Token # 563: 114.587ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=0.446 next=draft=10192 prop=10192 olap pair=109.3ms serial=194.1ms gain=84.9ms ratio=0.44 s0=3.8ms s1=190.4ms wait=0.1/46.3ms pred gate=device Token # 564: 3.772ms; value: next_token_ids=tensor([10192], device='cuda:0') mtp accept=1 prop=10192 top1=10192 accp=1.000 next=pair draft=39 prop=39 pred gate=device Token # 565: 114.823ms; value: next_token_ids=tensor([39], device='cuda:0') mtp accept=1 prop=39 top1=39 accp=1.000 next=draft=14164 prop=14164 olap pair=109.6ms serial=194.7ms gain=85.2ms ratio=0.44 s0=3.7ms s1=191.0ms wait=0.1/46.4ms pred gate=device Token # 566: 3.733ms; value: next_token_ids=tensor([14164], device='cuda:0') mtp accept=1 prop=14164 top1=14164 accp=0.944 next=pair draft=445 prop=445 pred gate=device Token # 567: 114.593ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=1 prop=445 top1=445 accp=0.717 next=draft=28608 prop=28608 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=3.9ms s1=190.4ms wait=0.1/46.2ms pred gate=device Token # 568: 3.775ms; value: next_token_ids=tensor([28608], device='cuda:0') mtp accept=1 prop=28608 top1=28608 accp=0.958 next=pair draft=39 prop=39 pred gate=device 
Token # 569: 114.923ms; value: next_token_ids=tensor([39], device='cuda:0') mtp accept=1 prop=39 top1=39 accp=1.000 next=draft=103971 prop=103971 olap pair=109.6ms serial=194.8ms gain=85.2ms ratio=0.44 s0=3.7ms s1=191.1ms wait=0.1/46.5ms pred gate=device Token # 570: 3.807ms; value: next_token_ids=tensor([103971], device='cuda:0') mtp accept=1 prop=103971 top1=103971 accp=0.986 next=pair draft=303 prop=303 pred gate=device Token # 571: 114.111ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=8842 prop=8842 olap pair=108.9ms serial=193.0ms gain=84.1ms ratio=0.44 s0=4.2ms s1=188.8ms wait=0.1/45.5ms pred gate=device Token # 572: 3.675ms; value: next_token_ids=tensor([13892], device='cuda:0') mtp accept=0 prop=8842 top1=13892 accp=0.042 next=pair draft=2827 prop=2827 pred gate=device Token # 573: 114.676ms; value: next_token_ids=tensor([2827], device='cuda:0') mtp accept=1 prop=2827 top1=2827 accp=1.000 next=draft=12519 prop=12519 olap pair=109.4ms serial=194.0ms gain=84.7ms ratio=0.44 s0=4.2ms s1=189.8ms wait=0.1/45.5ms pred gate=device Token # 574: 3.722ms; value: next_token_ids=tensor([12519], device='cuda:0') mtp accept=1 prop=12519 top1=12519 accp=1.000 next=pair draft=13097 prop=13097 pred gate=device Token # 575: 114.688ms; value: next_token_ids=tensor([13097], device='cuda:0') mtp accept=1 prop=13097 top1=13097 accp=1.000 next=draft=10756 prop=10756 olap pair=109.4ms serial=194.1ms gain=84.8ms ratio=0.44 s0=3.8ms s1=190.3ms wait=0.1/46.4ms pred gate=device Token # 576: 3.699ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=1 prop=10756 top1=10756 accp=1.000 next=pair draft=1237 prop=1237 pred gate=device Token # 577: 114.581ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=0.994 next=draft=65045 prop=65045 olap pair=109.4ms serial=194.0ms gain=84.6ms ratio=0.44 s0=4.1ms s1=189.9ms wait=0.1/45.6ms pred gate=device Token # 578: 3.664ms; 
value: next_token_ids=tensor([65045], device='cuda:0') mtp accept=1 prop=65045 top1=65045 accp=0.802 next=pair draft=760 prop=68318 pred gate=device Token # 579: 114.455ms; value: next_token_ids=tensor([974], device='cuda:0') mtp accept=0 prop=68318 top1=974 accp=0.062 next=draft=17862 prop=17862 olap pair=109.2ms serial=194.0ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.3ms wait=0.1/46.4ms pred gate=device Token # 580: 114.420ms; value: next_token_ids=tensor([17862], device='cuda:0') mtp accept=1 prop=17862 top1=17862 accp=1.000 next=draft=5555 prop=5555 olap pair=109.0ms serial=193.7ms gain=84.7ms ratio=0.44 s0=3.7ms s1=190.0ms wait=0.1/46.5ms pred gate=device Token # 581: 3.746ms; value: next_token_ids=tensor([5555], device='cuda:0') mtp accept=1 prop=5555 top1=5555 accp=0.735 next=pair draft=7417 prop=7417 pred gate=device Token # 582: 114.496ms; value: next_token_ids=tensor([7417], device='cuda:0') mtp accept=1 prop=7417 top1=7417 accp=1.000 next=draft=867 prop=867 olap pair=109.3ms serial=194.2ms gain=84.9ms ratio=0.44 s0=3.7ms s1=190.5ms wait=0.1/46.4ms pred gate=device Token # 583: 3.684ms; value: next_token_ids=tensor([7849], device='cuda:0') mtp accept=0 prop=867 top1=867 accp=0.781 next=pair draft=25024 prop=25024 pred gate=device Token # 584: 115.037ms; value: next_token_ids=tensor([25024], device='cuda:0') mtp accept=1 prop=25024 top1=25024 accp=0.978 next=draft=1415 prop=1415 olap pair=109.8ms serial=195.1ms gain=85.2ms ratio=0.44 s0=4.2ms s1=190.9ms wait=0.1/45.4ms pred gate=device Token # 585: 3.646ms; value: next_token_ids=tensor([3655], device='cuda:0') mtp accept=0 prop=1415 top1=7206 accp=0.008 next=pair draft=37209 prop=37209 pred gate=device Token # 586: 114.895ms; value: next_token_ids=tensor([37209], device='cuda:0') mtp accept=1 prop=37209 top1=37209 accp=0.900 next=draft=3461 prop=3461 olap pair=109.6ms serial=194.8ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.5ms wait=0.1/45.2ms pred gate=device Token # 587: 3.694ms; value: 
next_token_ids=tensor([51259], device='cuda:0') mtp accept=0 prop=3461 top1=51259 accp=0.093 next=pair draft=1415 prop=1415 pred gate=device Token # 588: 114.512ms; value: next_token_ids=tensor([3461], device='cuda:0') mtp accept=0 prop=1415 top1=3461 accp=0.013 next=draft=34864 prop=34864 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.8ms wait=0.1/45.3ms pred gate=device Token # 589: 115.096ms; value: next_token_ids=tensor([23945], device='cuda:0') mtp accept=0 prop=34864 top1=23945 accp=0.010 next=draft=10756 prop=10756 olap pair=109.8ms serial=195.1ms gain=85.3ms ratio=0.44 s0=4.3ms s1=190.8ms wait=0.1/45.2ms pred gate=device Token # 590: 114.474ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=1 prop=10756 top1=10756 accp=1.000 next=draft=109318 prop=109318 olap pair=109.1ms serial=193.7ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.5ms wait=0.1/45.3ms pred gate=device Token # 591: 3.753ms; value: next_token_ids=tensor([109318], device='cuda:0') mtp accept=1 prop=109318 top1=109318 accp=0.841 next=pair draft=320 prop=320 pred gate=device Token # 592: 115.139ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=10756 prop=10756 olap pair=109.9ms serial=195.2ms gain=85.3ms ratio=0.44 s0=4.2ms s1=191.0ms wait=0.1/45.3ms pred gate=device Token # 593: 3.779ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=1 prop=10756 top1=10756 accp=1.000 next=pair draft=66518 prop=66518 pred gate=device Token # 594: 115.338ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=draft=1275 prop=1275 olap pair=110.1ms serial=195.6ms gain=85.5ms ratio=0.44 s0=4.3ms s1=191.4ms wait=0.1/45.3ms pred gate=device Token # 595: 3.801ms; value: next_token_ids=tensor([1275], device='cuda:0') mtp accept=1 prop=1275 top1=1275 accp=0.985 next=pair draft=7557 prop=7557 pred gate=device Token # 596: 114.941ms; value: 
next_token_ids=tensor([7557], device='cuda:0') mtp accept=1 prop=7557 top1=7557 accp=1.000 next=draft=10756 prop=10756 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.3ms wait=0.1/45.2ms pred gate=device Token # 597: 3.757ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=1 prop=10756 top1=10756 accp=1.000 next=pair draft=119545 prop=119545 pred gate=device Token # 598: 114.626ms; value: next_token_ids=tensor([10780], device='cuda:0') mtp accept=0 prop=119545 top1=10780 accp=0.340 next=draft=621 prop=621 olap pair=109.4ms serial=194.2ms gain=84.9ms ratio=0.44 s0=4.3ms s1=189.9ms wait=0.1/45.1ms pred gate=device Token # 599: 115.025ms; value: next_token_ids=tensor([621], device='cuda:0') mtp accept=1 prop=621 top1=621 accp=1.000 next=draft=7557 prop=7557 olap pair=109.6ms serial=194.8ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.6ms wait=0.1/45.2ms pred gate=device Token # 600: 3.806ms; value: next_token_ids=tensor([7557], device='cuda:0') mtp accept=1 prop=7557 top1=7557 accp=1.000 next=pair draft=6034 prop=6034 pred gate=device Token # 601: 115.251ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=draft=572 prop=572 olap pair=110.0ms serial=195.4ms gain=85.4ms ratio=0.44 s0=4.2ms s1=191.1ms wait=0.1/45.4ms pred gate=device Token # 602: 3.789ms; value: next_token_ids=tensor([572], device='cuda:0') mtp accept=1 prop=572 top1=572 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 603: 114.859ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=7849 prop=7849 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=4.0ms s1=190.6ms wait=0.1/45.8ms pred gate=device Token # 604: 3.735ms; value: next_token_ids=tensor([7849], device='cuda:0') mtp accept=1 prop=7849 top1=7849 accp=1.000 next=pair draft=6034 prop=6034 pred gate=device Token # 605: 114.229ms; value: next_token_ids=tensor([6034], 
device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=draft=1299 prop=1299 olap pair=108.9ms serial=193.2ms gain=84.4ms ratio=0.44 s0=4.3ms s1=188.9ms wait=0.1/45.4ms pred gate=device Token # 606: 3.771ms; value: next_token_ids=tensor([1299], device='cuda:0') mtp accept=1 prop=1299 top1=1299 accp=1.000 next=pair draft=6668 prop=6668 pred gate=device Token # 607: 115.110ms; value: next_token_ids=tensor([6668], device='cuda:0') mtp accept=1 prop=6668 top1=6668 accp=0.996 next=draft=23945 prop=23945 olap pair=109.9ms serial=195.2ms gain=85.4ms ratio=0.44 s0=4.3ms s1=190.9ms wait=0.1/45.2ms pred gate=device Token # 608: 3.794ms; value: next_token_ids=tensor([23945], device='cuda:0') mtp accept=1 prop=23945 top1=23945 accp=0.999 next=pair draft=10756 prop=10756 pred gate=device Token # 609: 114.003ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=1 prop=10756 top1=10756 accp=1.000 next=draft=876 prop=876 olap pair=108.7ms serial=192.9ms gain=84.2ms ratio=0.44 s0=4.2ms s1=188.7ms wait=0.1/45.4ms pred gate=device Token # 610: 3.802ms; value: next_token_ids=tensor([876], device='cuda:0') mtp accept=1 prop=876 top1=876 accp=0.994 next=pair draft=15 prop=15 pred gate=device Token # 611: 114.772ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=71733 prop=116045 olap pair=109.6ms serial=193.5ms gain=83.9ms ratio=0.43 s0=4.4ms s1=189.1ms wait=0.1/45.4ms pred gate=device Token # 612: 3.786ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=0 prop=116045 top1=2619 accp=0.000 next=pair draft=36101 prop=36101 pred gate=device Token # 613: 114.434ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=1.000 next=draft=17520 prop=17520 olap pair=109.1ms serial=193.8ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.5ms wait=0.1/45.4ms pred gate=device Token # 614: 3.647ms; value: next_token_ids=tensor([17520], device='cuda:0') mtp accept=1 
prop=17520 top1=17520 accp=1.000 next=pair draft=666 prop=666 pred gate=device Token # 615: 114.318ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=768 prop=768 olap pair=109.1ms serial=193.7ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.4ms wait=0.1/45.4ms pred gate=device Token # 616: 3.687ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=445 prop=445 pred gate=device Token # 617: 115.287ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=1 prop=445 top1=445 accp=0.967 next=draft=2382 prop=2382 olap pair=110.0ms serial=195.5ms gain=85.6ms ratio=0.44 s0=3.9ms s1=191.6ms wait=0.1/46.1ms pred gate=device Token # 618: 3.746ms; value: next_token_ids=tensor([2382], device='cuda:0') mtp accept=1 prop=2382 top1=2382 accp=1.000 next=pair draft=92 prop=92 pred gate=device Token # 619: 114.476ms; value: next_token_ids=tensor([92], device='cuda:0') mtp accept=1 prop=92 top1=92 accp=1.000 next=draft=31 prop=31 olap pair=109.3ms serial=194.0ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.7ms wait=0.1/45.4ms pred gate=device Token # 620: 3.768ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 621: 114.791ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=301 prop=301 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=4.2ms s1=190.5ms wait=0.1/45.6ms pred gate=device Token # 622: 3.729ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=0.998 next=pair draft=36101 prop=36101 pred gate=device Token # 623: 114.827ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=1.000 next=draft=525 prop=525 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.3ms wait=0.1/45.5ms pred gate=device Token # 
624: 3.788ms; value: next_token_ids=tensor([525], device='cuda:0') mtp accept=1 prop=525 top1=525 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 625: 114.378ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=2204 prop=2204 olap pair=109.1ms serial=193.9ms gain=84.7ms ratio=0.44 s0=4.2ms s1=189.6ms wait=0.1/45.4ms pred gate=device Token # 626: 3.652ms; value: next_token_ids=tensor([2204], device='cuda:0') mtp accept=1 prop=2204 top1=2204 accp=0.997 next=pair draft=8842 prop=28608 pred gate=device Token # 627: 114.583ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=0 prop=28608 top1=8842 accp=0.645 next=draft=389 prop=389 olap pair=109.3ms serial=194.2ms gain=84.9ms ratio=0.44 s0=4.2ms s1=190.0ms wait=0.1/45.4ms pred gate=device Token # 628: 114.548ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 accp=1.000 next=draft=28608 prop=28608 olap pair=109.1ms serial=193.5ms gain=84.4ms ratio=0.44 s0=4.1ms s1=189.4ms wait=0.1/45.7ms pred gate=device Token # 629: 3.791ms; value: next_token_ids=tensor([28608], device='cuda:0') mtp accept=1 prop=28608 top1=28608 accp=1.000 next=pair draft=39 prop=39 pred gate=device Token # 630: 116.581ms; value: next_token_ids=tensor([39], device='cuda:0') mtp accept=1 prop=39 top1=39 accp=1.000 next=draft=35991 prop=35991 olap pair=108.9ms serial=193.3ms gain=84.3ms ratio=0.44 s0=4.2ms s1=189.0ms wait=0.1/45.3ms pred gate=device Token # 631: 3.765ms; value: next_token_ids=tensor([35991], device='cuda:0') mtp accept=1 prop=35991 top1=4572 accp=0.265 next=pair draft=303 prop=303 pred gate=device Token # 632: 114.267ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=3996 prop=3996 olap pair=109.0ms serial=193.6ms gain=84.5ms ratio=0.44 s0=4.2ms s1=189.3ms wait=0.1/45.5ms pred gate=device Token # 633: 3.800ms; value: next_token_ids=tensor([10756], 
device='cuda:0') mtp accept=0 prop=3996 top1=3996 accp=0.924 next=pair draft=66518 prop=66518 pred gate=device Token # 634: 114.719ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=draft=389 prop=389 olap pair=109.4ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.2ms s1=190.0ms wait=0.1/45.4ms pred gate=device Token # 635: 3.762ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 accp=1.000 next=pair draft=79273 prop=79273 pred gate=device Token # 636: 114.875ms; value: next_token_ids=tensor([79273], device='cuda:0') mtp accept=1 prop=79273 top1=79273 accp=1.000 next=draft=320 prop=320 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.2ms s1=190.6ms wait=0.1/45.3ms pred gate=device Token # 637: 3.752ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.999 next=pair draft=2524 prop=2524 pred gate=device Token # 638: 114.329ms; value: next_token_ids=tensor([2524], device='cuda:0') mtp accept=1 prop=2524 top1=2524 accp=0.999 next=draft=45276 prop=45276 olap pair=109.1ms serial=193.7ms gain=84.6ms ratio=0.44 s0=4.2ms s1=189.5ms wait=0.1/45.4ms pred gate=device Token # 639: 3.704ms; value: next_token_ids=tensor([45276], device='cuda:0') mtp accept=1 prop=45276 top1=45276 accp=0.797 next=pair draft=25024 prop=25024 pred gate=device Token # 640: 114.963ms; value: next_token_ids=tensor([25024], device='cuda:0') mtp accept=1 prop=25024 top1=25024 accp=1.000 next=draft=9715 prop=9715 olap pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=4.2ms s1=190.7ms wait=0.1/45.3ms pred gate=device Token # 641: 3.753ms; value: next_token_ids=tensor([2431], device='cuda:0') mtp accept=0 prop=9715 top1=2431 accp=0.229 next=pair draft=1299 prop=1299 pred gate=device Token # 642: 114.366ms; value: next_token_ids=tensor([33298], device='cuda:0') mtp accept=0 prop=1299 top1=33298 accp=0.196 next=draft=34864 prop=34864 olap pair=109.1ms 
serial=193.8ms gain=84.7ms ratio=0.44 s0=4.1ms s1=189.7ms wait=0.1/45.6ms pred gate=device Token # 643: 114.911ms; value: next_token_ids=tensor([34864], device='cuda:0') mtp accept=1 prop=34864 top1=34864 accp=1.000 next=draft=3343 prop=3343 olap pair=109.6ms serial=194.8ms gain=85.2ms ratio=0.44 s0=3.8ms s1=191.0ms wait=0.1/46.4ms pred gate=device Token # 644: 3.800ms; value: next_token_ids=tensor([3343], device='cuda:0') mtp accept=1 prop=3343 top1=3343 accp=0.950 next=pair draft=10756 prop=10756 pred gate=device Token # 645: 115.172ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=1 prop=10756 top1=10756 accp=1.000 next=draft=303 prop=303 olap pair=109.9ms serial=195.4ms gain=85.5ms ratio=0.44 s0=3.7ms s1=191.8ms wait=0.1/46.5ms pred gate=device Token # 646: 3.732ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=3996 prop=3996 pred gate=device Token # 647: 114.432ms; value: next_token_ids=tensor([41540], device='cuda:0') mtp accept=0 prop=3996 top1=41540 accp=0.048 next=draft=2920 prop=2920 olap pair=109.3ms serial=194.2ms gain=84.9ms ratio=0.44 s0=3.7ms s1=190.5ms wait=0.1/46.5ms pred gate=device Token # 648: 114.961ms; value: next_token_ids=tensor([2920], device='cuda:0') mtp accept=1 prop=2920 top1=2920 accp=1.000 next=draft=10756 prop=10756 olap pair=109.7ms serial=195.0ms gain=85.3ms ratio=0.44 s0=3.6ms s1=191.4ms wait=0.1/46.5ms pred gate=device Token # 649: 3.723ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=1 prop=10756 top1=10756 accp=1.000 next=pair draft=10780 prop=10780 pred gate=device Token # 650: 115.002ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=0 prop=10780 top1=4339 accp=0.243 next=draft=9501 prop=9501 olap pair=109.8ms serial=195.2ms gain=85.4ms ratio=0.44 s0=3.6ms s1=191.6ms wait=0.1/46.5ms pred gate=device Token # 651: 115.263ms; value: next_token_ids=tensor([10780], device='cuda:0') mtp accept=0 prop=9501 
top1=10780 accp=0.183 next=draft=621 prop=621 olap pair=109.9ms serial=195.5ms gain=85.6ms ratio=0.44 s0=3.7ms s1=191.8ms wait=0.1/46.4ms pred gate=device Token # 652: 114.694ms; value: next_token_ids=tensor([621], device='cuda:0') mtp accept=1 prop=621 top1=621 accp=1.000 next=draft=13097 prop=13097 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=3.9ms s1=190.3ms wait=0.1/46.3ms pred gate=device Token # 653: 3.769ms; value: next_token_ids=tensor([13097], device='cuda:0') mtp accept=1 prop=13097 top1=13097 accp=0.699 next=pair draft=6034 prop=6034 pred gate=device Token # 654: 115.185ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=draft=572 prop=572 olap pair=109.9ms serial=195.3ms gain=85.4ms ratio=0.44 s0=4.3ms s1=191.0ms wait=0.1/45.3ms pred gate=device Token # 655: 3.746ms; value: next_token_ids=tensor([572], device='cuda:0') mtp accept=1 prop=572 top1=572 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 656: 114.835ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.966 next=draft=2803 prop=2803 olap pair=109.5ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.2ms s1=190.4ms wait=0.1/45.6ms pred gate=device Token # 657: 3.710ms; value: next_token_ids=tensor([2803], device='cuda:0') mtp accept=1 prop=2803 top1=2803 accp=0.997 next=pair draft=303 prop=303 pred gate=device Token # 658: 114.311ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=2204 prop=2204 olap pair=109.1ms serial=193.9ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.2ms wait=0.1/46.5ms pred gate=device Token # 659: 3.708ms; value: next_token_ids=tensor([2204], device='cuda:0') mtp accept=1 prop=2204 top1=2204 accp=0.994 next=pair draft=8842 prop=28608 pred gate=device Token # 660: 115.469ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=0 prop=28608 top1=8842 accp=0.742 next=draft=2099 prop=2099 olap 
pair=110.2ms serial=196.0ms gain=85.7ms ratio=0.44 s0=4.3ms s1=191.7ms wait=0.1/45.3ms pred gate=device Token # 661: 114.465ms; value: next_token_ids=tensor([2099], device='cuda:0') mtp accept=1 prop=2099 top1=2099 accp=1.000 next=draft=28608 prop=28608 olap pair=109.1ms serial=193.9ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.2ms wait=0.1/46.5ms pred gate=device Token # 662: 3.815ms; value: next_token_ids=tensor([28608], device='cuda:0') mtp accept=1 prop=28608 top1=28608 accp=1.000 next=pair draft=39 prop=39 pred gate=device Token # 663: 114.706ms; value: next_token_ids=tensor([39], device='cuda:0') mtp accept=1 prop=39 top1=39 accp=1.000 next=draft=35991 prop=35991 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.1ms wait=0.1/45.2ms pred gate=device Token # 664: 3.700ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=35991 top1=303 accp=0.412 next=pair draft=3996 prop=3996 pred gate=device Token # 665: 115.022ms; value: next_token_ids=tensor([2032], device='cuda:0') mtp accept=0 prop=3996 top1=2032 accp=0.466 next=draft=10756 prop=10756 olap pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=4.2ms s1=190.7ms wait=0.1/45.6ms pred gate=device Token # 666: 115.690ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=1 prop=10756 top1=10756 accp=0.970 next=draft=66518 prop=66518 olap pair=110.4ms serial=196.1ms gain=85.8ms ratio=0.44 s0=4.3ms s1=191.8ms wait=0.1/45.4ms pred gate=device Token # 667: 3.775ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=pair draft=422 prop=422 pred gate=device Token # 668: 114.591ms; value: next_token_ids=tensor([422], device='cuda:0') mtp accept=1 prop=422 top1=422 accp=1.000 next=draft=18580 prop=18580 olap pair=109.3ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.2ms s1=189.9ms wait=0.1/45.3ms pred gate=device Token # 669: 3.737ms; value: next_token_ids=tensor([18580], device='cuda:0') mtp accept=1 prop=18580 
top1=18580 accp=1.000 next=pair draft=478 prop=320 pred gate=device Token # 670: 115.107ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=0 prop=320 top1=478 accp=0.706 next=draft=372 prop=372 olap pair=109.9ms serial=195.2ms gain=85.4ms ratio=0.44 s0=4.3ms s1=191.0ms wait=0.1/45.5ms pred gate=device Token # 671: 114.727ms; value: next_token_ids=tensor([372], device='cuda:0') mtp accept=1 prop=372 top1=372 accp=0.982 next=draft=223 prop=223 olap pair=109.5ms serial=194.4ms gain=85.0ms ratio=0.44 s0=4.2ms s1=190.2ms wait=0.1/45.5ms pred gate=device Token # 672: 3.684ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=10728 prop=10728 pred gate=device Token # 673: 114.670ms; value: next_token_ids=tensor([10728], device='cuda:0') mtp accept=1 prop=10728 top1=10728 accp=1.000 next=draft=201 prop=271 olap pair=109.5ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.4ms s1=189.9ms wait=0.1/45.2ms pred gate=device Token # 674: 3.707ms; value: next_token_ids=tensor([947], device='cuda:0') mtp accept=0 prop=271 top1=947 accp=0.380 next=pair draft=7383 prop=7383 pred gate=device Token # 675: 114.779ms; value: next_token_ids=tensor([7383], device='cuda:0') mtp accept=1 prop=7383 top1=7383 accp=0.996 next=draft=201 prop=201 olap pair=109.6ms serial=194.5ms gain=84.9ms ratio=0.44 s0=4.7ms s1=189.8ms wait=0.1/44.8ms pred gate=device Token # 676: 3.742ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=0.661 next=pair draft=3660 prop=3660 pred gate=device Token # 677: 115.032ms; value: next_token_ids=tensor([3660], device='cuda:0') mtp accept=1 prop=3660 top1=3660 accp=0.991 next=draft=2382 prop=2382 olap pair=109.9ms serial=195.2ms gain=85.3ms ratio=0.44 s0=4.7ms s1=190.5ms wait=0.1/44.9ms pred gate=device Token # 678: 3.770ms; value: next_token_ids=tensor([2382], device='cuda:0') mtp accept=1 prop=2382 top1=2382 accp=0.969 next=pair draft=92 prop=92 pred 
gate=device Token # 679: 114.513ms; value: next_token_ids=tensor([92], device='cuda:0') mtp accept=1 prop=92 top1=92 accp=1.000 next=draft=31 prop=31 olap pair=109.3ms serial=194.1ms gain=84.7ms ratio=0.44 s0=4.8ms s1=189.3ms wait=0.1/44.8ms pred gate=device Token # 680: 3.724ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 681: 114.675ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=301 prop=301 olap pair=109.5ms serial=194.6ms gain=85.1ms ratio=0.44 s0=4.7ms s1=189.9ms wait=0.1/45.0ms pred gate=device Token # 682: 3.728ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=1.000 next=pair draft=3699 prop=3699 pred gate=device Token # 683: 114.954ms; value: next_token_ids=tensor([3699], device='cuda:0') mtp accept=1 prop=3699 top1=3699 accp=1.000 next=draft=47 prop=47 olap pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=4.7ms s1=190.2ms wait=0.1/44.9ms pred gate=device Token # 684: 3.725ms; value: next_token_ids=tensor([47], device='cuda:0') mtp accept=1 prop=47 top1=47 accp=1.000 next=pair draft=36101 prop=36101 pred gate=device Token # 685: 115.140ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=1.000 next=draft=2833 prop=2833 olap pair=109.9ms serial=195.3ms gain=85.4ms ratio=0.44 s0=4.7ms s1=190.6ms wait=0.1/44.9ms pred gate=device Token # 686: 3.751ms; value: next_token_ids=tensor([2833], device='cuda:0') mtp accept=1 prop=2833 top1=2833 accp=0.978 next=pair draft=303 prop=303 pred gate=device Token # 687: 114.716ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=7524 accp=0.507 next=draft=39932 prop=39932 olap pair=109.4ms serial=193.9ms gain=84.5ms ratio=0.44 s0=5.0ms s1=188.9ms wait=0.1/44.6ms pred gate=device Token # 688: 3.752ms; value: next_token_ids=tensor([39932], 
device='cuda:0') mtp accept=1 prop=39932 top1=39932 accp=0.960 next=pair draft=7157 prop=7157 pred gate=device Token # 689: 114.588ms; value: next_token_ids=tensor([7157], device='cuda:0') mtp accept=1 prop=7157 top1=7157 accp=0.977 next=draft=3115 prop=3115 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.1ms s1=190.1ms wait=0.1/45.5ms pred gate=device Token # 690: 3.672ms; value: next_token_ids=tensor([4382], device='cuda:0') mtp accept=0 prop=3115 top1=4382 accp=0.105 next=pair draft=8009 prop=65913 pred gate=device Token # 691: 114.614ms; value: next_token_ids=tensor([4953], device='cuda:0') mtp accept=0 prop=65913 top1=4953 accp=0.004 next=draft=13097 prop=799 olap pair=109.3ms serial=194.2ms gain=84.9ms ratio=0.44 s0=3.8ms s1=190.4ms wait=0.1/46.3ms pred gate=device Token # 692: 115.386ms; value: next_token_ids=tensor([13097], device='cuda:0') mtp accept=0 prop=799 top1=13097 accp=0.880 next=draft=6034 prop=6034 olap pair=110.1ms serial=195.5ms gain=85.4ms ratio=0.44 s0=4.3ms s1=191.2ms wait=0.1/45.4ms pred gate=device Token # 693: 115.331ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=draft=637 prop=637 olap pair=110.0ms serial=195.4ms gain=85.4ms ratio=0.44 s0=4.4ms s1=191.0ms wait=0.1/45.3ms pred gate=device Token # 694: 3.772ms; value: next_token_ids=tensor([637], device='cuda:0') mtp accept=1 prop=637 top1=637 accp=1.000 next=pair draft=18617 prop=18617 pred gate=device Token # 695: 114.680ms; value: next_token_ids=tensor([18617], device='cuda:0') mtp accept=1 prop=18617 top1=18617 accp=1.000 next=draft=45276 prop=45276 olap pair=109.4ms serial=194.4ms gain=85.0ms ratio=0.44 s0=4.2ms s1=190.2ms wait=0.1/45.6ms pred gate=device Token # 696: 3.730ms; value: next_token_ids=tensor([45276], device='cuda:0') mtp accept=1 prop=45276 top1=45276 accp=0.994 next=pair draft=25024 prop=25024 pred gate=device Token # 697: 114.582ms; value: next_token_ids=tensor([25024], device='cuda:0') mtp 
accept=1 prop=25024 top1=25024 accp=1.000 next=draft=301 prop=301 olap pair=109.3ms serial=194.3ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.7ms wait=0.1/46.5ms pred gate=device Token # 698: 3.773ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=0.999 next=pair draft=36101 prop=36101 pred gate=device Token # 699: 115.853ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=1.000 next=draft=303 prop=303 olap pair=109.9ms serial=194.7ms gain=84.9ms ratio=0.44 s0=5.7ms s1=189.0ms wait=0.2/44.1ms pred gate=device Token # 700: 4.626ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.888 next=pair draft=4029 prop=4029 pred gate=device Token # 701: 115.783ms; value: next_token_ids=tensor([4029], device='cuda:0') mtp accept=1 prop=4029 top1=4029 accp=0.999 next=draft=8009 prop=8009 olap pair=109.7ms serial=193.5ms gain=83.8ms ratio=0.43 s0=8.8ms s1=184.6ms wait=0.2/40.7ms pred gate=device Token # 702: 4.586ms; value: next_token_ids=tensor([8009], device='cuda:0') mtp accept=1 prop=8009 top1=11732 accp=0.237 next=pair draft=45276 prop=45276 pred gate=device Token # 703: 115.316ms; value: next_token_ids=tensor([45276], device='cuda:0') mtp accept=1 prop=45276 top1=45276 accp=0.986 next=draft=6034 prop=6034 olap pair=109.4ms serial=193.3ms gain=83.9ms ratio=0.43 s0=6.3ms s1=187.0ms wait=0.2/43.4ms pred gate=device Token # 704: 3.779ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=0.999 next=pair draft=124356 prop=124356 pred gate=device Token # 705: 114.611ms; value: next_token_ids=tensor([124356], device='cuda:0') mtp accept=1 prop=124356 top1=124356 accp=1.000 next=draft=8815 prop=8815 olap pair=109.3ms serial=193.9ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.6ms wait=0.1/45.4ms pred gate=device Token # 706: 3.721ms; value: next_token_ids=tensor([8815], device='cuda:0') mtp accept=1 prop=8815 top1=8815 
accp=0.936 next=pair draft=580 prop=580 pred gate=device Token # 707: 114.695ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=0 prop=580 top1=478 accp=0.455 next=draft=15 prop=15 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=3.9ms s1=190.6ms wait=0.1/46.0ms pred gate=device Token # 708: 121.362ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=0.991 next=draft=513 prop=513 olap pair=110.0ms serial=194.4ms gain=84.4ms ratio=0.43 s0=4.7ms s1=189.7ms wait=0.1/44.9ms pred gate=device Token # 709: 3.702ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=0 prop=513 top1=2619 accp=0.001 next=pair draft=3374 prop=3374 pred gate=device Token # 710: 114.979ms; value: next_token_ids=tensor([3374], device='cuda:0') mtp accept=1 prop=3374 top1=2204 accp=0.214 next=draft=66518 prop=66518 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.8ms s1=190.1ms wait=0.1/44.7ms pred gate=device Token # 711: 3.685ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=pair draft=666 prop=1237 pred gate=device Token # 712: 114.410ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=0 prop=1237 top1=666 accp=0.824 next=draft=768 prop=768 olap pair=109.2ms serial=194.0ms gain=84.8ms ratio=0.44 s0=4.0ms s1=190.0ms wait=0.1/45.9ms pred gate=device Token # 713: 114.468ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=422 prop=422 olap pair=109.1ms serial=193.6ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.4ms wait=0.1/45.3ms pred gate=device Token # 714: 3.679ms; value: next_token_ids=tensor([422], device='cuda:0') mtp accept=1 prop=422 top1=422 accp=1.000 next=pair draft=18580 prop=18580 pred gate=device Token # 715: 114.565ms; value: next_token_ids=tensor([18580], device='cuda:0') mtp accept=1 prop=18580 top1=18580 accp=0.987 next=draft=303 prop=303 olap pair=109.3ms 
serial=194.1ms gain=84.8ms ratio=0.44 s0=4.2ms s1=189.9ms wait=0.1/45.4ms pred gate=device Token # 716: 3.761ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=2524 prop=2524 pred gate=device Token # 717: 114.130ms; value: next_token_ids=tensor([2524], device='cuda:0') mtp accept=1 prop=2524 top1=2524 accp=0.999 next=draft=6525 prop=6525 olap pair=108.9ms serial=193.4ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.1ms wait=0.1/45.4ms pred gate=device Token # 718: 3.761ms; value: next_token_ids=tensor([6525], device='cuda:0') mtp accept=1 prop=6525 top1=6525 accp=0.791 next=pair draft=31446 prop=116037 pred gate=device Token # 719: 114.677ms; value: next_token_ids=tensor([31446], device='cuda:0') mtp accept=0 prop=116037 top1=31446 accp=0.570 next=draft=3374 prop=3374 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.0ms s1=190.5ms wait=0.1/46.0ms pred gate=device Token # 720: 114.839ms; value: next_token_ids=tensor([45276], device='cuda:0') mtp accept=0 prop=3374 top1=45276 accp=0.322 next=draft=25024 prop=25024 olap pair=109.5ms serial=194.6ms gain=85.1ms ratio=0.44 s0=3.8ms s1=190.8ms wait=0.1/46.3ms pred gate=device Token # 721: 115.023ms; value: next_token_ids=tensor([25024], device='cuda:0') mtp accept=1 prop=25024 top1=25024 accp=1.000 next=draft=876 prop=876 olap pair=109.6ms serial=194.9ms gain=85.3ms ratio=0.44 s0=3.8ms s1=191.1ms wait=0.1/46.4ms pred gate=device Token # 722: 3.756ms; value: next_token_ids=tensor([876], device='cuda:0') mtp accept=1 prop=876 top1=876 accp=1.000 next=pair draft=15 prop=15 pred gate=device Token # 723: 114.358ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=2619 prop=2619 olap pair=109.2ms serial=194.0ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.4ms wait=0.1/46.5ms pred gate=device Token # 724: 3.694ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.995 
next=pair draft=2353 prop=2353 pred gate=device Token # 725: 114.666ms; value: next_token_ids=tensor([2353], device='cuda:0') mtp accept=1 prop=2353 top1=2353 accp=1.000 next=draft=1121 prop=1121 olap pair=109.5ms serial=194.5ms gain=85.1ms ratio=0.44 s0=3.7ms s1=190.9ms wait=0.1/46.4ms pred gate=device Token # 726: 3.715ms; value: next_token_ids=tensor([1121], device='cuda:0') mtp accept=1 prop=1121 top1=1121 accp=1.000 next=pair draft=66518 prop=66518 pred gate=device Token # 727: 114.632ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=draft=666 prop=666 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=4.4ms s1=189.7ms wait=0.1/45.7ms pred gate=device Token # 728: 3.761ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=pair draft=768 prop=768 pred gate=device Token # 729: 115.165ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=18580 prop=18580 olap pair=109.3ms serial=193.4ms gain=84.1ms ratio=0.44 s0=5.9ms s1=187.5ms wait=0.2/43.9ms pred gate=device Token # 730: 3.794ms; value: next_token_ids=tensor([18580], device='cuda:0') mtp accept=1 prop=18580 top1=18580 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 731: 114.637ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=9158 prop=41540 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=3.7ms s1=190.6ms wait=0.1/46.5ms pred gate=device Token # 732: 3.747ms; value: next_token_ids=tensor([41540], device='cuda:0') mtp accept=1 prop=41540 top1=41540 accp=0.385 next=pair draft=8842 prop=8842 pred gate=device Token # 733: 114.715ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=0.997 next=draft=10602 prop=10602 olap pair=109.5ms serial=194.6ms gain=85.1ms ratio=0.44 s0=3.6ms s1=191.0ms wait=0.1/46.5ms pred 
gate=device Token # 734: 3.749ms; value: next_token_ids=tensor([10602], device='cuda:0') mtp accept=1 prop=10602 top1=10602 accp=1.000 next=pair draft=10780 prop=10780 pred gate=device Token # 735: 114.879ms; value: next_token_ids=tensor([10780], device='cuda:0') mtp accept=1 prop=10780 top1=10780 accp=0.937 next=draft=621 prop=621 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=3.7ms s1=190.9ms wait=0.1/46.5ms pred gate=device Token # 736: 3.788ms; value: next_token_ids=tensor([621], device='cuda:0') mtp accept=1 prop=621 top1=621 accp=1.000 next=pair draft=13097 prop=13097 pred gate=device Token # 737: 114.385ms; value: next_token_ids=tensor([13097], device='cuda:0') mtp accept=1 prop=13097 top1=13097 accp=1.000 next=draft=6034 prop=6034 olap pair=109.1ms serial=194.0ms gain=84.9ms ratio=0.44 s0=3.7ms s1=190.3ms wait=0.1/46.5ms pred gate=device Token # 738: 3.721ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=pair draft=572 prop=572 pred gate=device Token # 739: 114.955ms; value: next_token_ids=tensor([572], device='cuda:0') mtp accept=1 prop=572 top1=572 accp=0.986 next=draft=303 prop=303 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.3ms wait=0.1/45.7ms pred gate=device Token # 740: 3.743ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=18617 prop=8009 pred gate=device Token # 741: 114.338ms; value: next_token_ids=tensor([18617], device='cuda:0') mtp accept=0 prop=8009 top1=18617 accp=0.680 next=draft=45276 prop=45276 olap pair=109.1ms serial=193.6ms gain=84.5ms ratio=0.44 s0=4.4ms s1=189.2ms wait=0.1/45.4ms pred gate=device Token # 742: 114.725ms; value: next_token_ids=tensor([45276], device='cuda:0') mtp accept=1 prop=45276 top1=45276 accp=0.998 next=draft=25024 prop=25024 olap pair=109.4ms serial=194.3ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.6ms wait=0.1/46.5ms pred gate=device Token # 743: 3.709ms; 
value: next_token_ids=tensor([25024], device='cuda:0') mtp accept=1 prop=25024 top1=25024 accp=1.000 next=pair draft=14015 prop=14015 pred gate=device Token # 744: 114.398ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=0 prop=14015 top1=301 accp=0.001 next=draft=36101 prop=36101 olap pair=109.1ms serial=193.8ms gain=84.8ms ratio=0.44 s0=3.6ms s1=190.2ms wait=0.1/46.6ms pred gate=device Token # 745: 115.333ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=1.000 next=draft=320 prop=320 olap pair=110.0ms serial=195.4ms gain=85.4ms ratio=0.44 s0=4.7ms s1=190.7ms wait=0.1/44.9ms pred gate=device Token # 746: 3.740ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=320 top1=303 accp=0.151 next=pair draft=1207 prop=1207 pred gate=device Token # 747: 114.782ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=1 prop=1207 top1=1207 accp=0.999 next=draft=2386 prop=2386 olap pair=109.5ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.2ms s1=190.1ms wait=0.1/45.9ms pred gate=device Token # 748: 3.712ms; value: next_token_ids=tensor([16303], device='cuda:0') mtp accept=0 prop=2386 top1=16303 accp=0.066 next=pair draft=100642 prop=100642 pred gate=device Token # 749: 115.345ms; value: next_token_ids=tensor([100642], device='cuda:0') mtp accept=1 prop=100642 top1=100642 accp=0.999 next=draft=2431 prop=2431 olap pair=110.1ms serial=195.9ms gain=85.8ms ratio=0.44 s0=3.7ms s1=192.2ms wait=0.1/46.4ms pred gate=device Token # 750: 3.718ms; value: next_token_ids=tensor([1714], device='cuda:0') mtp accept=0 prop=2431 top1=1714 accp=0.035 next=pair draft=7157 prop=7157 pred gate=device Token # 751: 114.868ms; value: next_token_ids=tensor([7157], device='cuda:0') mtp accept=1 prop=7157 top1=7157 accp=0.987 next=draft=876 prop=876 olap pair=109.6ms serial=194.8ms gain=85.2ms ratio=0.44 s0=3.8ms s1=191.0ms wait=0.1/46.5ms pred gate=device Token # 752: 3.746ms; value: 
next_token_ids=tensor([876], device='cuda:0') mtp accept=1 prop=876 top1=876 accp=0.999 next=pair draft=15 prop=15 pred gate=device Token # 753: 114.535ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=2619 prop=2619 olap pair=109.3ms serial=194.3ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.7ms wait=0.1/46.5ms pred gate=device Token # 754: 3.767ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=pair draft=34408 prop=34408 pred gate=device Token # 755: 116.895ms; value: next_token_ids=tensor([34408], device='cuda:0') mtp accept=1 prop=34408 top1=34408 accp=1.000 next=draft=1728 prop=1728 olap pair=111.7ms serial=196.5ms gain=84.8ms ratio=0.43 s0=4.6ms s1=191.9ms wait=0.1/45.3ms pred gate=device Token # 756: 3.741ms; value: next_token_ids=tensor([1728], device='cuda:0') mtp accept=1 prop=1728 top1=1728 accp=1.000 next=pair draft=66518 prop=66518 pred gate=device Token # 757: 114.843ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=draft=666 prop=666 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.4ms wait=0.1/45.3ms pred gate=device Token # 758: 3.768ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=pair draft=768 prop=768 pred gate=device Token # 759: 114.448ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=422 prop=422 olap pair=109.2ms serial=193.8ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.5ms wait=0.1/45.4ms pred gate=device Token # 760: 3.723ms; value: next_token_ids=tensor([422], device='cuda:0') mtp accept=1 prop=422 top1=422 accp=0.996 next=pair draft=18580 prop=18580 pred gate=device Token # 761: 114.499ms; value: next_token_ids=tensor([18580], device='cuda:0') mtp accept=1 prop=18580 top1=18580 accp=1.000 next=draft=303 prop=303 olap pair=109.3ms 
serial=194.0ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.7ms wait=0.1/45.3ms pred gate=device Token # 762: 3.774ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.998 next=pair draft=2524 prop=2524 pred gate=device Token # 763: 114.497ms; value: next_token_ids=tensor([2524], device='cuda:0') mtp accept=1 prop=2524 top1=2524 accp=1.000 next=draft=45276 prop=45276 olap pair=109.2ms serial=194.0ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.7ms wait=0.1/45.5ms pred gate=device Token # 764: 3.707ms; value: next_token_ids=tensor([45276], device='cuda:0') mtp accept=1 prop=45276 top1=45276 accp=0.979 next=pair draft=25024 prop=25024 pred gate=device Token # 765: 115.024ms; value: next_token_ids=tensor([25024], device='cuda:0') mtp accept=1 prop=25024 top1=25024 accp=1.000 next=draft=6525 prop=6525 olap pair=109.6ms serial=194.5ms gain=84.8ms ratio=0.44 s0=4.0ms s1=190.5ms wait=0.1/46.0ms pred gate=device Token # 766: 3.752ms; value: next_token_ids=tensor([6525], device='cuda:0') mtp accept=1 prop=6525 top1=6525 accp=1.000 next=pair draft=38922 prop=38922 pred gate=device Token # 767: 114.567ms; value: next_token_ids=tensor([38186], device='cuda:0') mtp accept=0 prop=38922 top1=38186 accp=0.059 next=draft=34408 prop=34408 olap pair=109.2ms serial=194.1ms gain=84.9ms ratio=0.44 s0=3.7ms s1=190.4ms wait=0.1/46.5ms pred gate=device Token # 768: 115.229ms; value: next_token_ids=tensor([34408], device='cuda:0') mtp accept=1 prop=34408 top1=34408 accp=1.000 next=draft=1728 prop=1728 olap pair=109.8ms serial=194.3ms gain=84.5ms ratio=0.43 s0=4.1ms s1=190.3ms wait=0.1/46.0ms pred gate=device Token # 769: 3.743ms; value: next_token_ids=tensor([1728], device='cuda:0') mtp accept=1 prop=1728 top1=1728 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 770: 114.709ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=6911 prop=6911 olap pair=109.4ms serial=194.4ms gain=84.9ms 
ratio=0.44 s0=4.0ms s1=190.4ms wait=0.1/46.1ms pred gate=device Token # 771: 3.690ms; value: next_token_ids=tensor([6911], device='cuda:0') mtp accept=1 prop=6911 top1=6911 accp=0.953 next=pair draft=6034 prop=6034 pred gate=device Token # 772: 114.622ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=draft=63239 prop=63239 olap pair=109.4ms serial=194.5ms gain=85.1ms ratio=0.44 s0=3.7ms s1=190.8ms wait=0.1/46.5ms pred gate=device Token # 773: 3.700ms; value: next_token_ids=tensor([77649], device='cuda:0') mtp accept=0 prop=63239 top1=77649 accp=0.120 next=pair draft=876 prop=876 pred gate=device Token # 774: 114.799ms; value: next_token_ids=tensor([876], device='cuda:0') mtp accept=1 prop=876 top1=876 accp=0.988 next=draft=15 prop=15 olap pair=109.5ms serial=194.7ms gain=85.2ms ratio=0.44 s0=3.6ms s1=191.1ms wait=0.1/46.5ms pred gate=device Token # 775: 3.698ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=pair draft=2619 prop=2619 pred gate=device Token # 776: 114.975ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.995 next=draft=10756 prop=10756 olap pair=109.6ms serial=194.8ms gain=85.2ms ratio=0.44 s0=3.8ms s1=191.0ms wait=0.1/46.5ms pred gate=device Token # 777: 3.745ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=1 prop=10756 top1=10756 accp=1.000 next=pair draft=66518 prop=66518 pred gate=device Token # 778: 114.574ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=draft=666 prop=666 olap pair=109.4ms serial=194.4ms gain=85.0ms ratio=0.44 s0=3.8ms s1=190.5ms wait=0.1/46.4ms pred gate=device Token # 779: 3.818ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=pair draft=768 prop=768 pred gate=device Token # 780: 114.902ms; value: next_token_ids=tensor([768], 
device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=3440 prop=3440 olap pair=109.6ms serial=194.5ms gain=84.9ms ratio=0.44 s0=4.2ms s1=190.2ms wait=0.1/45.5ms pred gate=device Token # 781: 3.750ms; value: next_token_ids=tensor([3440], device='cuda:0') mtp accept=1 prop=3440 top1=3440 accp=0.997 next=pair draft=20668 prop=20668 pred gate=device Token # 782: 114.692ms; value: next_token_ids=tensor([1140], device='cuda:0') mtp accept=0 prop=20668 top1=1140 accp=0.025 next=draft=8842 prop=8842 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.1ms s1=190.1ms wait=0.1/45.8ms pred gate=device Token # 783: 115.607ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=1.000 next=draft=389 prop=389 olap pair=110.3ms serial=196.0ms gain=85.7ms ratio=0.44 s0=4.2ms s1=191.8ms wait=0.1/45.4ms pred gate=device Token # 784: 3.824ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 accp=0.857 next=pair draft=28608 prop=28608 pred gate=device Token # 785: 114.571ms; value: next_token_ids=tensor([28608], device='cuda:0') mtp accept=1 prop=28608 top1=28608 accp=1.000 next=draft=39 prop=39 olap pair=109.3ms serial=194.0ms gain=84.8ms ratio=0.44 s0=4.1ms s1=189.9ms wait=0.1/45.6ms pred gate=device Token # 786: 3.714ms; value: next_token_ids=tensor([39], device='cuda:0') mtp accept=1 prop=39 top1=39 accp=1.000 next=pair draft=35991 prop=35991 pred gate=device Token # 787: 115.010ms; value: next_token_ids=tensor([35991], device='cuda:0') mtp accept=1 prop=35991 top1=35991 accp=1.000 next=draft=625 prop=625 olap pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=3.7ms s1=191.2ms wait=0.1/46.4ms pred gate=device Token # 788: 3.780ms; value: next_token_ids=tensor([625], device='cuda:0') mtp accept=1 prop=625 top1=625 accp=0.999 next=pair draft=18580 prop=18580 pred gate=device Token # 789: 114.997ms; value: next_token_ids=tensor([18580], device='cuda:0') mtp accept=1 prop=18580 
top1=18580 accp=1.000 next=draft=303 prop=303 olap pair=109.7ms serial=195.0ms gain=85.2ms ratio=0.44 s0=4.0ms s1=191.0ms wait=0.1/46.0ms pred gate=device Token # 790: 3.744ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=0 prop=303 top1=478 accp=0.008 next=pair draft=2204 prop=2204 pred gate=device Token # 791: 114.916ms; value: next_token_ids=tensor([50687], device='cuda:0') mtp accept=0 prop=2204 top1=50687 accp=0.312 next=draft=15133 prop=15133 olap pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=4.4ms s1=190.4ms wait=0.1/45.1ms pred gate=device Token # 792: 115.140ms; value: next_token_ids=tensor([15133], device='cuda:0') mtp accept=1 prop=15133 top1=15133 accp=1.000 next=draft=525 prop=525 olap pair=109.8ms serial=195.0ms gain=85.2ms ratio=0.44 s0=4.7ms s1=190.2ms wait=0.1/44.8ms pred gate=device Token # 793: 3.763ms; value: next_token_ids=tensor([525], device='cuda:0') mtp accept=1 prop=525 top1=525 accp=0.999 next=pair draft=303 prop=303 pred gate=device Token # 794: 114.456ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=2204 prop=2204 olap pair=109.2ms serial=194.0ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.7ms wait=0.1/45.5ms pred gate=device Token # 795: 3.722ms; value: next_token_ids=tensor([2204], device='cuda:0') mtp accept=1 prop=2204 top1=2204 accp=0.517 next=pair draft=8842 prop=8842 pred gate=device Token # 796: 114.684ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=1.000 next=draft=12701 prop=17030 olap pair=109.5ms serial=194.5ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.3ms wait=0.1/45.2ms pred gate=device Token # 797: 3.713ms; value: next_token_ids=tensor([35987], device='cuda:0') mtp accept=0 prop=17030 top1=35987 accp=0.385 next=pair draft=303 prop=303 pred gate=device Token # 798: 114.807ms; value: next_token_ids=tensor([6525], device='cuda:0') mtp accept=0 prop=303 top1=6525 accp=0.146 next=draft=21236 
prop=21236 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=3.8ms s1=190.7ms wait=0.1/46.4ms pred gate=device Token # 799: 116.206ms; value: next_token_ids=tensor([21236], device='cuda:0') mtp accept=1 prop=21236 top1=21236 accp=0.953 next=draft=45276 prop=45276 olap pair=110.4ms serial=195.4ms gain=85.0ms ratio=0.43 s0=7.6ms s1=187.8ms wait=0.2/42.0ms pred gate=device Token # 800: 3.837ms; value: next_token_ids=tensor([45276], device='cuda:0') mtp accept=1 prop=45276 top1=45276 accp=1.000 next=pair draft=6034 prop=6034 pred gate=device Token # 801: 114.632ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=draft=124356 prop=124356 olap pair=109.3ms serial=194.0ms gain=84.7ms ratio=0.44 s0=4.2ms s1=189.7ms wait=0.1/45.5ms pred gate=device Token # 802: 3.741ms; value: next_token_ids=tensor([124356], device='cuda:0') mtp accept=1 prop=124356 top1=124356 accp=0.996 next=pair draft=303 prop=303 pred gate=device Token # 803: 114.929ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=104623 prop=104623 olap pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=4.2ms s1=190.8ms wait=0.1/45.5ms pred gate=device Token # 804: 3.708ms; value: next_token_ids=tensor([104623], device='cuda:0') mtp accept=1 prop=104623 top1=82910 accp=0.074 next=pair draft=5678 prop=5678 pred gate=device Token # 805: 114.980ms; value: next_token_ids=tensor([6094], device='cuda:0') mtp accept=0 prop=5678 top1=6094 accp=0.085 next=draft=2541 prop=2541 olap pair=109.8ms serial=195.0ms gain=85.3ms ratio=0.44 s0=4.3ms s1=190.8ms wait=0.1/45.3ms pred gate=device Token # 806: 115.777ms; value: next_token_ids=tensor([2541], device='cuda:0') mtp accept=1 prop=2541 top1=2541 accp=0.565 next=draft=2353 prop=2353 olap pair=110.4ms serial=196.1ms gain=85.7ms ratio=0.44 s0=4.3ms s1=191.8ms wait=0.1/45.3ms pred gate=device Token # 807: 3.728ms; value: next_token_ids=tensor([2353], 
device='cuda:0') mtp accept=1 prop=2353 top1=2353 accp=0.999 next=pair draft=1121 prop=1121 pred gate=device Token # 808: 114.995ms; value: next_token_ids=tensor([1121], device='cuda:0') mtp accept=1 prop=1121 top1=1121 accp=1.000 next=draft=66518 prop=66518 olap pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.6ms wait=0.1/45.4ms pred gate=device Token # 809: 3.780ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=0.999 next=pair draft=548 prop=548 pred gate=device Token # 810: 115.889ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=1 prop=548 top1=548 accp=1.000 next=draft=34408 prop=34408 olap pair=110.6ms serial=196.7ms gain=86.1ms ratio=0.44 s0=4.1ms s1=192.5ms wait=0.1/45.8ms pred gate=device Token # 811: 3.739ms; value: next_token_ids=tensor([34408], device='cuda:0') mtp accept=1 prop=34408 top1=34408 accp=1.000 next=pair draft=1728 prop=1728 pred gate=device Token # 812: 114.922ms; value: next_token_ids=tensor([1728], device='cuda:0') mtp accept=1 prop=1728 top1=1728 accp=1.000 next=draft=66518 prop=66518 olap pair=109.6ms serial=194.1ms gain=84.4ms ratio=0.44 s0=4.5ms s1=189.6ms wait=0.1/45.2ms pred gate=device Token # 813: 3.757ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 814: 114.985ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.999 next=draft=2803 prop=2803 olap pair=109.7ms serial=193.3ms gain=83.7ms ratio=0.43 s0=4.5ms s1=188.8ms wait=0.1/45.1ms pred gate=device Token # 815: 3.757ms; value: next_token_ids=tensor([18332], device='cuda:0') mtp accept=0 prop=2803 top1=18332 accp=0.173 next=pair draft=2382 prop=2382 pred gate=device Token # 816: 115.390ms; value: next_token_ids=tensor([2382], device='cuda:0') mtp accept=1 prop=2382 top1=2382 accp=0.992 next=draft=92 prop=92 olap pair=110.0ms serial=195.2ms 
gain=85.2ms ratio=0.44 s0=4.6ms s1=190.5ms wait=0.1/44.8ms pred gate=device Token # 817: 3.741ms; value: next_token_ids=tensor([92], device='cuda:0') mtp accept=1 prop=92 top1=92 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 818: 114.541ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=19 prop=19 olap pair=109.3ms serial=194.2ms gain=84.9ms ratio=0.44 s0=4.7ms s1=189.5ms wait=0.1/45.0ms pred gate=device Token # 819: 3.728ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=301 prop=301 pred gate=device Token # 820: 114.672ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=0.709 next=draft=36101 prop=36101 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.7ms s1=189.4ms wait=0.1/44.9ms pred gate=device Token # 821: 3.704ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=1.000 next=pair draft=525 prop=525 pred gate=device Token # 822: 114.813ms; value: next_token_ids=tensor([17520], device='cuda:0') mtp accept=0 prop=525 top1=17520 accp=0.159 next=draft=713 prop=713 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.5ms s1=190.0ms wait=0.1/45.1ms pred gate=device Token # 823: 114.986ms; value: next_token_ids=tensor([713], device='cuda:0') mtp accept=1 prop=713 top1=713 accp=0.999 next=draft=303 prop=303 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.4ms wait=0.1/45.4ms pred gate=device Token # 824: 3.699ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=34408 prop=34408 pred gate=device Token # 825: 115.264ms; value: next_token_ids=tensor([34408], device='cuda:0') mtp accept=1 prop=34408 top1=34408 accp=1.000 next=draft=1728 prop=1728 olap pair=110.0ms serial=195.4ms gain=85.4ms ratio=0.44 s0=4.3ms s1=191.1ms wait=0.1/45.4ms pred 
gate=device Token # 826: 3.704ms; value: next_token_ids=tensor([1728], device='cuda:0') mtp accept=1 prop=1728 top1=1728 accp=1.000 next=pair draft=66518 prop=66518 pred gate=device Token # 827: 115.283ms; value: next_token_ids=tensor([1380], device='cuda:0') mtp accept=0 prop=66518 top1=1380 accp=0.231 next=draft=11035 prop=11035 olap pair=110.1ms serial=195.5ms gain=85.5ms ratio=0.44 s0=4.3ms s1=191.3ms wait=0.1/45.4ms pred gate=device Token # 828: 115.245ms; value: next_token_ids=tensor([11035], device='cuda:0') mtp accept=1 prop=11035 top1=11035 accp=1.000 next=draft=10730 prop=10730 olap pair=109.9ms serial=195.4ms gain=85.5ms ratio=0.44 s0=4.3ms s1=191.1ms wait=0.1/45.2ms pred gate=device Token # 829: 3.736ms; value: next_token_ids=tensor([10730], device='cuda:0') mtp accept=1 prop=10730 top1=10730 accp=0.985 next=pair draft=60555 prop=60555 pred gate=device Token # 830: 114.666ms; value: next_token_ids=tensor([2056], device='cuda:0') mtp accept=0 prop=60555 top1=2056 accp=0.045 next=draft=2386 prop=2386 olap pair=109.4ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.1ms s1=190.2ms wait=0.1/45.6ms pred gate=device Token # 831: 115.329ms; value: next_token_ids=tensor([2386], device='cuda:0') mtp accept=1 prop=2386 top1=2386 accp=0.690 next=draft=9209 prop=9209 olap pair=110.1ms serial=195.7ms gain=85.7ms ratio=0.44 s0=3.7ms s1=192.1ms wait=0.1/46.6ms pred gate=device Token # 832: 3.672ms; value: next_token_ids=tensor([9691], device='cuda:0') mtp accept=0 prop=9209 top1=84602 accp=0.222 next=pair draft=7157 prop=7157 pred gate=device Token # 833: 114.952ms; value: next_token_ids=tensor([7157], device='cuda:0') mtp accept=1 prop=7157 top1=7157 accp=1.000 next=draft=320 prop=320 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=3.9ms s1=190.8ms wait=0.1/46.0ms pred gate=device Token # 834: 3.639ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=4992 prop=15023 pred gate=device Token # 835: 
115.208ms; value: next_token_ids=tensor([4992], device='cuda:0') mtp accept=0 prop=15023 top1=4992 accp=0.846 next=draft=25873 prop=25873 olap pair=110.0ms serial=195.4ms gain=85.4ms ratio=0.44 s0=4.0ms s1=191.3ms wait=0.1/45.9ms pred gate=device Token # 836: 114.834ms; value: next_token_ids=tensor([25873], device='cuda:0') mtp accept=1 prop=25873 top1=25873 accp=0.827 next=draft=16344 prop=16344 olap pair=109.5ms serial=194.2ms gain=84.7ms ratio=0.44 s0=4.1ms s1=190.1ms wait=0.1/45.7ms pred gate=device Token # 837: 3.750ms; value: next_token_ids=tensor([16344], device='cuda:0') mtp accept=1 prop=16344 top1=16344 accp=1.000 next=pair draft=389 prop=389 pred gate=device Token # 838: 114.871ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 accp=1.000 next=draft=2541 prop=1299 olap pair=109.6ms serial=194.5ms gain=85.0ms ratio=0.44 s0=3.8ms s1=190.8ms wait=0.1/46.3ms pred gate=device Token # 839: 3.763ms; value: next_token_ids=tensor([2541], device='cuda:0') mtp accept=0 prop=1299 top1=2541 accp=0.821 next=pair draft=666 prop=666 pred gate=device Token # 840: 115.428ms; value: next_token_ids=tensor([2353], device='cuda:0') mtp accept=0 prop=666 top1=2353 accp=0.027 next=draft=1121 prop=1121 olap pair=109.6ms serial=193.8ms gain=84.2ms ratio=0.43 s0=6.2ms s1=187.6ms wait=0.2/43.5ms pred gate=device Token # 841: 115.255ms; value: next_token_ids=tensor([1121], device='cuda:0') mtp accept=1 prop=1121 top1=1121 accp=1.000 next=draft=66518 prop=66518 olap pair=109.9ms serial=195.3ms gain=85.3ms ratio=0.44 s0=3.7ms s1=191.5ms wait=0.1/46.3ms pred gate=device Token # 842: 3.704ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=pair draft=637 prop=637 pred gate=device Token # 843: 114.826ms; value: next_token_ids=tensor([637], device='cuda:0') mtp accept=1 prop=637 top1=637 accp=1.000 next=draft=31446 prop=31446 olap pair=109.6ms serial=193.8ms gain=84.2ms ratio=0.43 s0=4.3ms 
s1=189.5ms wait=0.1/45.4ms pred gate=device Token # 844: 3.700ms; value: next_token_ids=tensor([31446], device='cuda:0') mtp accept=1 prop=31446 top1=31446 accp=0.967 next=pair draft=8842 prop=8842 pred gate=device Token # 845: 114.846ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=0.957 next=draft=303 prop=303 olap pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=4.2ms s1=190.6ms wait=0.1/45.4ms pred gate=device Token # 846: 3.708ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.999 next=pair draft=3437 prop=3437 pred gate=device Token # 847: 114.862ms; value: next_token_ids=tensor([867], device='cuda:0') mtp accept=0 prop=3437 top1=867 accp=0.055 next=draft=1275 prop=1275 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.5ms wait=0.1/45.4ms pred gate=device Token # 848: 114.789ms; value: next_token_ids=tensor([9209], device='cuda:0') mtp accept=0 prop=1275 top1=9209 accp=0.003 next=draft=34408 prop=34408 olap pair=109.5ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.7ms s1=189.6ms wait=0.1/45.0ms pred gate=device Token # 849: 114.859ms; value: next_token_ids=tensor([2541], device='cuda:0') mtp accept=0 prop=34408 top1=2541 accp=0.195 next=draft=34408 prop=34408 olap pair=109.5ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.8ms s1=189.6ms wait=0.1/44.8ms pred gate=device Token # 850: 114.817ms; value: next_token_ids=tensor([34408], device='cuda:0') mtp accept=1 prop=34408 top1=34408 accp=1.000 next=draft=1728 prop=1728 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.7ms s1=189.6ms wait=0.1/44.9ms pred gate=device Token # 851: 3.823ms; value: next_token_ids=tensor([1728], device='cuda:0') mtp accept=1 prop=1728 top1=1728 accp=1.000 next=pair draft=66518 prop=66518 pred gate=device Token # 852: 114.449ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=draft=303 prop=303 olap 
pair=109.2ms serial=193.7ms gain=84.5ms ratio=0.44 s0=4.5ms s1=189.2ms wait=0.1/45.2ms pred gate=device Token # 853: 3.783ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=0 prop=303 top1=320 accp=0.199 next=pair draft=4029 prop=2204 pred gate=device Token # 854: 114.927ms; value: next_token_ids=tensor([2204], device='cuda:0') mtp accept=1 prop=2204 top1=2204 accp=0.202 next=draft=8842 prop=8842 olap pair=109.6ms serial=194.4ms gain=84.8ms ratio=0.44 s0=4.5ms s1=190.0ms wait=0.1/45.0ms pred gate=device Token # 855: 3.724ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=1.000 next=pair draft=89829 prop=89829 pred gate=device Token # 856: 115.263ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=0 prop=89829 top1=389 accp=0.376 next=draft=28608 prop=28608 olap pair=109.9ms serial=195.2ms gain=85.3ms ratio=0.44 s0=4.4ms s1=190.8ms wait=0.1/45.4ms pred gate=device Token # 857: 115.239ms; value: next_token_ids=tensor([28608], device='cuda:0') mtp accept=1 prop=28608 top1=28608 accp=1.000 next=draft=39 prop=39 olap pair=109.9ms serial=195.3ms gain=85.4ms ratio=0.44 s0=3.9ms s1=191.4ms wait=0.1/46.1ms pred gate=device Token # 858: 3.722ms; value: next_token_ids=tensor([39], device='cuda:0') mtp accept=1 prop=39 top1=39 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 859: 114.996ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=2032 prop=2032 olap pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=4.5ms s1=190.4ms wait=0.1/45.4ms pred gate=device Token # 860: 3.749ms; value: next_token_ids=tensor([2032], device='cuda:0') mtp accept=1 prop=2032 top1=2032 accp=0.708 next=pair draft=37800 prop=82910 pred gate=device Token # 861: 114.801ms; value: next_token_ids=tensor([82910], device='cuda:0') mtp accept=1 prop=82910 top1=82910 accp=0.710 next=draft=2541 prop=2541 olap pair=109.5ms serial=194.4ms gain=84.9ms 
ratio=0.44 s0=4.6ms s1=189.9ms wait=0.1/45.1ms pred gate=device Token # 862: 3.772ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=0 prop=2541 top1=2541 accp=0.826 next=pair draft=66518 prop=66518 pred gate=device Token # 863: 115.229ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=draft=478 prop=478 olap pair=110.0ms serial=195.4ms gain=85.4ms ratio=0.44 s0=4.3ms s1=191.1ms wait=0.1/45.5ms pred gate=device Token # 864: 3.723ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=pair draft=8685 prop=11537 pred gate=device Token # 865: 114.346ms; value: next_token_ids=tensor([11537], device='cuda:0') mtp accept=1 prop=11537 top1=8685 accp=0.779 next=draft=303 prop=303 olap pair=109.1ms serial=193.6ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.2ms wait=0.1/45.4ms pred gate=device Token # 866: 3.702ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=15410 prop=15410 pred gate=device Token # 867: 114.105ms; value: next_token_ids=tensor([15410], device='cuda:0') mtp accept=1 prop=15410 top1=15410 accp=0.844 next=draft=7157 prop=7157 olap pair=108.9ms serial=193.3ms gain=84.4ms ratio=0.44 s0=4.0ms s1=189.3ms wait=0.1/46.2ms pred gate=device Token # 868: 3.739ms; value: next_token_ids=tensor([7157], device='cuda:0') mtp accept=1 prop=7157 top1=7157 accp=0.997 next=pair draft=3442 prop=3442 pred gate=device Token # 869: 114.286ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=0 prop=3442 top1=8842 accp=0.114 next=draft=66518 prop=66518 olap pair=109.1ms serial=193.6ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.3ms wait=0.1/45.4ms pred gate=device Token # 870: 114.567ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=0.984 next=draft=1237 prop=1237 olap pair=109.3ms serial=194.2ms gain=84.9ms ratio=0.44 s0=4.1ms s1=190.1ms 
wait=0.1/45.8ms pred gate=device Token # 871: 3.680ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=18278 prop=18278 pred gate=device Token # 872: 114.264ms; value: next_token_ids=tensor([8449], device='cuda:0') mtp accept=0 prop=18278 top1=8449 accp=0.070 next=draft=50294 prop=50294 olap pair=109.0ms serial=193.6ms gain=84.5ms ratio=0.44 s0=4.2ms s1=189.4ms wait=0.1/45.5ms pred gate=device Token # 873: 114.634ms; value: next_token_ids=tensor([50294], device='cuda:0') mtp accept=1 prop=50294 top1=50294 accp=1.000 next=draft=1478 prop=1478 olap pair=109.3ms serial=194.2ms gain=84.9ms ratio=0.44 s0=4.0ms s1=190.1ms wait=0.1/45.8ms pred gate=device Token # 874: 3.824ms; value: next_token_ids=tensor([1478], device='cuda:0') mtp accept=1 prop=1478 top1=1478 accp=0.965 next=pair draft=1227 prop=303 pred gate=device Token # 875: 114.323ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=0 prop=303 top1=1227 accp=0.857 next=draft=39358 prop=39358 olap pair=109.1ms serial=193.7ms gain=84.6ms ratio=0.44 s0=3.8ms s1=190.0ms wait=0.1/46.4ms pred gate=device Token # 876: 115.291ms; value: next_token_ids=tensor([39358], device='cuda:0') mtp accept=1 prop=39358 top1=39358 accp=0.835 next=draft=5870 prop=1542 olap pair=110.0ms serial=195.6ms gain=85.6ms ratio=0.44 s0=3.6ms s1=191.9ms wait=0.1/46.6ms pred gate=device Token # 877: 3.790ms; value: next_token_ids=tensor([5870], device='cuda:0') mtp accept=0 prop=1542 top1=5870 accp=0.885 next=pair draft=303 prop=303 pred gate=device Token # 878: 114.355ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=8040 prop=8040 olap pair=109.1ms serial=193.8ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.2ms wait=0.1/46.6ms pred gate=device Token # 879: 3.670ms; value: next_token_ids=tensor([8040], device='cuda:0') mtp accept=1 prop=8040 top1=8040 accp=0.960 next=pair draft=1275 prop=1275 pred gate=device 
Token # 880: 114.674ms; value: next_token_ids=tensor([1275], device='cuda:0') mtp accept=1 prop=1275 top1=1275 accp=0.996 next=draft=2353 prop=2353 olap pair=109.5ms serial=194.6ms gain=85.1ms ratio=0.44 s0=3.6ms s1=191.0ms wait=0.1/46.6ms pred gate=device Token # 881: 3.707ms; value: next_token_ids=tensor([7557], device='cuda:0') mtp accept=0 prop=2353 top1=7557 accp=0.094 next=pair draft=2827 prop=2827 pred gate=device Token # 882: 114.785ms; value: next_token_ids=tensor([2827], device='cuda:0') mtp accept=1 prop=2827 top1=2827 accp=0.812 next=draft=119545 prop=119545 olap pair=109.5ms serial=194.6ms gain=85.1ms ratio=0.44 s0=3.8ms s1=190.8ms wait=0.1/46.4ms pred gate=device Token # 883: 3.710ms; value: next_token_ids=tensor([119545], device='cuda:0') mtp accept=1 prop=119545 top1=32073 accp=0.383 next=pair draft=7557 prop=7557 pred gate=device Token # 884: 116.139ms; value: next_token_ids=tensor([7557], device='cuda:0') mtp accept=1 prop=7557 top1=7557 accp=0.979 next=draft=6034 prop=6034 olap pair=110.0ms serial=195.0ms gain=85.0ms ratio=0.44 s0=4.7ms s1=190.2ms wait=0.1/45.4ms pred gate=device Token # 885: 4.661ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=pair draft=572 prop=572 pred gate=device Token # 886: 115.146ms; value: next_token_ids=tensor([572], device='cuda:0') mtp accept=1 prop=572 top1=572 accp=0.699 next=draft=1237 prop=1237 olap pair=109.6ms serial=193.9ms gain=84.3ms ratio=0.43 s0=7.2ms s1=186.7ms wait=0.2/42.3ms pred gate=device Token # 887: 3.738ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=49416 prop=49416 pred gate=device Token # 888: 114.828ms; value: next_token_ids=tensor([49416], device='cuda:0') mtp accept=1 prop=49416 top1=49416 accp=0.880 next=draft=34408 prop=34408 olap pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=4.2ms s1=190.6ms wait=0.1/45.5ms pred gate=device Token # 889: 3.759ms; value: 
next_token_ids=tensor([34408], device='cuda:0') mtp accept=1 prop=34408 top1=34408 accp=1.000 next=pair draft=1728 prop=1728 pred gate=device Token # 890: 114.477ms; value: next_token_ids=tensor([1728], device='cuda:0') mtp accept=1 prop=1728 top1=1728 accp=1.000 next=draft=66518 prop=66518 olap pair=109.2ms serial=193.9ms gain=84.7ms ratio=0.44 s0=4.2ms s1=189.6ms wait=0.1/45.5ms pred gate=device Token # 891: 3.735ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=0.998 next=pair draft=39595 prop=39595 pred gate=device Token # 892: 114.955ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=39595 top1=303 accp=0.425 next=draft=1207 prop=1207 olap pair=109.7ms serial=195.0ms gain=85.2ms ratio=0.44 s0=4.2ms s1=190.7ms wait=0.1/45.6ms pred gate=device Token # 893: 115.019ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=1 prop=1207 top1=1207 accp=0.890 next=draft=19484 prop=2635 olap pair=109.8ms serial=195.0ms gain=85.3ms ratio=0.44 s0=4.2ms s1=190.8ms wait=0.1/45.4ms pred gate=device Token # 894: 3.712ms; value: next_token_ids=tensor([9721], device='cuda:0') mtp accept=0 prop=2635 top1=9721 accp=0.143 next=pair draft=45276 prop=45276 pred gate=device Token # 895: 114.974ms; value: next_token_ids=tensor([45276], device='cuda:0') mtp accept=1 prop=45276 top1=45276 accp=0.979 next=draft=25024 prop=25024 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.1ms s1=190.5ms wait=0.1/45.8ms pred gate=device Token # 896: 3.702ms; value: next_token_ids=tensor([25024], device='cuda:0') mtp accept=1 prop=25024 top1=25024 accp=1.000 next=pair draft=303 prop=625 pred gate=device Token # 897: 114.383ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=625 top1=303 accp=0.667 next=draft=18467 prop=18467 olap pair=109.1ms serial=193.8ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.5ms wait=0.1/45.5ms pred gate=device Token # 898: 115.057ms; value: 
next_token_ids=tensor([18467], device='cuda:0') mtp accept=1 prop=18467 top1=18467 accp=0.964 next=draft=2635 prop=2635 olap pair=109.8ms serial=195.2ms gain=85.4ms ratio=0.44 s0=4.2ms s1=191.0ms wait=0.1/45.5ms pred gate=device Token # 899: 3.693ms; value: next_token_ids=tensor([2541], device='cuda:0') mtp accept=0 prop=2635 top1=2541 accp=0.173 next=pair draft=2827 prop=2827 pred gate=device Token # 900: 114.436ms; value: next_token_ids=tensor([1395], device='cuda:0') mtp accept=0 prop=2827 top1=1395 accp=0.009 next=draft=18341 prop=18341 olap pair=109.2ms serial=194.0ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.7ms wait=0.1/45.4ms pred gate=device Token # 901: 114.639ms; value: next_token_ids=tensor([2748], device='cuda:0') mtp accept=0 prop=18341 top1=2748 accp=0.049 next=draft=9776 prop=9776 olap pair=109.3ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.9ms wait=0.1/45.3ms pred gate=device Token # 902: 115.328ms; value: next_token_ids=tensor([9776], device='cuda:0') mtp accept=1 prop=9776 top1=9776 accp=0.991 next=draft=11053 prop=11053 olap pair=110.0ms serial=195.3ms gain=85.4ms ratio=0.44 s0=4.3ms s1=191.0ms wait=0.1/45.4ms pred gate=device Token # 903: 3.699ms; value: next_token_ids=tensor([11053], device='cuda:0') mtp accept=1 prop=11053 top1=11053 accp=1.000 next=pair draft=66518 prop=66518 pred gate=device Token # 904: 114.861ms; value: next_token_ids=tensor([24495], device='cuda:0') mtp accept=0 prop=66518 top1=24495 accp=0.303 next=draft=303 prop=637 olap pair=109.6ms serial=194.8ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.5ms wait=0.1/45.6ms pred gate=device Token # 905: 114.959ms; value: next_token_ids=tensor([637], device='cuda:0') mtp accept=1 prop=637 top1=303 accp=0.974 next=draft=9209 prop=8009 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.5ms wait=0.1/45.5ms pred gate=device Token # 906: 3.691ms; value: next_token_ids=tensor([8009], device='cuda:0') mtp accept=1 prop=8009 top1=9209 accp=0.699 next=pair draft=16303 
prop=16303 pred gate=device Token # 907: 115.229ms; value: next_token_ids=tensor([77649], device='cuda:0') mtp accept=0 prop=16303 top1=95427 accp=0.010 next=draft=303 prop=2130 olap pair=110.0ms serial=195.6ms gain=85.6ms ratio=0.44 s0=4.3ms s1=191.4ms wait=0.1/45.5ms pred gate=device Token # 908: 114.994ms; value: next_token_ids=tensor([2130], device='cuda:0') mtp accept=1 prop=2130 top1=2130 accp=0.385 next=draft=303 prop=303 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.4ms wait=0.1/45.4ms pred gate=device Token # 909: 3.692ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.995 next=pair draft=37658 prop=37658 pred gate=device Token # 910: 114.553ms; value: next_token_ids=tensor([37658], device='cuda:0') mtp accept=1 prop=37658 top1=37658 accp=0.667 next=draft=2386 prop=2386 olap pair=109.4ms serial=194.3ms gain=85.0ms ratio=0.44 s0=4.2ms s1=190.1ms wait=0.1/45.5ms pred gate=device Token # 911: 3.732ms; value: next_token_ids=tensor([2386], device='cuda:0') mtp accept=1 prop=2386 top1=2386 accp=0.817 next=pair draft=27675 prop=27675 pred gate=device Token # 912: 116.133ms; value: next_token_ids=tensor([29812], device='cuda:0') mtp accept=0 prop=27675 top1=29812 accp=0.003 next=draft=8842 prop=8842 olap pair=110.0ms serial=193.1ms gain=83.1ms ratio=0.43 s0=8.7ms s1=184.4ms wait=0.2/40.7ms pred gate=device Token # 913: 114.802ms; value: next_token_ids=tensor([34993], device='cuda:0') mtp accept=0 prop=8842 top1=34993 accp=0.425 next=draft=548 prop=548 olap pair=109.2ms serial=193.2ms gain=84.0ms ratio=0.43 s0=6.8ms s1=186.4ms wait=0.2/42.9ms pred gate=device Token # 914: 114.940ms; value: next_token_ids=tensor([14164], device='cuda:0') mtp accept=0 prop=548 top1=24605 accp=0.026 next=draft=3660 prop=14149 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.0ms s1=190.8ms wait=0.1/46.0ms pred gate=device Token # 915: 115.115ms; value: next_token_ids=tensor([14149], device='cuda:0') mtp 
accept=1 prop=14149 top1=14149 accp=0.406 next=draft=303 prop=303 olap pair=109.8ms serial=195.1ms gain=85.3ms ratio=0.44 s0=3.7ms s1=191.4ms wait=0.1/46.5ms pred gate=device Token # 916: 3.754ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=3660 prop=445 pred gate=device Token # 917: 114.894ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=1 prop=445 top1=3660 accp=0.885 next=draft=2382 prop=2382 olap pair=109.6ms serial=194.8ms gain=85.2ms ratio=0.44 s0=3.9ms s1=190.9ms wait=0.1/46.3ms pred gate=device Token # 918: 3.730ms; value: next_token_ids=tensor([2382], device='cuda:0') mtp accept=1 prop=2382 top1=2382 accp=0.923 next=pair draft=92 prop=92 pred gate=device Token # 919: 115.454ms; value: next_token_ids=tensor([92], device='cuda:0') mtp accept=1 prop=92 top1=92 accp=1.000 next=draft=31 prop=31 olap pair=110.3ms serial=196.0ms gain=85.7ms ratio=0.44 s0=4.3ms s1=191.7ms wait=0.1/45.5ms pred gate=device Token # 920: 3.767ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 921: 115.362ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=15121 prop=15121 olap pair=110.1ms serial=195.8ms gain=85.7ms ratio=0.44 s0=4.0ms s1=191.8ms wait=0.1/46.0ms pred gate=device Token # 922: 3.782ms; value: next_token_ids=tensor([625], device='cuda:0') mtp accept=0 prop=15121 top1=625 accp=0.102 next=pair draft=303 prop=303 pred gate=device Token # 923: 116.982ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=1395 prop=1395 olap pair=109.3ms serial=194.2ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.5ms wait=0.1/46.5ms pred gate=device Token # 924: 3.669ms; value: next_token_ids=tensor([1395], device='cuda:0') mtp accept=1 prop=1395 top1=1395 accp=0.860 next=pair draft=18341 prop=18341 pred 
gate=device Token # 925: 114.241ms; value: next_token_ids=tensor([29222], device='cuda:0') mtp accept=0 prop=18341 top1=29222 accp=0.006 next=draft=16344 prop=16344 olap pair=109.0ms serial=193.3ms gain=84.3ms ratio=0.44 s0=4.3ms s1=189.0ms wait=0.1/45.5ms pred gate=device Token # 926: 114.870ms; value: next_token_ids=tensor([13349], device='cuda:0') mtp accept=0 prop=16344 top1=15206 accp=0.059 next=draft=389 prop=389 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/45.2ms pred gate=device Token # 927: 115.056ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 accp=0.997 next=draft=2353 prop=2541 olap pair=109.7ms serial=194.6ms gain=84.9ms ratio=0.44 s0=4.1ms s1=190.5ms wait=0.1/45.9ms pred gate=device Token # 928: 3.758ms; value: next_token_ids=tensor([2541], device='cuda:0') mtp accept=1 prop=2541 top1=2541 accp=0.626 next=pair draft=2353 prop=2353 pred gate=device Token # 929: 114.596ms; value: next_token_ids=tensor([2353], device='cuda:0') mtp accept=1 prop=2353 top1=2353 accp=0.999 next=draft=1121 prop=1121 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=4.1ms s1=190.1ms wait=0.1/45.9ms pred gate=device Token # 930: 3.756ms; value: next_token_ids=tensor([1121], device='cuda:0') mtp accept=1 prop=1121 top1=1121 accp=1.000 next=pair draft=66518 prop=66518 pred gate=device Token # 931: 114.576ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=draft=303 prop=303 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.0ms wait=0.1/45.3ms pred gate=device Token # 932: 3.713ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.977 next=pair draft=4029 prop=4029 pred gate=device Token # 933: 114.446ms; value: next_token_ids=tensor([4029], device='cuda:0') mtp accept=1 prop=4029 top1=2524 accp=0.550 next=draft=1275 prop=1275 olap pair=109.2ms serial=194.0ms gain=84.7ms 
ratio=0.44 s0=3.9ms s1=190.1ms wait=0.1/46.3ms pred gate=device Token # 934: 3.757ms; value: next_token_ids=tensor([1275], device='cuda:0') mtp accept=1 prop=1275 top1=1275 accp=0.788 next=pair draft=8842 prop=8842 pred gate=device Token # 935: 114.808ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=0.997 next=draft=9501 prop=9501 olap pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=3.7ms s1=191.1ms wait=0.1/46.4ms pred gate=device Token # 936: 3.729ms; value: next_token_ids=tensor([9501], device='cuda:0') mtp accept=1 prop=9501 top1=5467 accp=0.229 next=pair draft=45276 prop=45276 pred gate=device Token # 937: 114.629ms; value: next_token_ids=tensor([45276], device='cuda:0') mtp accept=1 prop=45276 top1=45276 accp=0.999 next=draft=6034 prop=6034 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.4ms wait=0.1/46.6ms pred gate=device Token # 938: 3.794ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=pair draft=572 prop=572 pred gate=device Token # 939: 114.482ms; value: next_token_ids=tensor([572], device='cuda:0') mtp accept=1 prop=572 top1=572 accp=1.000 next=draft=1237 prop=1237 olap pair=109.2ms serial=193.9ms gain=84.7ms ratio=0.44 s0=3.7ms s1=190.2ms wait=0.1/46.6ms pred gate=device Token # 940: 3.724ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=2204 prop=2204 pred gate=device Token # 941: 114.803ms; value: next_token_ids=tensor([2204], device='cuda:0') mtp accept=1 prop=2204 top1=2204 accp=1.000 next=draft=2431 prop=2431 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.7ms wait=0.1/46.4ms pred gate=device Token # 942: 3.712ms; value: next_token_ids=tensor([2431], device='cuda:0') mtp accept=1 prop=2431 top1=2431 accp=0.998 next=pair draft=5319 prop=5319 pred gate=device Token # 943: 114.666ms; value: next_token_ids=tensor([7417], 
device='cuda:0') mtp accept=0 prop=5319 top1=7417 accp=0.002 next=draft=121994 prop=121994 olap pair=109.5ms serial=194.6ms gain=85.1ms ratio=0.44 s0=3.7ms s1=190.9ms wait=0.1/46.5ms pred gate=device Token # 944: 114.866ms; value: next_token_ids=tensor([93365], device='cuda:0') mtp accept=0 prop=121994 top1=93365 accp=0.015 next=draft=77170 prop=77170 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.1ms s1=190.4ms wait=0.1/45.8ms pred gate=device Token # 945: 115.065ms; value: next_token_ids=tensor([16303], device='cuda:0') mtp accept=0 prop=77170 top1=16303 accp=0.033 next=draft=100642 prop=100642 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.1ms s1=190.7ms wait=0.1/45.9ms pred gate=device Token # 946: 114.758ms; value: next_token_ids=tensor([100642], device='cuda:0') mtp accept=1 prop=100642 top1=100642 accp=0.991 next=draft=478 prop=478 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.0ms wait=0.1/45.4ms pred gate=device Token # 947: 3.749ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=pair draft=2204 prop=3660 pred gate=device Token # 948: 114.502ms; value: next_token_ids=tensor([4755], device='cuda:0') mtp accept=0 prop=3660 top1=4755 accp=0.009 next=draft=303 prop=303 olap pair=109.3ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.8ms wait=0.1/45.2ms pred gate=device Token # 949: 114.882ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=87243 prop=87243 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.4ms wait=0.1/45.5ms pred gate=device Token # 950: 3.663ms; value: next_token_ids=tensor([87243], device='cuda:0') mtp accept=1 prop=87243 top1=3461 accp=0.295 next=pair draft=303 prop=303 pred gate=device Token # 951: 114.450ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.998 next=draft=445 prop=445 olap pair=109.2ms 
serial=194.0ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.7ms wait=0.1/45.3ms pred gate=device Token # 952: 3.670ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=1 prop=445 top1=445 accp=0.677 next=pair draft=36101 prop=36101 pred gate=device Token # 953: 114.553ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=0.999 next=draft=625 prop=625 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.0ms wait=0.1/45.5ms pred gate=device Token # 954: 3.699ms; value: next_token_ids=tensor([625], device='cuda:0') mtp accept=1 prop=625 top1=625 accp=0.948 next=pair draft=303 prop=303 pred gate=device Token # 955: 114.796ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.999 next=draft=1300 prop=1300 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.3ms wait=0.1/45.4ms pred gate=device Token # 956: 3.704ms; value: next_token_ids=tensor([1300], device='cuda:0') mtp accept=1 prop=1300 top1=1300 accp=1.000 next=pair draft=9168 prop=15410 pred gate=device Token # 957: 114.914ms; value: next_token_ids=tensor([15410], device='cuda:0') mtp accept=1 prop=15410 top1=2431 accp=0.419 next=draft=2541 prop=2541 olap pair=109.7ms serial=194.7ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.5ms wait=0.1/45.4ms pred gate=device Token # 958: 3.722ms; value: next_token_ids=tensor([2541], device='cuda:0') mtp accept=1 prop=2541 top1=2541 accp=0.719 next=pair draft=46625 prop=46625 pred gate=device Token # 959: 116.162ms; value: next_token_ids=tensor([48055], device='cuda:0') mtp accept=0 prop=46625 top1=13208 accp=0.154 next=draft=88 prop=88 olap pair=111.0ms serial=196.3ms gain=85.3ms ratio=0.43 s0=4.3ms s1=192.0ms wait=0.1/45.5ms pred gate=device Token # 960: 115.401ms; value: next_token_ids=tensor([46625], device='cuda:0') mtp accept=0 prop=88 top1=46625 accp=0.004 next=draft=1237 prop=1237 olap pair=110.2ms serial=194.3ms gain=84.1ms ratio=0.43 s0=4.1ms 
s1=190.2ms wait=0.1/46.2ms pred gate=device Token # 961: 115.979ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=0 prop=1237 top1=410 accp=0.065 next=draft=13086 prop=13086 olap pair=110.6ms serial=194.6ms gain=83.9ms ratio=0.43 s0=4.2ms s1=190.4ms wait=0.1/46.2ms pred gate=device Token # 962: 114.846ms; value: next_token_ids=tensor([77196], device='cuda:0') mtp accept=0 prop=13086 top1=77196 accp=0.000 next=draft=13208 prop=13208 olap pair=109.6ms serial=194.3ms gain=84.7ms ratio=0.44 s0=4.0ms s1=190.3ms wait=0.1/46.1ms pred gate=device Token # 963: 115.077ms; value: next_token_ids=tensor([13208], device='cuda:0') mtp accept=1 prop=13208 top1=13208 accp=0.998 next=draft=410 prop=410 olap pair=109.8ms serial=194.9ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.7ms wait=0.1/45.2ms pred gate=device Token # 964: 3.692ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=0.881 next=pair draft=1620 prop=1620 pred gate=device Token # 965: 114.471ms; value: next_token_ids=tensor([14612], device='cuda:0') mtp accept=0 prop=1620 top1=14612 accp=0.274 next=draft=4383 prop=4383 olap pair=109.3ms serial=194.0ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.8ms wait=0.1/45.4ms pred gate=device Token # 966: 114.621ms; value: next_token_ids=tensor([4383], device='cuda:0') mtp accept=1 prop=4383 top1=4383 accp=0.839 next=draft=4398 prop=4398 olap pair=109.4ms serial=194.2ms gain=84.9ms ratio=0.44 s0=4.2ms s1=190.0ms wait=0.1/45.4ms pred gate=device Token # 967: 3.763ms; value: next_token_ids=tensor([4398], device='cuda:0') mtp accept=1 prop=4398 top1=4398 accp=0.999 next=pair draft=107400 prop=1237 pred gate=device Token # 968: 114.601ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=0.430 next=draft=10802 prop=4621 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.2ms s1=190.0ms wait=0.1/45.4ms pred gate=device Token # 969: 3.691ms; value: next_token_ids=tensor([2204], 
device='cuda:0') mtp accept=0 prop=4621 top1=2204 accp=0.098 next=pair draft=2382 prop=2382 pred gate=device Token # 970: 114.576ms; value: next_token_ids=tensor([13825], device='cuda:0') mtp accept=0 prop=2382 top1=13825 accp=0.000 next=draft=4383 prop=450 olap pair=109.3ms serial=194.2ms gain=84.9ms ratio=0.44 s0=4.2ms s1=190.0ms wait=0.1/45.5ms pred gate=device Token # 971: 114.287ms; value: next_token_ids=tensor([57850], device='cuda:0') mtp accept=0 prop=450 top1=19540 accp=0.000 next=draft=13097 prop=13097 olap pair=109.1ms serial=193.6ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.4ms wait=0.1/45.3ms pred gate=device Token # 972: 114.945ms; value: next_token_ids=tensor([15974], device='cuda:0') mtp accept=0 prop=13097 top1=15974 accp=0.000 next=draft=1227 prop=1227 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=4.2ms s1=190.5ms wait=0.1/45.5ms pred gate=device Token # 973: 115.498ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=1.000 next=draft=107400 prop=107400 olap pair=110.1ms serial=195.7ms gain=85.6ms ratio=0.44 s0=4.3ms s1=191.4ms wait=0.1/45.5ms pred gate=device Token # 974: 3.759ms; value: next_token_ids=tensor([107400], device='cuda:0') mtp accept=1 prop=107400 top1=107400 accp=1.000 next=pair draft=637 prop=637 pred gate=device Token # 975: 114.696ms; value: next_token_ids=tensor([119270], device='cuda:0') mtp accept=0 prop=637 top1=119270 accp=0.037 next=draft=10730 prop=10730 olap pair=109.4ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.1ms wait=0.1/45.4ms pred gate=device Token # 976: 114.615ms; value: next_token_ids=tensor([10730], device='cuda:0') mtp accept=1 prop=10730 top1=10730 accp=0.516 next=draft=320 prop=320 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.8ms wait=0.1/45.3ms pred gate=device Token # 977: 3.703ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=3660 prop=3660 pred 
gate=device Token # 978: 114.455ms; value: next_token_ids=tensor([2803], device='cuda:0') mtp accept=0 prop=3660 top1=2803 accp=0.211 next=draft=303 prop=303 olap pair=109.2ms serial=193.9ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.6ms wait=0.1/45.5ms pred gate=device Token # 979: 114.681ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=4389 prop=4389 olap pair=109.3ms serial=194.0ms gain=84.6ms ratio=0.44 s0=4.4ms s1=189.6ms wait=0.1/45.2ms pred gate=device Token # 980: 3.684ms; value: next_token_ids=tensor([3660], device='cuda:0') mtp accept=0 prop=4389 top1=4389 accp=0.479 next=pair draft=45276 prop=1644 pred gate=device Token # 981: 115.214ms; value: next_token_ids=tensor([53400], device='cuda:0') mtp accept=0 prop=1644 top1=53400 accp=0.002 next=draft=2382 prop=2382 olap pair=110.0ms serial=195.4ms gain=85.4ms ratio=0.44 s0=4.3ms s1=191.1ms wait=0.1/45.4ms pred gate=device Token # 982: 114.512ms; value: next_token_ids=tensor([2382], device='cuda:0') mtp accept=1 prop=2382 top1=2382 accp=0.977 next=draft=92 prop=92 olap pair=109.1ms serial=193.8ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.5ms wait=0.1/45.4ms pred gate=device Token # 983: 3.739ms; value: next_token_ids=tensor([92], device='cuda:0') mtp accept=1 prop=92 top1=92 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 984: 114.317ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=19 prop=19 olap pair=109.0ms serial=193.6ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.3ms wait=0.1/45.5ms pred gate=device Token # 985: 3.673ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 986: 114.566ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=0 prop=303 top1=1237 accp=0.225 next=draft=2186 prop=2186 olap pair=109.3ms serial=194.0ms gain=84.7ms ratio=0.44 s0=4.2ms s1=189.8ms 
wait=0.1/45.3ms pred gate=device Token # 987: 114.915ms; value: next_token_ids=tensor([2186], device='cuda:0') mtp accept=1 prop=2186 top1=2186 accp=0.831 next=draft=7849 prop=7849 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=4.2ms s1=190.5ms wait=0.1/45.6ms pred gate=device Token # 988: 3.709ms; value: next_token_ids=tensor([12718], device='cuda:0') mtp accept=0 prop=7849 top1=12718 accp=0.320 next=pair draft=15974 prop=15974 pred gate=device Token # 989: 114.495ms; value: next_token_ids=tensor([1299], device='cuda:0') mtp accept=0 prop=15974 top1=1299 accp=0.000 next=draft=4398 prop=4398 olap pair=109.2ms serial=193.9ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.6ms wait=0.1/45.4ms pred gate=device Token # 990: 121.838ms; value: next_token_ids=tensor([4398], device='cuda:0') mtp accept=1 prop=4398 top1=4398 accp=1.000 next=draft=1057 prop=1057 olap pair=110.4ms serial=195.3ms gain=84.9ms ratio=0.43 s0=4.2ms s1=191.0ms wait=0.1/45.6ms pred gate=device Token # 991: 3.721ms; value: next_token_ids=tensor([1057], device='cuda:0') mtp accept=1 prop=1057 top1=1057 accp=1.000 next=pair draft=15974 prop=15974 pred gate=device Token # 992: 114.893ms; value: next_token_ids=tensor([15974], device='cuda:0') mtp accept=1 prop=15974 top1=15974 accp=1.000 next=draft=7417 prop=7417 olap pair=109.6ms serial=194.5ms gain=84.9ms ratio=0.44 s0=3.9ms s1=190.5ms wait=0.1/46.2ms pred gate=device Token # 993: 3.785ms; value: next_token_ids=tensor([10842], device='cuda:0') mtp accept=0 prop=7417 top1=10842 accp=0.024 next=pair draft=25024 prop=25024 pred gate=device Token # 994: 115.453ms; value: next_token_ids=tensor([25024], device='cuda:0') mtp accept=1 prop=25024 top1=25024 accp=1.000 next=draft=7417 prop=7417 olap pair=110.2ms serial=195.6ms gain=85.5ms ratio=0.44 s0=4.0ms s1=191.6ms wait=0.1/46.0ms pred gate=device Token # 995: 3.730ms; value: next_token_ids=tensor([7417], device='cuda:0') mtp accept=1 prop=7417 top1=7417 accp=1.000 next=pair draft=14612 prop=90779 pred 
gate=device Token # 996: 115.117ms; value: next_token_ids=tensor([14612], device='cuda:0') mtp accept=0 prop=90779 top1=14612 accp=0.800 next=draft=4383 prop=4383 olap pair=109.9ms serial=195.2ms gain=85.3ms ratio=0.44 s0=4.3ms s1=190.9ms wait=0.1/45.6ms pred gate=device Token # 997: 114.957ms; value: next_token_ids=tensor([4383], device='cuda:0') mtp accept=1 prop=4383 top1=4383 accp=1.000 next=draft=4398 prop=4398 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.3ms wait=0.1/45.5ms pred gate=device Token # 998: 3.703ms; value: next_token_ids=tensor([4398], device='cuda:0') mtp accept=1 prop=4398 top1=4398 accp=1.000 next=pair draft=6525 prop=6525 pred gate=device Token # 999: 114.937ms; value: next_token_ids=tensor([422], device='cuda:0') mtp accept=0 prop=6525 top1=422 accp=0.062 next=draft=18580 prop=18580 olap pair=109.7ms serial=194.7ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.5ms wait=0.1/45.6ms pred gate=device Token # 1000: 114.880ms; value: next_token_ids=tensor([18580], device='cuda:0') mtp accept=1 prop=18580 top1=18580 accp=1.000 next=draft=478 prop=478 olap pair=109.6ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.2ms s1=190.5ms wait=0.1/45.6ms pred gate=device Token # 1001: 3.698ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=pair draft=71261 prop=71261 pred gate=device Token # 1002: 114.954ms; value: next_token_ids=tensor([3655], device='cuda:0') mtp accept=0 prop=71261 top1=3655 accp=0.261 next=draft=4597 prop=4597 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.2ms s1=190.6ms wait=0.1/45.5ms pred gate=device Token # 1003: 114.614ms; value: next_token_ids=tensor([4597], device='cuda:0') mtp accept=1 prop=4597 top1=4597 accp=0.999 next=draft=6561 prop=6561 olap pair=109.4ms serial=194.1ms gain=84.7ms ratio=0.44 s0=4.4ms s1=189.7ms wait=0.1/45.4ms pred gate=device Token # 1004: 3.701ms; value: next_token_ids=tensor([5998], device='cuda:0') mtp accept=0 prop=6561 
top1=28057 accp=0.067 next=pair draft=6561 prop=6561 pred gate=device Token # 1005: 114.868ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=0 prop=6561 top1=8842 accp=0.003 next=draft=12052 prop=12052 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.4ms wait=0.1/45.4ms pred gate=device Token # 1006: 114.829ms; value: next_token_ids=tensor([12052], device='cuda:0') mtp accept=1 prop=12052 top1=12052 accp=0.998 next=draft=410 prop=410 olap pair=109.5ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.1ms wait=0.1/45.4ms pred gate=device Token # 1007: 3.816ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=0.998 next=pair draft=28057 prop=28057 pred gate=device Token # 1008: 114.806ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=0 prop=28057 top1=6034 accp=0.088 next=draft=26127 prop=26127 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.4ms wait=0.1/45.4ms pred gate=device Token # 1009: 114.813ms; value: next_token_ids=tensor([9853], device='cuda:0') mtp accept=0 prop=26127 top1=9853 accp=0.143 next=draft=410 prop=410 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/45.5ms pred gate=device Token # 1010: 115.103ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=0.574 next=draft=6034 prop=6034 olap pair=109.8ms serial=195.0ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.8ms wait=0.1/45.4ms pred gate=device Token # 1011: 3.750ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=0.738 next=pair draft=1052 prop=1052 pred gate=device Token # 1012: 114.750ms; value: next_token_ids=tensor([1052], device='cuda:0') mtp accept=1 prop=1052 top1=1052 accp=0.997 next=draft=16303 prop=16303 olap pair=109.5ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.1ms wait=0.1/45.3ms pred gate=device Token # 1013: 3.707ms; value: 
next_token_ids=tensor([16303], device='cuda:0') mtp accept=1 prop=16303 top1=16303 accp=0.832 next=pair draft=70359 prop=70359 pred gate=device Token # 1014: 114.416ms; value: next_token_ids=tensor([70359], device='cuda:0') mtp accept=1 prop=70359 top1=70359 accp=1.000 next=draft=3515 prop=3515 olap pair=109.2ms serial=193.8ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.5ms wait=0.1/45.6ms pred gate=device Token # 1015: 3.685ms; value: next_token_ids=tensor([3515], device='cuda:0') mtp accept=1 prop=3515 top1=3515 accp=0.574 next=pair draft=45045 prop=45045 pred gate=device Token # 1016: 114.627ms; value: next_token_ids=tensor([45045], device='cuda:0') mtp accept=1 prop=45045 top1=45045 accp=1.000 next=draft=3099 prop=3099 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.9ms wait=0.1/45.6ms pred gate=device Token # 1017: 3.720ms; value: next_token_ids=tensor([3099], device='cuda:0') mtp accept=1 prop=3099 top1=3099 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 1018: 114.324ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=46025 prop=46025 olap pair=109.1ms serial=193.7ms gain=84.6ms ratio=0.44 s0=4.2ms s1=189.4ms wait=0.1/45.5ms pred gate=device Token # 1019: 3.731ms; value: next_token_ids=tensor([46025], device='cuda:0') mtp accept=1 prop=46025 top1=46025 accp=0.663 next=pair draft=30834 prop=30834 pred gate=device Token # 1020: 114.975ms; value: next_token_ids=tensor([30834], device='cuda:0') mtp accept=1 prop=30834 top1=30834 accp=0.786 next=draft=66518 prop=66518 olap pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.7ms wait=0.1/45.5ms pred gate=device Token # 1021: 3.780ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=pair draft=13349 prop=13349 pred gate=device Token # 1022: 114.281ms; value: next_token_ids=tensor([13349], device='cuda:0') mtp accept=1 prop=13349 top1=13349 accp=1.000 
next=draft=320 prop=320 olap pair=109.0ms serial=193.5ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.2ms wait=0.1/45.5ms pred gate=device Token # 1023: 3.698ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=9168 prop=8040 pred gate=device Token # 1024: 114.257ms; value: next_token_ids=tensor([8040], device='cuda:0') mtp accept=1 prop=8040 top1=9168 accp=0.704 next=draft=303 prop=303 olap pair=109.1ms serial=193.6ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.3ms wait=0.1/45.4ms pred gate=device Token # 1025: 3.706ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=2204 prop=2204 pred gate=device Token # 1026: 114.316ms; value: next_token_ids=tensor([2204], device='cuda:0') mtp accept=1 prop=2204 top1=2204 accp=0.964 next=draft=8842 prop=8842 olap pair=109.1ms serial=193.6ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.4ms wait=0.1/45.5ms pred gate=device Token # 1027: 3.671ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=0.997 next=pair draft=12701 prop=12701 pred gate=device Token # 1028: 114.452ms; value: next_token_ids=tensor([39133], device='cuda:0') mtp accept=0 prop=12701 top1=39133 accp=0.109 next=draft=303 prop=303 olap pair=109.2ms serial=193.9ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.6ms wait=0.1/45.5ms pred gate=device Token # 1029: 115.023ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.993 next=draft=1263 prop=1263 olap pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=4.2ms s1=190.7ms wait=0.1/45.5ms pred gate=device Token # 1030: 3.692ms; value: next_token_ids=tensor([1263], device='cuda:0') mtp accept=1 prop=1263 top1=1263 accp=0.996 next=pair draft=9501 prop=9501 pred gate=device Token # 1031: 114.806ms; value: next_token_ids=tensor([9501], device='cuda:0') mtp accept=1 prop=9501 top1=9501 accp=0.996 next=draft=45276 prop=45276 olap 
pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.4ms s1=190.2ms wait=0.1/45.3ms pred gate=device Token # 1032: 3.720ms; value: next_token_ids=tensor([45276], device='cuda:0') mtp accept=1 prop=45276 top1=45276 accp=1.000 next=pair draft=6034 prop=6034 pred gate=device Token # 1033: 114.521ms; value: next_token_ids=tensor([90738], device='cuda:0') mtp accept=0 prop=6034 top1=90738 accp=0.146 next=draft=572 prop=572 olap pair=109.2ms serial=193.9ms gain=84.6ms ratio=0.44 s0=4.7ms s1=189.2ms wait=0.1/44.9ms pred gate=device Token # 1034: 114.945ms; value: next_token_ids=tensor([572], device='cuda:0') mtp accept=1 prop=572 top1=572 accp=1.000 next=draft=1847 prop=1847 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=4.7ms s1=190.0ms wait=0.1/45.0ms pred gate=device Token # 1035: 3.735ms; value: next_token_ids=tensor([1847], device='cuda:0') mtp accept=1 prop=1847 top1=1847 accp=1.000 next=pair draft=2204 prop=2204 pred gate=device Token # 1036: 114.611ms; value: next_token_ids=tensor([2204], device='cuda:0') mtp accept=1 prop=2204 top1=2204 accp=1.000 next=draft=8842 prop=8842 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.8ms wait=0.1/45.5ms pred gate=device Token # 1037: 3.713ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=1.000 next=pair draft=17030 prop=17030 pred gate=device Token # 1038: 114.797ms; value: next_token_ids=tensor([17030], device='cuda:0') mtp accept=1 prop=17030 top1=17030 accp=0.995 next=draft=303 prop=303 olap pair=109.6ms serial=194.5ms gain=84.9ms ratio=0.44 s0=4.2ms s1=190.2ms wait=0.1/45.6ms pred gate=device Token # 1039: 3.738ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=37800 prop=37800 pred gate=device Token # 1040: 114.566ms; value: next_token_ids=tensor([37800], device='cuda:0') mtp accept=1 prop=37800 top1=37800 accp=0.982 next=draft=2353 prop=2353 olap pair=109.3ms serial=194.1ms 
gain=84.8ms ratio=0.44 s0=4.3ms s1=189.8ms wait=0.1/45.3ms pred gate=device Token # 1041: 3.796ms; value: next_token_ids=tensor([2353], device='cuda:0') mtp accept=1 prop=2353 top1=2353 accp=0.676 next=pair draft=1121 prop=1121 pred gate=device Token # 1042: 114.679ms; value: next_token_ids=tensor([1121], device='cuda:0') mtp accept=1 prop=1121 top1=1121 accp=1.000 next=draft=66518 prop=66518 olap pair=109.5ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.4ms s1=190.0ms wait=0.1/45.3ms pred gate=device Token # 1043: 3.711ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=pair draft=10581 prop=10581 pred gate=device Token # 1044: 114.731ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=0 prop=10581 top1=445 accp=0.206 next=draft=13097 prop=13097 olap pair=109.5ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.1ms wait=0.1/45.3ms pred gate=device Token # 1045: 115.711ms; value: next_token_ids=tensor([13097], device='cuda:0') mtp accept=1 prop=13097 top1=13097 accp=1.000 next=draft=90738 prop=90738 olap pair=110.4ms serial=196.2ms gain=85.8ms ratio=0.44 s0=4.3ms s1=191.9ms wait=0.1/45.3ms pred gate=device Token # 1046: 3.709ms; value: next_token_ids=tensor([90738], device='cuda:0') mtp accept=1 prop=90738 top1=90738 accp=1.000 next=pair draft=572 prop=572 pred gate=device Token # 1047: 114.809ms; value: next_token_ids=tensor([572], device='cuda:0') mtp accept=1 prop=572 top1=572 accp=1.000 next=draft=31446 prop=8738 olap pair=109.5ms serial=194.6ms gain=85.1ms ratio=0.44 s0=3.9ms s1=190.7ms wait=0.1/46.2ms pred gate=device Token # 1048: 3.683ms; value: next_token_ids=tensor([10780], device='cuda:0') mtp accept=0 prop=8738 top1=10780 accp=0.197 next=pair draft=1847 prop=1847 pred gate=device Token # 1049: 115.499ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=0 prop=1847 top1=8842 accp=0.006 next=draft=320 prop=320 olap pair=110.2ms serial=195.9ms gain=85.7ms 
ratio=0.44 s0=3.7ms s1=192.2ms wait=0.1/46.6ms pred gate=device Token # 1050: 115.480ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.922 next=draft=128799 prop=128799 olap pair=110.2ms serial=195.6ms gain=85.4ms ratio=0.44 s0=3.9ms s1=191.7ms wait=0.1/46.3ms pred gate=device Token # 1051: 3.660ms; value: next_token_ids=tensor([128799], device='cuda:0') mtp accept=1 prop=128799 top1=128799 accp=0.960 next=pair draft=445 prop=445 pred gate=device Token # 1052: 114.697ms; value: next_token_ids=tensor([47507], device='cuda:0') mtp accept=0 prop=445 top1=38463 accp=0.193 next=draft=12201 prop=12201 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=3.7ms s1=190.9ms wait=0.1/46.6ms pred gate=device Token # 1053: 115.004ms; value: next_token_ids=tensor([12201], device='cuda:0') mtp accept=1 prop=12201 top1=12201 accp=0.919 next=draft=9068 prop=9068 olap pair=109.7ms serial=194.6ms gain=84.9ms ratio=0.44 s0=4.7ms s1=190.0ms wait=0.1/45.5ms pred gate=device Token # 1054: 3.698ms; value: next_token_ids=tensor([21373], device='cuda:0') mtp accept=0 prop=9068 top1=9068 accp=0.972 next=pair draft=3699 prop=3699 pred gate=device Token # 1055: 115.018ms; value: next_token_ids=tensor([19570], device='cuda:0') mtp accept=0 prop=3699 top1=63812 accp=0.316 next=draft=9048 prop=9048 olap pair=109.8ms serial=195.1ms gain=85.3ms ratio=0.44 s0=3.8ms s1=191.3ms wait=0.1/46.5ms pred gate=device Token # 1056: 114.974ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=0 prop=9048 top1=8842 accp=0.281 next=draft=36101 prop=36101 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.5ms wait=0.1/45.4ms pred gate=device Token # 1057: 114.991ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=0.991 next=draft=525 prop=525 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.5ms wait=0.1/45.4ms pred gate=device Token # 1058: 3.697ms; value: 
next_token_ids=tensor([525], device='cuda:0') mtp accept=1 prop=525 top1=525 accp=0.825 next=pair draft=25873 prop=25873 pred gate=device Token # 1059: 114.793ms; value: next_token_ids=tensor([25873], device='cuda:0') mtp accept=1 prop=25873 top1=5372 accp=0.548 next=draft=66518 prop=66518 olap pair=109.6ms serial=194.5ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/45.6ms pred gate=device Token # 1060: 3.695ms; value: next_token_ids=tensor([45650], device='cuda:0') mtp accept=0 prop=66518 top1=66518 accp=0.538 next=pair draft=66518 prop=66518 pred gate=device Token # 1061: 114.406ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=draft=13349 prop=13349 olap pair=109.1ms serial=193.6ms gain=84.5ms ratio=0.44 s0=4.2ms s1=189.5ms wait=0.1/45.6ms pred gate=device Token # 1062: 3.694ms; value: next_token_ids=tensor([13349], device='cuda:0') mtp accept=1 prop=13349 top1=13349 accp=0.991 next=pair draft=10179 prop=10179 pred gate=device Token # 1063: 114.694ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=0 prop=10179 top1=303 accp=0.333 next=draft=9422 prop=9422 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/45.4ms pred gate=device Token # 1064: 115.019ms; value: next_token_ids=tensor([9422], device='cuda:0') mtp accept=1 prop=9422 top1=9422 accp=0.975 next=draft=410 prop=410 olap pair=109.8ms serial=194.8ms gain=85.0ms ratio=0.44 s0=4.4ms s1=190.5ms wait=0.1/45.4ms pred gate=device Token # 1065: 3.755ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=0.979 next=pair draft=8835 prop=8835 pred gate=device Token # 1066: 115.290ms; value: next_token_ids=tensor([8835], device='cuda:0') mtp accept=1 prop=8835 top1=8835 accp=1.000 next=draft=410 prop=410 olap pair=110.1ms serial=195.6ms gain=85.5ms ratio=0.44 s0=4.3ms s1=191.4ms wait=0.1/45.5ms pred gate=device Token # 1067: 3.777ms; value: 
next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=1.000 next=pair draft=10124 prop=10124 pred gate=device Token # 1068: 115.551ms; value: next_token_ids=tensor([10124], device='cuda:0') mtp accept=1 prop=10124 top1=10124 accp=1.000 next=draft=410 prop=410 olap pair=110.4ms serial=196.2ms gain=85.8ms ratio=0.44 s0=4.2ms s1=192.0ms wait=0.1/45.7ms pred gate=device Token # 1069: 3.770ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=0.993 next=pair draft=7989 prop=7989 pred gate=device Token # 1070: 114.578ms; value: next_token_ids=tensor([7989], device='cuda:0') mtp accept=1 prop=7989 top1=7989 accp=1.000 next=draft=303 prop=303 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.1ms wait=0.1/45.6ms pred gate=device Token # 1071: 3.690ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.951 next=pair draft=1380 prop=1380 pred gate=device Token # 1072: 114.975ms; value: next_token_ids=tensor([1380], device='cuda:0') mtp accept=1 prop=1380 top1=1380 accp=0.987 next=draft=3803 prop=3803 olap pair=109.8ms serial=195.0ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.7ms wait=0.1/45.4ms pred gate=device Token # 1073: 3.664ms; value: next_token_ids=tensor([3803], device='cuda:0') mtp accept=1 prop=3803 top1=3803 accp=0.997 next=pair draft=89083 prop=89083 pred gate=device Token # 1074: 115.138ms; value: next_token_ids=tensor([89083], device='cuda:0') mtp accept=1 prop=89083 top1=89083 accp=0.886 next=draft=2382 prop=4383 olap pair=110.0ms serial=195.3ms gain=85.3ms ratio=0.44 s0=4.4ms s1=191.0ms wait=0.1/45.4ms pred gate=device Token # 1075: 3.762ms; value: next_token_ids=tensor([4383], device='cuda:0') mtp accept=1 prop=4383 top1=2382 accp=0.819 next=pair draft=114710 prop=12052 pred gate=device Token # 1076: 114.606ms; value: next_token_ids=tensor([12052], device='cuda:0') mtp accept=1 prop=12052 top1=12052 accp=0.439 next=draft=1237 prop=1237 
olap pair=109.4ms serial=194.4ms gain=85.0ms ratio=0.44 s0=3.9ms s1=190.5ms wait=0.1/46.3ms pred gate=device Token # 1077: 3.711ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=0.795 next=pair draft=2382 prop=2382 pred gate=device Token # 1078: 114.970ms; value: next_token_ids=tensor([2382], device='cuda:0') mtp accept=1 prop=2382 top1=2382 accp=0.990 next=draft=92 prop=92 olap pair=109.7ms serial=195.2ms gain=85.5ms ratio=0.44 s0=3.7ms s1=191.5ms wait=0.1/46.6ms pred gate=device Token # 1079: 3.700ms; value: next_token_ids=tensor([92], device='cuda:0') mtp accept=1 prop=92 top1=92 accp=1.000 next=pair draft=73076 prop=1227 pred gate=device Token # 1080: 115.340ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=0 prop=1227 top1=31 accp=0.713 next=draft=19 prop=19 olap pair=110.1ms serial=195.0ms gain=84.9ms ratio=0.44 s0=4.0ms s1=191.0ms wait=0.1/46.3ms pred gate=device Token # 1081: 115.679ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=1227 prop=1227 olap pair=110.4ms serial=195.1ms gain=84.7ms ratio=0.43 s0=4.0ms s1=191.1ms wait=0.1/46.4ms pred gate=device Token # 1082: 3.702ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=1.000 next=pair draft=36101 prop=11753 pred gate=device Token # 1083: 114.772ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=0 prop=11753 top1=36101 accp=0.757 next=draft=36101 prop=36101 olap pair=109.6ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.2ms s1=190.4ms wait=0.1/45.7ms pred gate=device Token # 1084: 114.987ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=0.937 next=draft=17520 prop=17520 olap pair=109.7ms serial=194.9ms gain=85.1ms ratio=0.44 s0=4.2ms s1=190.7ms wait=0.1/45.7ms pred gate=device Token # 1085: 3.837ms; value: next_token_ids=tensor([17520], device='cuda:0') mtp accept=1 prop=17520 
top1=17520 accp=0.903 next=pair draft=6985 prop=6985 pred gate=device Token # 1086: 117.899ms; value: next_token_ids=tensor([6985], device='cuda:0') mtp accept=1 prop=6985 top1=6985 accp=1.000 next=draft=18580 prop=18580 olap pair=112.6ms serial=197.2ms gain=84.6ms ratio=0.43 s0=4.7ms s1=192.5ms wait=0.1/45.1ms pred gate=device Token # 1087: 3.735ms; value: next_token_ids=tensor([18580], device='cuda:0') mtp accept=1 prop=18580 top1=18580 accp=1.000 next=pair draft=946 prop=946 pred gate=device Token # 1088: 114.762ms; value: next_token_ids=tensor([946], device='cuda:0') mtp accept=1 prop=946 top1=946 accp=1.000 next=draft=478 prop=478 olap pair=109.5ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.7ms s1=189.6ms wait=0.1/44.9ms pred gate=device Token # 1089: 3.730ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=pair draft=372 prop=372 pred gate=device Token # 1090: 114.322ms; value: next_token_ids=tensor([372], device='cuda:0') mtp accept=1 prop=372 top1=372 accp=1.000 next=draft=223 prop=223 olap pair=109.2ms serial=193.8ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.6ms wait=0.1/45.4ms pred gate=device Token # 1091: 3.696ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.999 next=pair draft=378 prop=378 pred gate=device Token # 1092: 115.102ms; value: next_token_ids=tensor([378], device='cuda:0') mtp accept=1 prop=378 top1=378 accp=0.842 next=draft=410 prop=410 olap pair=109.9ms serial=195.3ms gain=85.4ms ratio=0.44 s0=4.3ms s1=191.1ms wait=0.1/45.6ms pred gate=device Token # 1093: 3.668ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=1.000 next=pair draft=45650 prop=45650 pred gate=device Token # 1094: 114.693ms; value: next_token_ids=tensor([45650], device='cuda:0') mtp accept=1 prop=45650 top1=45650 accp=0.983 next=draft=66518 prop=66518 olap pair=109.5ms serial=194.4ms gain=84.8ms ratio=0.44 s0=4.1ms s1=190.3ms 
wait=0.1/45.9ms pred gate=device Token # 1095: 3.699ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=pair draft=13349 prop=13349 pred gate=device Token # 1096: 113.996ms; value: next_token_ids=tensor([13349], device='cuda:0') mtp accept=1 prop=13349 top1=13349 accp=1.000 next=draft=60360 prop=60360 olap pair=108.7ms serial=193.1ms gain=84.4ms ratio=0.44 s0=3.7ms s1=189.4ms wait=0.1/46.7ms pred gate=device Token # 1097: 3.707ms; value: next_token_ids=tensor([60360], device='cuda:0') mtp accept=1 prop=60360 top1=60360 accp=1.000 next=pair draft=271 prop=271 pred gate=device Token # 1098: 114.619ms; value: next_token_ids=tensor([271], device='cuda:0') mtp accept=1 prop=271 top1=271 accp=1.000 next=draft=795 prop=795 olap pair=109.4ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.9ms wait=0.1/45.4ms pred gate=device Token # 1099: 3.692ms; value: next_token_ids=tensor([795], device='cuda:0') mtp accept=1 prop=795 top1=795 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1100: 115.917ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.992 next=draft=19 prop=19 olap pair=110.6ms serial=195.7ms gain=85.1ms ratio=0.43 s0=4.3ms s1=191.4ms wait=0.1/45.5ms pred gate=device Token # 1101: 3.652ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=16 prop=16 pred gate=device Token # 1102: 114.214ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=2619 prop=2619 olap pair=109.0ms serial=193.6ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.3ms wait=0.1/45.6ms pred gate=device Token # 1103: 3.694ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.995 next=pair draft=3374 prop=3374 pred gate=device Token # 1104: 114.582ms; value: next_token_ids=tensor([3374], device='cuda:0') mtp accept=1 prop=3374 top1=3374 
accp=0.994 next=draft=66518 prop=66518 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.8ms wait=0.1/45.5ms pred gate=device Token # 1105: 3.712ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=pair draft=1237 prop=343 pred gate=device Token # 1106: 114.581ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=0 prop=343 top1=1237 accp=0.750 next=draft=9422 prop=4532 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.2ms s1=190.1ms wait=0.1/45.4ms pred gate=device Token # 1107: 115.097ms; value: next_token_ids=tensor([4532], device='cuda:0') mtp accept=1 prop=4532 top1=4532 accp=0.779 next=draft=50294 prop=50294 olap pair=109.8ms serial=195.1ms gain=85.3ms ratio=0.44 s0=4.2ms s1=190.8ms wait=0.1/45.6ms pred gate=device Token # 1108: 3.694ms; value: next_token_ids=tensor([50294], device='cuda:0') mtp accept=1 prop=50294 top1=50294 accp=1.000 next=pair draft=1478 prop=1478 pred gate=device Token # 1109: 114.441ms; value: next_token_ids=tensor([1478], device='cuda:0') mtp accept=1 prop=1478 top1=1478 accp=1.000 next=draft=14 prop=14 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=4.2ms s1=189.8ms wait=0.1/45.5ms pred gate=device Token # 1110: 3.747ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=0.886 next=pair draft=43151 prop=43151 pred gate=device Token # 1111: 114.841ms; value: next_token_ids=tensor([43151], device='cuda:0') mtp accept=1 prop=43151 top1=43151 accp=1.000 next=draft=1227 prop=1227 olap pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=4.2ms s1=190.6ms wait=0.1/45.6ms pred gate=device Token # 1112: 3.657ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=1.000 next=pair draft=5866 prop=5866 pred gate=device Token # 1113: 114.705ms; value: next_token_ids=tensor([5866], device='cuda:0') mtp accept=1 prop=5866 top1=5866 accp=1.000 next=draft=15 
prop=15 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/45.5ms pred gate=device Token # 1114: 3.678ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=0.799 next=pair draft=2619 prop=2619 pred gate=device Token # 1115: 114.513ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=draft=14087 prop=14087 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=4.2ms s1=189.8ms wait=0.1/45.6ms pred gate=device Token # 1116: 3.677ms; value: next_token_ids=tensor([14087], device='cuda:0') mtp accept=1 prop=14087 top1=14087 accp=0.937 next=pair draft=666 prop=666 pred gate=device Token # 1117: 114.604ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=768 prop=768 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.0ms s1=190.5ms wait=0.1/46.1ms pred gate=device Token # 1118: 3.678ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=1275 prop=1275 pred gate=device Token # 1119: 114.753ms; value: next_token_ids=tensor([1275], device='cuda:0') mtp accept=1 prop=1275 top1=1275 accp=0.974 next=draft=13380 prop=30869 olap pair=109.6ms serial=194.8ms gain=85.2ms ratio=0.44 s0=3.7ms s1=191.2ms wait=0.1/46.8ms pred gate=device Token # 1120: 3.660ms; value: next_token_ids=tensor([8979], device='cuda:0') mtp accept=0 prop=30869 top1=8979 accp=0.034 next=pair draft=3374 prop=3374 pred gate=device Token # 1121: 114.465ms; value: next_token_ids=tensor([3374], device='cuda:0') mtp accept=1 prop=3374 top1=3374 accp=0.998 next=draft=60540 prop=60540 olap pair=109.3ms serial=194.1ms gain=84.9ms ratio=0.44 s0=3.7ms s1=190.4ms wait=0.1/46.6ms pred gate=device Token # 1122: 3.666ms; value: next_token_ids=tensor([31446], device='cuda:0') mtp accept=0 prop=60540 top1=31446 accp=0.144 next=pair draft=621 prop=621 pred gate=device Token # 
1123: 114.485ms; value: next_token_ids=tensor([621], device='cuda:0') mtp accept=1 prop=621 top1=621 accp=0.992 next=draft=13097 prop=13097 olap pair=109.2ms serial=193.9ms gain=84.7ms ratio=0.44 s0=4.7ms s1=189.2ms wait=0.1/45.2ms pred gate=device Token # 1124: 3.721ms; value: next_token_ids=tensor([13097], device='cuda:0') mtp accept=1 prop=13097 top1=13097 accp=0.983 next=pair draft=6034 prop=6034 pred gate=device Token # 1125: 114.292ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=0.996 next=draft=572 prop=572 olap pair=109.1ms serial=193.7ms gain=84.6ms ratio=0.44 s0=4.8ms s1=188.9ms wait=0.1/45.0ms pred gate=device Token # 1126: 3.724ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=572 top1=303 accp=0.114 next=pair draft=7849 prop=7849 pred gate=device Token # 1127: 114.897ms; value: next_token_ids=tensor([7849], device='cuda:0') mtp accept=1 prop=7849 top1=7849 accp=1.000 next=draft=6034 prop=6034 olap pair=109.6ms serial=194.3ms gain=84.7ms ratio=0.44 s0=4.8ms s1=189.4ms wait=0.1/44.8ms pred gate=device Token # 1128: 3.705ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=pair draft=7712 prop=450 pred gate=device Token # 1129: 114.655ms; value: next_token_ids=tensor([7712], device='cuda:0') mtp accept=0 prop=450 top1=7712 accp=0.805 next=draft=30869 prop=30869 olap pair=109.4ms serial=194.1ms gain=84.7ms ratio=0.44 s0=4.2ms s1=189.8ms wait=0.1/45.6ms pred gate=device Token # 1130: 114.506ms; value: next_token_ids=tensor([30869], device='cuda:0') mtp accept=1 prop=30869 top1=30869 accp=0.999 next=draft=8842 prop=8842 olap pair=109.2ms serial=193.9ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.6ms wait=0.1/45.6ms pred gate=device Token # 1131: 3.700ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=1.000 next=pair draft=52600 prop=52600 pred gate=device Token # 1132: 114.091ms; value: 
next_token_ids=tensor([52600], device='cuda:0') mtp accept=1 prop=52600 top1=52600 accp=1.000 next=draft=303 prop=201 olap pair=108.9ms serial=193.4ms gain=84.5ms ratio=0.44 s0=4.2ms s1=189.2ms wait=0.1/45.6ms pred gate=device Token # 1133: 3.681ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=201 top1=303 accp=0.742 next=pair draft=9968 prop=9968 pred gate=device Token # 1134: 114.592ms; value: next_token_ids=tensor([9968], device='cuda:0') mtp accept=1 prop=9968 top1=9968 accp=0.985 next=draft=4339 prop=4339 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.9ms wait=0.1/45.6ms pred gate=device Token # 1135: 3.716ms; value: next_token_ids=tensor([1824], device='cuda:0') mtp accept=0 prop=4339 top1=1824 accp=0.317 next=pair draft=974 prop=974 pred gate=device Token # 1136: 114.333ms; value: next_token_ids=tensor([974], device='cuda:0') mtp accept=1 prop=974 top1=974 accp=1.000 next=draft=1427 prop=1427 olap pair=109.0ms serial=193.5ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.3ms wait=0.1/45.7ms pred gate=device Token # 1137: 3.714ms; value: next_token_ids=tensor([1427], device='cuda:0') mtp accept=1 prop=1427 top1=1427 accp=1.000 next=pair draft=13062 prop=13062 pred gate=device Token # 1138: 114.786ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=0 prop=13062 top1=17 accp=0.339 next=draft=303 prop=303 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.3ms wait=0.1/45.6ms pred gate=device Token # 1139: 114.446ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=0 prop=303 top1=201 accp=0.786 next=draft=15 prop=15 olap pair=109.2ms serial=193.8ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.5ms wait=0.1/45.4ms pred gate=device Token # 1140: 114.529ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=2619 prop=437 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=4.2ms s1=189.9ms wait=0.1/45.5ms pred 
gate=device Token # 1141: 3.721ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=0 prop=437 top1=2619 accp=0.894 next=pair draft=16303 prop=16303 pred gate=device Token # 1142: 117.081ms; value: next_token_ids=tensor([16303], device='cuda:0') mtp accept=1 prop=16303 top1=16303 accp=0.971 next=draft=3606 prop=3606 olap pair=109.4ms serial=193.8ms gain=84.5ms ratio=0.44 s0=4.5ms s1=189.3ms wait=0.1/45.2ms pred gate=device Token # 1143: 3.709ms; value: next_token_ids=tensor([6508], device='cuda:0') mtp accept=0 prop=3606 top1=6508 accp=0.006 next=pair draft=666 prop=666 pred gate=device Token # 1144: 114.515ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=768 prop=768 olap pair=109.2ms serial=193.9ms gain=84.7ms ratio=0.44 s0=3.8ms s1=190.1ms wait=0.1/46.6ms pred gate=device Token # 1145: 3.697ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.997 next=pair draft=57168 prop=57168 pred gate=device Token # 1146: 114.730ms; value: next_token_ids=tensor([974], device='cuda:0') mtp accept=0 prop=57168 top1=974 accp=0.248 next=draft=1427 prop=1427 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.3ms wait=0.1/45.6ms pred gate=device Token # 1147: 114.697ms; value: next_token_ids=tensor([1427], device='cuda:0') mtp accept=1 prop=1427 top1=1427 accp=0.753 next=draft=13062 prop=13062 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/45.5ms pred gate=device Token # 1148: 3.637ms; value: next_token_ids=tensor([13062], device='cuda:0') mtp accept=1 prop=13062 top1=13062 accp=0.929 next=pair draft=776 prop=625 pred gate=device Token # 1149: 114.193ms; value: next_token_ids=tensor([37240], device='cuda:0') mtp accept=0 prop=625 top1=776 accp=0.927 next=draft=16303 prop=16303 olap pair=109.0ms serial=193.5ms gain=84.5ms ratio=0.44 s0=4.2ms s1=189.3ms wait=0.1/45.6ms pred gate=device Token # 1150: 
115.286ms; value: next_token_ids=tensor([16303], device='cuda:0') mtp accept=1 prop=16303 top1=16303 accp=0.999 next=draft=303 prop=303 olap pair=110.1ms serial=195.6ms gain=85.5ms ratio=0.44 s0=4.3ms s1=191.3ms wait=0.1/45.6ms pred gate=device Token # 1151: 3.679ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.999 next=pair draft=57168 prop=57168 pred gate=device Token # 1152: 114.763ms; value: next_token_ids=tensor([57168], device='cuda:0') mtp accept=1 prop=57168 top1=57168 accp=1.000 next=draft=13062 prop=13062 olap pair=109.6ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/45.6ms pred gate=device Token # 1153: 3.655ms; value: next_token_ids=tensor([13062], device='cuda:0') mtp accept=1 prop=13062 top1=13062 accp=1.000 next=pair draft=2386 prop=2386 pred gate=device Token # 1154: 114.036ms; value: next_token_ids=tensor([2386], device='cuda:0') mtp accept=1 prop=2386 top1=2386 accp=0.693 next=draft=22710 prop=22710 olap pair=108.9ms serial=193.2ms gain=84.3ms ratio=0.44 s0=4.7ms s1=188.5ms wait=0.1/45.0ms pred gate=device Token # 1155: 3.708ms; value: next_token_ids=tensor([5480], device='cuda:0') mtp accept=0 prop=22710 top1=59563 accp=0.404 next=pair draft=102407 prop=102407 pred gate=device Token # 1156: 114.172ms; value: next_token_ids=tensor([102407], device='cuda:0') mtp accept=1 prop=102407 top1=102407 accp=0.940 next=draft=1237 prop=1237 olap pair=108.9ms serial=193.3ms gain=84.4ms ratio=0.44 s0=4.7ms s1=188.6ms wait=0.1/45.0ms pred gate=device Token # 1157: 3.744ms; value: next_token_ids=tensor([22710], device='cuda:0') mtp accept=0 prop=1237 top1=22710 accp=0.100 next=pair draft=59563 prop=59563 pred gate=device Token # 1158: 114.894ms; value: next_token_ids=tensor([59563], device='cuda:0') mtp accept=1 prop=59563 top1=59563 accp=1.000 next=draft=201 prop=201 olap pair=109.6ms serial=194.8ms gain=85.2ms ratio=0.44 s0=4.7ms s1=190.1ms wait=0.1/45.0ms pred gate=device Token # 1159: 
3.704ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=1237 accp=0.344 next=pair draft=15 prop=15 pred gate=device Token # 1160: 114.171ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=2619 prop=2619 olap pair=109.0ms serial=193.6ms gain=84.5ms ratio=0.44 s0=4.7ms s1=188.8ms wait=0.1/45.0ms pred gate=device Token # 1161: 3.729ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.984 next=pair draft=36101 prop=36101 pred gate=device Token # 1162: 114.258ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=0.910 next=draft=17520 prop=17520 olap pair=109.1ms serial=193.7ms gain=84.6ms ratio=0.44 s0=4.8ms s1=189.0ms wait=0.1/44.9ms pred gate=device Token # 1163: 3.676ms; value: next_token_ids=tensor([17520], device='cuda:0') mtp accept=1 prop=17520 top1=17520 accp=0.851 next=pair draft=666 prop=666 pred gate=device Token # 1164: 114.365ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=0.998 next=draft=768 prop=7524 olap pair=109.2ms serial=193.9ms gain=84.6ms ratio=0.44 s0=4.7ms s1=189.2ms wait=0.1/44.9ms pred gate=device Token # 1165: 3.738ms; value: next_token_ids=tensor([7524], device='cuda:0') mtp accept=1 prop=7524 top1=768 accp=0.860 next=pair draft=223 prop=223 pred gate=device Token # 1166: 114.398ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.999 next=draft=565 prop=565 olap pair=109.2ms serial=193.8ms gain=84.6ms ratio=0.44 s0=4.7ms s1=189.1ms wait=0.1/45.0ms pred gate=device Token # 1167: 3.661ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=0.999 next=pair draft=223 prop=223 pred gate=device Token # 1168: 113.735ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.930 next=draft=10172 prop=10172 olap 
pair=108.6ms serial=192.7ms gain=84.1ms ratio=0.44 s0=4.7ms s1=188.0ms wait=0.1/45.0ms pred gate=device Token # 1169: 3.680ms; value: next_token_ids=tensor([10172], device='cuda:0') mtp accept=1 prop=10172 top1=10172 accp=0.973 next=pair draft=625 prop=625 pred gate=device Token # 1170: 114.478ms; value: next_token_ids=tensor([525], device='cuda:0') mtp accept=0 prop=625 top1=768 accp=0.119 next=draft=23750 prop=23750 olap pair=109.4ms serial=194.1ms gain=84.8ms ratio=0.44 s0=4.7ms s1=189.4ms wait=0.1/45.0ms pred gate=device Token # 1171: 114.531ms; value: next_token_ids=tensor([23750], device='cuda:0') mtp accept=1 prop=23750 top1=768 accp=0.444 next=draft=303 prop=303 olap pair=109.3ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.7ms s1=189.5ms wait=0.1/45.0ms pred gate=device Token # 1172: 3.640ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.689 next=pair draft=1207 prop=6441 pred gate=device Token # 1173: 114.142ms; value: next_token_ids=tensor([653], device='cuda:0') mtp accept=0 prop=6441 top1=653 accp=0.199 next=draft=22997 prop=18617 olap pair=109.0ms serial=193.4ms gain=84.4ms ratio=0.44 s0=4.7ms s1=188.8ms wait=0.1/45.0ms pred gate=device Token # 1174: 115.006ms; value: next_token_ids=tensor([18617], device='cuda:0') mtp accept=1 prop=18617 top1=4398 accp=0.239 next=draft=13097 prop=13097 olap pair=109.8ms serial=195.0ms gain=85.2ms ratio=0.44 s0=4.7ms s1=190.3ms wait=0.1/45.0ms pred gate=device Token # 1175: 3.644ms; value: next_token_ids=tensor([10172], device='cuda:0') mtp accept=0 prop=13097 top1=10172 accp=0.157 next=pair draft=3523 prop=3523 pred gate=device Token # 1176: 114.759ms; value: next_token_ids=tensor([3523], device='cuda:0') mtp accept=1 prop=3523 top1=3523 accp=0.768 next=draft=201 prop=201 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=4.0ms s1=190.7ms wait=0.1/46.3ms pred gate=device Token # 1177: 3.727ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 
top1=201 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1178: 114.631ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=565 prop=565 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=3.8ms s1=190.6ms wait=0.1/46.6ms pred gate=device Token # 1179: 3.719ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=1.000 next=pair draft=223 prop=2619 pred gate=device Token # 1180: 114.345ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.273 next=draft=36101 prop=36101 olap pair=109.2ms serial=193.8ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.5ms wait=0.1/45.6ms pred gate=device Token # 1181: 3.675ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=0.990 next=pair draft=625 prop=625 pred gate=device Token # 1182: 114.690ms; value: next_token_ids=tensor([625], device='cuda:0') mtp accept=1 prop=625 top1=625 accp=0.624 next=draft=666 prop=666 olap pair=109.5ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.3ms wait=0.1/45.5ms pred gate=device Token # 1183: 3.691ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=0.998 next=pair draft=768 prop=768 pred gate=device Token # 1184: 114.301ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=2204 prop=2204 olap pair=109.2ms serial=193.8ms gain=84.6ms ratio=0.44 s0=4.1ms s1=189.7ms wait=0.1/45.9ms pred gate=device Token # 1185: 3.649ms; value: next_token_ids=tensor([2204], device='cuda:0') mtp accept=1 prop=2204 top1=2386 accp=0.230 next=pair draft=2382 prop=2382 pred gate=device Token # 1186: 114.751ms; value: next_token_ids=tensor([2382], device='cuda:0') mtp accept=1 prop=2382 top1=2382 accp=0.825 next=draft=92 prop=92 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.9ms wait=0.1/46.7ms pred 
gate=device Token # 1187: 3.637ms; value: next_token_ids=tensor([92], device='cuda:0') mtp accept=1 prop=92 top1=92 accp=1.000 next=pair draft=32 prop=32 pred gate=device Token # 1188: 114.558ms; value: next_token_ids=tensor([32], device='cuda:0') mtp accept=1 prop=32 top1=32 accp=0.999 next=draft=19 prop=19 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.5ms wait=0.1/46.6ms pred gate=device Token # 1189: 3.685ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 1190: 115.280ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=41540 prop=41540 olap pair=110.1ms serial=194.8ms gain=84.7ms ratio=0.43 s0=3.7ms s1=191.2ms wait=0.1/46.7ms pred gate=device Token # 1191: 3.696ms; value: next_token_ids=tensor([41540], device='cuda:0') mtp accept=1 prop=41540 top1=55138 accp=0.915 next=pair draft=25024 prop=25024 pred gate=device Token # 1192: 114.683ms; value: next_token_ids=tensor([81143], device='cuda:0') mtp accept=0 prop=25024 top1=3007 accp=0.125 next=draft=119545 prop=13053 olap pair=109.5ms serial=194.6ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.9ms wait=0.1/46.6ms pred gate=device Token # 1193: 115.020ms; value: next_token_ids=tensor([751], device='cuda:0') mtp accept=0 prop=13053 top1=116037 accp=0.218 next=draft=621 prop=621 olap pair=109.7ms serial=195.0ms gain=85.3ms ratio=0.44 s0=3.7ms s1=191.3ms wait=0.1/46.5ms pred gate=device Token # 1194: 115.333ms; value: next_token_ids=tensor([621], device='cuda:0') mtp accept=1 prop=621 top1=621 accp=0.999 next=draft=13097 prop=3007 olap pair=110.0ms serial=195.5ms gain=85.5ms ratio=0.44 s0=3.7ms s1=191.7ms wait=0.1/46.6ms pred gate=device Token # 1195: 3.693ms; value: next_token_ids=tensor([3007], device='cuda:0') mtp accept=1 prop=3007 top1=13097 accp=0.978 next=pair draft=6034 prop=6034 pred gate=device Token # 1196: 114.545ms; value: 
next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=0.999 next=draft=1847 prop=1847 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=3.7ms s1=190.6ms wait=0.1/46.7ms pred gate=device Token # 1197: 3.710ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=0 prop=1847 top1=201 accp=0.440 next=pair draft=223 prop=223 pred gate=device Token # 1198: 114.998ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=565 prop=565 olap pair=109.8ms serial=195.1ms gain=85.3ms ratio=0.44 s0=3.7ms s1=191.4ms wait=0.1/46.9ms pred gate=device Token # 1199: 3.688ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=0.999 next=pair draft=2619 prop=2619 pred gate=device Token # 1200: 114.683ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.921 next=draft=2382 prop=2382 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=3.8ms s1=190.7ms wait=0.1/46.8ms pred gate=device Token # 1201: 3.742ms; value: next_token_ids=tensor([2382], device='cuda:0') mtp accept=1 prop=2382 top1=2382 accp=1.000 next=pair draft=92 prop=92 pred gate=device Token # 1202: 114.986ms; value: next_token_ids=tensor([92], device='cuda:0') mtp accept=1 prop=92 top1=92 accp=1.000 next=draft=31 prop=31 olap pair=109.8ms serial=195.1ms gain=85.3ms ratio=0.44 s0=3.7ms s1=191.4ms wait=0.1/46.7ms pred gate=device Token # 1203: 3.740ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 1204: 114.841ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=625 prop=625 olap pair=109.7ms serial=194.2ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.8ms wait=0.1/45.9ms pred gate=device Token # 1205: 3.751ms; value: next_token_ids=tensor([625], device='cuda:0') mtp accept=1 prop=625 top1=625 
accp=0.989 next=pair draft=666 prop=666 pred gate=device Token # 1206: 114.526ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=768 prop=768 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.1ms s1=190.2ms wait=0.1/46.2ms pred gate=device Token # 1207: 3.656ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=6525 prop=6525 pred gate=device Token # 1208: 115.307ms; value: next_token_ids=tensor([6525], device='cuda:0') mtp accept=1 prop=6525 top1=6525 accp=0.536 next=draft=2541 prop=2541 olap pair=109.4ms serial=193.8ms gain=84.4ms ratio=0.44 s0=5.8ms s1=188.0ms wait=0.2/44.2ms pred gate=device Token # 1209: 4.544ms; value: next_token_ids=tensor([31446], device='cuda:0') mtp accept=0 prop=2541 top1=31446 accp=0.164 next=pair draft=3374 prop=3374 pred gate=device Token # 1210: 115.393ms; value: next_token_ids=tensor([45276], device='cuda:0') mtp accept=0 prop=3374 top1=45276 accp=0.026 next=draft=25024 prop=25024 olap pair=110.0ms serial=195.2ms gain=85.2ms ratio=0.44 s0=4.7ms s1=190.5ms wait=0.1/45.0ms pred gate=device Token # 1211: 115.121ms; value: next_token_ids=tensor([25024], device='cuda:0') mtp accept=1 prop=25024 top1=25024 accp=1.000 next=draft=303 prop=303 olap pair=109.9ms serial=195.3ms gain=85.4ms ratio=0.44 s0=4.3ms s1=191.0ms wait=0.1/45.5ms pred gate=device Token # 1212: 3.684ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=422 prop=666 pred gate=device Token # 1213: 114.421ms; value: next_token_ids=tensor([9422], device='cuda:0') mtp accept=0 prop=666 top1=9422 accp=0.668 next=draft=30463 prop=422 olap pair=109.3ms serial=194.2ms gain=84.9ms ratio=0.44 s0=4.0ms s1=190.1ms wait=0.1/46.0ms pred gate=device Token # 1214: 114.519ms; value: next_token_ids=tensor([422], device='cuda:0') mtp accept=1 prop=422 top1=422 accp=0.311 next=draft=18580 prop=18580 olap 
pair=109.3ms serial=194.3ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.6ms wait=0.1/46.7ms pred gate=device Token # 1215: 3.684ms; value: next_token_ids=tensor([18580], device='cuda:0') mtp accept=1 prop=18580 top1=18580 accp=0.999 next=pair draft=271 prop=271 pred gate=device Token # 1216: 114.532ms; value: next_token_ids=tensor([271], device='cuda:0') mtp accept=1 prop=271 top1=271 accp=0.986 next=draft=795 prop=795 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.2ms s1=190.0ms wait=0.1/45.5ms pred gate=device Token # 1217: 3.755ms; value: next_token_ids=tensor([795], device='cuda:0') mtp accept=1 prop=795 top1=795 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1218: 115.008ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=20 prop=20 olap pair=109.8ms serial=195.2ms gain=85.3ms ratio=0.44 s0=4.3ms s1=190.9ms wait=0.1/45.6ms pred gate=device Token # 1219: 9.882ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=16 prop=16 pred gate=device Token # 1220: 114.514ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=2619 prop=2619 olap pair=109.3ms serial=193.9ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.5ms wait=0.1/45.5ms pred gate=device Token # 1221: 3.757ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=pair draft=2353 prop=2353 pred gate=device Token # 1222: 114.793ms; value: next_token_ids=tensor([2353], device='cuda:0') mtp accept=1 prop=2353 top1=2353 accp=1.000 next=draft=1121 prop=1121 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.3ms wait=0.1/45.7ms pred gate=device Token # 1223: 3.720ms; value: next_token_ids=tensor([1121], device='cuda:0') mtp accept=1 prop=1121 top1=1121 accp=1.000 next=pair draft=66518 prop=66518 pred gate=device Token # 1224: 114.483ms; value: 
next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=draft=1237 prop=1237 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=4.2ms s1=189.9ms wait=0.1/45.6ms pred gate=device Token # 1225: 3.695ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=0.999 next=pair draft=95975 prop=95975 pred gate=device Token # 1226: 114.930ms; value: next_token_ids=tensor([95975], device='cuda:0') mtp accept=1 prop=95975 top1=95975 accp=1.000 next=draft=50294 prop=50294 olap pair=109.7ms serial=194.8ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.6ms wait=0.1/45.5ms pred gate=device Token # 1227: 3.763ms; value: next_token_ids=tensor([50294], device='cuda:0') mtp accept=1 prop=50294 top1=50294 accp=1.000 next=pair draft=1478 prop=1478 pred gate=device Token # 1228: 114.391ms; value: next_token_ids=tensor([1478], device='cuda:0') mtp accept=1 prop=1478 top1=1478 accp=1.000 next=draft=14 prop=14 olap pair=109.3ms serial=194.0ms gain=84.8ms ratio=0.44 s0=4.2ms s1=189.8ms wait=0.1/45.6ms pred gate=device Token # 1229: 3.812ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=1.000 next=pair draft=48159 prop=48159 pred gate=device Token # 1230: 114.505ms; value: next_token_ids=tensor([48159], device='cuda:0') mtp accept=1 prop=48159 top1=48159 accp=1.000 next=draft=1227 prop=1227 olap pair=109.4ms serial=194.2ms gain=84.9ms ratio=0.44 s0=4.2ms s1=190.0ms wait=0.1/45.7ms pred gate=device Token # 1231: 3.644ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=0.999 next=pair draft=5866 prop=5866 pred gate=device Token # 1232: 114.516ms; value: next_token_ids=tensor([5866], device='cuda:0') mtp accept=1 prop=5866 top1=5866 accp=1.000 next=draft=15 prop=15 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.2ms s1=190.1ms wait=0.1/45.7ms pred gate=device Token # 1233: 3.639ms; value: next_token_ids=tensor([15], 
device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=pair draft=2619 prop=2619 pred gate=device Token # 1234: 114.459ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.752 next=draft=14087 prop=14087 olap pair=109.3ms serial=194.1ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.7ms wait=0.1/45.5ms pred gate=device Token # 1235: 3.675ms; value: next_token_ids=tensor([14087], device='cuda:0') mtp accept=1 prop=14087 top1=14087 accp=1.000 next=pair draft=666 prop=666 pred gate=device Token # 1236: 114.338ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=768 prop=768 olap pair=109.1ms serial=193.7ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.5ms wait=0.1/45.6ms pred gate=device Token # 1237: 3.731ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=1275 prop=1275 pred gate=device Token # 1238: 114.464ms; value: next_token_ids=tensor([1275], device='cuda:0') mtp accept=1 prop=1275 top1=1275 accp=1.000 next=draft=8842 prop=8842 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.8ms wait=0.1/45.4ms pred gate=device Token # 1239: 3.721ms; value: next_token_ids=tensor([52727], device='cuda:0') mtp accept=0 prop=8842 top1=8842 accp=0.679 next=pair draft=51259 prop=51259 pred gate=device Token # 1240: 115.062ms; value: next_token_ids=tensor([45276], device='cuda:0') mtp accept=0 prop=51259 top1=45276 accp=0.479 next=draft=51259 prop=51259 olap pair=109.9ms serial=195.1ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.7ms wait=0.1/45.4ms pred gate=device Token # 1241: 114.828ms; value: next_token_ids=tensor([2827], device='cuda:0') mtp accept=0 prop=51259 top1=2827 accp=0.133 next=draft=1237 prop=1237 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.1ms s1=190.6ms wait=0.1/45.7ms pred gate=device Token # 1242: 114.857ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 
prop=1237 top1=1237 accp=0.989 next=draft=883 prop=883 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.4ms wait=0.1/45.4ms pred gate=device Token # 1243: 3.700ms; value: next_token_ids=tensor([883], device='cuda:0') mtp accept=1 prop=883 top1=883 accp=0.990 next=pair draft=108558 prop=108558 pred gate=device Token # 1244: 114.809ms; value: next_token_ids=tensor([120784], device='cuda:0') mtp accept=0 prop=108558 top1=34081 accp=0.586 next=draft=410 prop=410 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.5ms wait=0.1/45.5ms pred gate=device Token # 1245: 115.009ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=0 prop=410 top1=301 accp=0.123 next=draft=6459 prop=6459 olap pair=109.8ms serial=195.0ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.7ms wait=0.1/46.1ms pred gate=device Token # 1246: 115.273ms; value: next_token_ids=tensor([7398], device='cuda:0') mtp accept=0 prop=6459 top1=7398 accp=0.036 next=draft=50 prop=50 olap pair=110.0ms serial=195.4ms gain=85.4ms ratio=0.44 s0=4.3ms s1=191.0ms wait=0.1/46.4ms pred gate=device Token # 1247: 115.403ms; value: next_token_ids=tensor([50], device='cuda:0') mtp accept=1 prop=50 top1=50 accp=1.000 next=draft=410 prop=1316 olap pair=110.2ms serial=195.7ms gain=85.6ms ratio=0.44 s0=4.2ms s1=191.5ms wait=0.1/46.2ms pred gate=device Token # 1248: 3.781ms; value: next_token_ids=tensor([1316], device='cuda:0') mtp accept=1 prop=1316 top1=410 accp=0.687 next=pair draft=108558 prop=108558 pred gate=device Token # 1249: 114.973ms; value: next_token_ids=tensor([108558], device='cuda:0') mtp accept=1 prop=108558 top1=108558 accp=0.986 next=draft=2827 prop=2827 olap pair=109.8ms serial=194.9ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.6ms wait=0.1/46.0ms pred gate=device Token # 1250: 3.676ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=0 prop=2827 top1=1227 accp=0.373 next=pair draft=301 prop=301 pred gate=device Token # 1251: 114.455ms; value: 
next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=0.715 next=draft=51259 prop=51259 olap pair=109.2ms serial=193.8ms gain=84.6ms ratio=0.44 s0=4.1ms s1=189.7ms wait=0.1/46.7ms pred gate=device Token # 1252: 3.735ms; value: next_token_ids=tensor([51259], device='cuda:0') mtp accept=1 prop=51259 top1=51259 accp=0.998 next=pair draft=24479 prop=24479 pred gate=device Token # 1253: 114.283ms; value: next_token_ids=tensor([24479], device='cuda:0') mtp accept=1 prop=24479 top1=24479 accp=1.000 next=draft=2693 prop=8689 olap pair=109.1ms serial=193.6ms gain=84.5ms ratio=0.44 s0=4.0ms s1=189.6ms wait=0.1/47.0ms pred gate=device Token # 1254: 3.704ms; value: next_token_ids=tensor([5302], device='cuda:0') mtp accept=0 prop=8689 top1=2693 accp=0.454 next=pair draft=2693 prop=2693 pred gate=device Token # 1255: 114.543ms; value: next_token_ids=tensor([1316], device='cuda:0') mtp accept=0 prop=2693 top1=1316 accp=0.000 next=draft=21620 prop=21620 olap pair=109.3ms serial=193.9ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.6ms wait=0.1/46.3ms pred gate=device Token # 1256: 114.911ms; value: next_token_ids=tensor([21620], device='cuda:0') mtp accept=1 prop=21620 top1=21620 accp=1.000 next=draft=31446 prop=31446 olap pair=109.6ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/46.5ms pred gate=device Token # 1257: 3.750ms; value: next_token_ids=tensor([2693], device='cuda:0') mtp accept=0 prop=31446 top1=2693 accp=0.372 next=pair draft=751 prop=751 pred gate=device Token # 1258: 115.273ms; value: next_token_ids=tensor([751], device='cuda:0') mtp accept=1 prop=751 top1=751 accp=1.000 next=draft=621 prop=621 olap pair=109.6ms serial=194.1ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.8ms wait=0.1/46.2ms pred gate=device Token # 1259: 3.795ms; value: next_token_ids=tensor([621], device='cuda:0') mtp accept=1 prop=621 top1=621 accp=0.742 next=pair draft=13097 prop=13097 pred gate=device Token # 1260: 116.009ms; value: 
next_token_ids=tensor([13097], device='cuda:0') mtp accept=1 prop=13097 top1=13097 accp=0.893 next=draft=6034 prop=6034 olap pair=110.7ms serial=195.3ms gain=84.6ms ratio=0.43 s0=4.1ms s1=191.2ms wait=0.1/47.4ms pred gate=device Token # 1261: 3.672ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=pair draft=201 prop=201 pred gate=device Token # 1262: 114.806ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=1.000 next=draft=15 prop=15 olap pair=109.7ms serial=194.7ms gain=85.1ms ratio=0.44 s0=3.6ms s1=191.1ms wait=0.1/47.8ms pred gate=device Token # 1263: 3.659ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=0.999 next=pair draft=2619 prop=2619 pred gate=device Token # 1264: 114.310ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=draft=13295 prop=16303 olap pair=109.2ms serial=193.8ms gain=84.6ms ratio=0.44 s0=3.6ms s1=190.2ms wait=0.1/47.7ms pred gate=device Token # 1265: 3.657ms; value: next_token_ids=tensor([16303], device='cuda:0') mtp accept=1 prop=16303 top1=16303 accp=0.472 next=pair draft=6508 prop=6508 pred gate=device Token # 1266: 114.489ms; value: next_token_ids=tensor([6508], device='cuda:0') mtp accept=1 prop=6508 top1=6508 accp=0.978 next=draft=666 prop=666 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=3.9ms s1=190.3ms wait=0.1/47.2ms pred gate=device Token # 1267: 3.641ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=pair draft=768 prop=768 pred gate=device Token # 1268: 115.158ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.992 next=draft=1833 prop=1833 olap pair=110.0ms serial=195.4ms gain=85.5ms ratio=0.44 s0=3.7ms s1=191.7ms wait=0.1/47.6ms pred gate=device Token # 1269: 3.763ms; value: next_token_ids=tensor([1833], device='cuda:0') mtp 
accept=1 prop=1833 top1=1833 accp=0.663 next=pair draft=2827 prop=2827 pred gate=device Token # 1270: 114.744ms; value: next_token_ids=tensor([2827], device='cuda:0') mtp accept=1 prop=2827 top1=2827 accp=1.000 next=draft=2386 prop=38102 olap pair=109.6ms serial=194.3ms gain=84.7ms ratio=0.44 s0=4.3ms s1=190.1ms wait=0.1/46.7ms pred gate=device Token # 1271: 3.673ms; value: next_token_ids=tensor([38102], device='cuda:0') mtp accept=1 prop=38102 top1=4339 accp=0.219 next=pair draft=5480 prop=5480 pred gate=device Token # 1272: 114.812ms; value: next_token_ids=tensor([5480], device='cuda:0') mtp accept=1 prop=5480 top1=5480 accp=0.922 next=draft=41 prop=8555 olap pair=109.7ms serial=194.0ms gain=84.3ms ratio=0.43 s0=3.8ms s1=190.2ms wait=0.1/47.6ms pred gate=device Token # 1273: 3.732ms; value: next_token_ids=tensor([102407], device='cuda:0') mtp accept=0 prop=8555 top1=102407 accp=0.068 next=pair draft=1316 prop=1316 pred gate=device Token # 1274: 115.099ms; value: next_token_ids=tensor([1316], device='cuda:0') mtp accept=1 prop=1316 top1=1316 accp=0.978 next=draft=5480 prop=5480 olap pair=109.9ms serial=195.2ms gain=85.3ms ratio=0.44 s0=3.6ms s1=191.6ms wait=0.1/47.7ms pred gate=device Token # 1275: 3.699ms; value: next_token_ids=tensor([5480], device='cuda:0') mtp accept=1 prop=5480 top1=5480 accp=0.995 next=pair draft=41 prop=41 pred gate=device Token # 1276: 114.672ms; value: next_token_ids=tensor([41], device='cuda:0') mtp accept=1 prop=41 top1=41 accp=1.000 next=draft=1929 prop=1929 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.2ms s1=190.0ms wait=0.1/46.7ms pred gate=device Token # 1277: 3.701ms; value: next_token_ids=tensor([1929], device='cuda:0') mtp accept=1 prop=1929 top1=1929 accp=1.000 next=pair draft=16303 prop=16303 pred gate=device Token # 1278: 114.630ms; value: next_token_ids=tensor([16303], device='cuda:0') mtp accept=1 prop=16303 top1=16303 accp=0.912 next=draft=201 prop=201 olap pair=109.5ms serial=194.3ms gain=84.8ms ratio=0.44 
s0=4.2ms s1=190.1ms wait=0.1/46.7ms pred gate=device Token # 1279: 3.680ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=0.991 next=pair draft=15 prop=15 pred gate=device Token # 1280: 116.265ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=0.999 next=draft=2619 prop=2619 olap pair=111.0ms serial=196.3ms gain=85.2ms ratio=0.43 s0=5.1ms s1=191.2ms wait=0.2/46.0ms pred gate=device Token # 1281: 3.768ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.973 next=pair draft=14902 prop=31669 pred gate=device Token # 1282: 115.082ms; value: next_token_ids=tensor([14902], device='cuda:0') mtp accept=0 prop=31669 top1=14902 accp=0.762 next=draft=20602 prop=20602 olap pair=109.9ms serial=195.2ms gain=85.2ms ratio=0.44 s0=4.2ms s1=190.9ms wait=0.1/46.7ms pred gate=device Token # 1283: 114.948ms; value: next_token_ids=tensor([4354], device='cuda:0') mtp accept=0 prop=20602 top1=4354 accp=0.068 next=draft=666 prop=666 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=3.7ms s1=191.1ms wait=0.1/47.8ms pred gate=device Token # 1284: 114.697ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=7524 prop=768 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=3.6ms s1=190.8ms wait=0.1/47.9ms pred gate=device Token # 1285: 3.707ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=7524 accp=0.883 next=pair draft=87954 prop=87954 pred gate=device Token # 1286: 115.161ms; value: next_token_ids=tensor([87954], device='cuda:0') mtp accept=1 prop=87954 top1=87954 accp=1.000 next=draft=268 prop=268 olap pair=110.0ms serial=195.3ms gain=85.3ms ratio=0.44 s0=3.8ms s1=191.5ms wait=0.1/47.8ms pred gate=device Token # 1287: 3.775ms; value: next_token_ids=tensor([268], device='cuda:0') mtp accept=1 prop=268 top1=268 accp=1.000 next=pair draft=1867 prop=1867 pred 
gate=device Token # 1288: 114.841ms; value: next_token_ids=tensor([1867], device='cuda:0') mtp accept=1 prop=1867 top1=1867 accp=1.000 next=draft=8023 prop=8023 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=3.7ms s1=191.1ms wait=0.1/48.0ms pred gate=device Token # 1289: 3.768ms; value: next_token_ids=tensor([8023], device='cuda:0') mtp accept=1 prop=8023 top1=8023 accp=1.000 next=pair draft=47 prop=47 pred gate=device Token # 1290: 115.303ms; value: next_token_ids=tensor([47], device='cuda:0') mtp accept=1 prop=47 top1=47 accp=1.000 next=draft=301 prop=301 olap pair=109.6ms serial=193.4ms gain=83.8ms ratio=0.43 s0=7.6ms s1=185.8ms wait=0.2/43.5ms pred gate=device Token # 1291: 3.770ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=0.570 next=pair draft=2693 prop=2693 pred gate=device Token # 1292: 114.719ms; value: next_token_ids=tensor([2693], device='cuda:0') mtp accept=1 prop=2693 top1=66518 accp=0.120 next=draft=751 prop=751 olap pair=109.5ms serial=194.4ms gain=84.9ms ratio=0.44 s0=3.7ms s1=190.7ms wait=0.1/47.8ms pred gate=device Token # 1293: 3.728ms; value: next_token_ids=tensor([751], device='cuda:0') mtp accept=1 prop=751 top1=751 accp=1.000 next=pair draft=3606 prop=3606 pred gate=device Token # 1294: 113.924ms; value: next_token_ids=tensor([3606], device='cuda:0') mtp accept=1 prop=3606 top1=3606 accp=0.997 next=draft=201 prop=201 olap pair=108.8ms serial=193.1ms gain=84.4ms ratio=0.44 s0=3.7ms s1=189.4ms wait=0.1/47.9ms pred gate=device Token # 1295: 3.744ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=0.999 next=pair draft=15 prop=15 pred gate=device Token # 1296: 114.695ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=0.759 next=draft=2619 prop=2619 olap pair=109.5ms serial=194.1ms gain=84.6ms ratio=0.44 s0=3.8ms s1=190.3ms wait=0.1/47.6ms pred gate=device Token # 1297: 3.722ms; value: 
next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=pair draft=36101 prop=9567 pred gate=device Token # 1298: 117.170ms; value: next_token_ids=tensor([18580], device='cuda:0') mtp accept=0 prop=9567 top1=18580 accp=0.655 next=draft=17520 prop=17520 olap pair=109.3ms serial=193.9ms gain=84.6ms ratio=0.44 s0=3.7ms s1=190.3ms wait=0.1/48.0ms pred gate=device Token # 1299: 114.766ms; value: next_token_ids=tensor([17520], device='cuda:0') mtp accept=1 prop=17520 top1=17520 accp=0.989 next=draft=666 prop=666 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=3.8ms s1=190.8ms wait=0.1/47.7ms pred gate=device Token # 1300: 3.674ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=pair draft=7524 prop=7524 pred gate=device Token # 1301: 114.538ms; value: next_token_ids=tensor([7524], device='cuda:0') mtp accept=1 prop=7524 top1=7524 accp=1.000 next=draft=223 prop=223 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=3.8ms s1=190.3ms wait=0.1/47.8ms pred gate=device Token # 1302: 3.663ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=565 prop=565 pred gate=device Token # 1303: 114.824ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=0.874 next=draft=223 prop=223 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.9ms wait=0.1/47.8ms pred gate=device Token # 1304: 3.706ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.869 next=pair draft=96489 prop=96489 pred gate=device Token # 1305: 114.332ms; value: next_token_ids=tensor([10447], device='cuda:0') mtp accept=0 prop=96489 top1=10447 accp=0.508 next=draft=8842 prop=1644 olap pair=109.2ms serial=193.8ms gain=84.6ms ratio=0.44 s0=3.7ms s1=190.1ms wait=0.1/47.9ms pred gate=device Token # 1306: 114.758ms; value: next_token_ids=tensor([45276], 
device='cuda:0') mtp accept=0 prop=1644 top1=8842 accp=0.819 next=draft=6034 prop=6034 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.9ms wait=0.1/48.0ms pred gate=device Token # 1307: 114.646ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=0.653 next=draft=26127 prop=26127 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=3.6ms s1=190.7ms wait=0.1/48.0ms pred gate=device Token # 1308: 3.680ms; value: next_token_ids=tensor([26127], device='cuda:0') mtp accept=1 prop=26127 top1=26127 accp=0.965 next=pair draft=10052 prop=10052 pred gate=device Token # 1309: 114.604ms; value: next_token_ids=tensor([10052], device='cuda:0') mtp accept=1 prop=10052 top1=10052 accp=0.993 next=draft=7953 prop=7953 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=3.6ms s1=190.7ms wait=0.1/47.9ms pred gate=device Token # 1310: 3.712ms; value: next_token_ids=tensor([5209], device='cuda:0') mtp accept=0 prop=7953 top1=40423 accp=0.259 next=pair draft=8842 prop=8842 pred gate=device Token # 1311: 114.382ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=1.000 next=draft=201 prop=201 olap pair=109.1ms serial=193.7ms gain=84.5ms ratio=0.44 s0=3.8ms s1=189.9ms wait=0.1/47.8ms pred gate=device Token # 1312: 3.720ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=0.994 next=pair draft=223 prop=223 pred gate=device Token # 1313: 114.855ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=565 prop=565 olap pair=109.6ms serial=194.7ms gain=85.0ms ratio=0.44 s0=3.9ms s1=190.8ms wait=0.1/47.6ms pred gate=device Token # 1314: 3.725ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1315: 115.645ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 
top1=223 accp=0.740 next=draft=16303 prop=16303 olap pair=109.7ms serial=194.1ms gain=84.4ms ratio=0.43 s0=6.2ms s1=187.9ms wait=0.2/44.9ms pred gate=device Token # 1316: 4.548ms; value: next_token_ids=tensor([16303], device='cuda:0') mtp accept=1 prop=16303 top1=16303 accp=0.839 next=pair draft=100642 prop=100642 pred gate=device Token # 1317: 115.395ms; value: next_token_ids=tensor([100642], device='cuda:0') mtp accept=1 prop=100642 top1=100642 accp=0.808 next=draft=8725 prop=2787 olap pair=110.1ms serial=194.9ms gain=84.8ms ratio=0.44 s0=6.2ms s1=188.6ms wait=0.2/45.0ms pred gate=device Token # 1318: 3.697ms; value: next_token_ids=tensor([17030], device='cuda:0') mtp accept=0 prop=2787 top1=17030 accp=0.408 next=pair draft=303 prop=303 pred gate=device Token # 1319: 114.618ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.891 next=draft=1207 prop=56918 olap pair=109.4ms serial=194.4ms gain=84.9ms ratio=0.44 s0=3.7ms s1=190.7ms wait=0.1/48.1ms pred gate=device Token # 1320: 3.642ms; value: next_token_ids=tensor([10447], device='cuda:0') mtp accept=0 prop=56918 top1=10447 accp=0.117 next=pair draft=6034 prop=6034 pred gate=device Token # 1321: 115.662ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=0.987 next=draft=1052 prop=1052 olap pair=109.6ms serial=193.9ms gain=84.3ms ratio=0.43 s0=5.7ms s1=188.2ms wait=0.2/45.6ms pred gate=device Token # 1322: 4.656ms; value: next_token_ids=tensor([1052], device='cuda:0') mtp accept=1 prop=1052 top1=1052 accp=0.758 next=pair draft=15870 prop=15870 pred gate=device Token # 1323: 115.104ms; value: next_token_ids=tensor([15870], device='cuda:0') mtp accept=1 prop=15870 top1=15870 accp=0.858 next=draft=59539 prop=59539 olap pair=109.9ms serial=195.2ms gain=85.3ms ratio=0.44 s0=3.7ms s1=191.5ms wait=0.1/48.0ms pred gate=device Token # 1324: 3.710ms; value: next_token_ids=tensor([59539], device='cuda:0') mtp accept=1 prop=59539 top1=59539 
accp=0.797 next=pair draft=1237 prop=1237 pred gate=device Token # 1325: 114.378ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=0.996 next=draft=883 prop=883 olap pair=109.2ms serial=193.9ms gain=84.7ms ratio=0.44 s0=3.7ms s1=190.2ms wait=0.1/48.1ms pred gate=device Token # 1326: 3.726ms; value: next_token_ids=tensor([883], device='cuda:0') mtp accept=1 prop=883 top1=883 accp=0.888 next=pair draft=76399 prop=76399 pred gate=device Token # 1327: 114.276ms; value: next_token_ids=tensor([76399], device='cuda:0') mtp accept=1 prop=76399 top1=76399 accp=1.000 next=draft=14668 prop=14668 olap pair=109.0ms serial=193.4ms gain=84.4ms ratio=0.44 s0=3.7ms s1=189.7ms wait=0.1/47.9ms pred gate=device Token # 1328: 3.730ms; value: next_token_ids=tensor([14668], device='cuda:0') mtp accept=1 prop=14668 top1=14668 accp=1.000 next=pair draft=6412 prop=6412 pred gate=device Token # 1329: 115.350ms; value: next_token_ids=tensor([6412], device='cuda:0') mtp accept=1 prop=6412 top1=6412 accp=0.992 next=draft=795 prop=795 olap pair=110.2ms serial=195.6ms gain=85.4ms ratio=0.44 s0=3.9ms s1=191.7ms wait=0.1/47.5ms pred gate=device Token # 1330: 3.712ms; value: next_token_ids=tensor([795], device='cuda:0') mtp accept=1 prop=795 top1=795 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1331: 114.813ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=21 prop=21 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=3.7ms s1=191.0ms wait=0.1/48.0ms pred gate=device Token # 1332: 3.713ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=pair draft=16 prop=16 pred gate=device Token # 1333: 114.599ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=2619 prop=2619 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=3.7ms s1=190.6ms wait=0.1/47.9ms pred gate=device 
Token # 1334: 3.740ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=pair draft=34408 prop=34408 pred gate=device Token # 1335: 114.628ms; value: next_token_ids=tensor([34408], device='cuda:0') mtp accept=1 prop=34408 top1=34408 accp=1.000 next=draft=1728 prop=1728 olap pair=109.5ms serial=194.4ms gain=84.9ms ratio=0.44 s0=3.7ms s1=190.7ms wait=0.1/47.9ms pred gate=device Token # 1336: 3.780ms; value: next_token_ids=tensor([1728], device='cuda:0') mtp accept=1 prop=1728 top1=1728 accp=1.000 next=pair draft=66518 prop=66518 pred gate=device Token # 1337: 115.389ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=draft=1237 prop=1237 olap pair=110.2ms serial=194.0ms gain=83.8ms ratio=0.43 s0=4.1ms s1=189.9ms wait=0.1/47.7ms pred gate=device Token # 1338: 3.771ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=122492 prop=122492 pred gate=device Token # 1339: 115.044ms; value: next_token_ids=tensor([122492], device='cuda:0') mtp accept=1 prop=122492 top1=122492 accp=1.000 next=draft=50294 prop=50294 olap pair=109.8ms serial=194.9ms gain=85.0ms ratio=0.44 s0=4.1ms s1=190.7ms wait=0.1/47.2ms pred gate=device Token # 1340: 3.731ms; value: next_token_ids=tensor([50294], device='cuda:0') mtp accept=1 prop=50294 top1=50294 accp=1.000 next=pair draft=1478 prop=1478 pred gate=device Token # 1341: 114.725ms; value: next_token_ids=tensor([1478], device='cuda:0') mtp accept=1 prop=1478 top1=1478 accp=1.000 next=draft=14 prop=14 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.2ms s1=190.3ms wait=0.1/46.8ms pred gate=device Token # 1342: 3.693ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=1.000 next=pair draft=32907 prop=32907 pred gate=device Token # 1343: 115.710ms; value: next_token_ids=tensor([32907], device='cuda:0') mtp accept=1 prop=32907 
top1=32907 accp=1.000 next=draft=1227 prop=1227 olap pair=110.5ms serial=196.5ms gain=86.0ms ratio=0.44 s0=4.3ms s1=192.2ms wait=0.1/46.9ms pred gate=device Token # 1344: 3.690ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=1.000 next=pair draft=5866 prop=5866 pred gate=device Token # 1345: 114.966ms; value: next_token_ids=tensor([5866], device='cuda:0') mtp accept=1 prop=5866 top1=5866 accp=1.000 next=draft=15 prop=15 olap pair=109.8ms serial=194.9ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.6ms wait=0.1/46.9ms pred gate=device Token # 1346: 3.660ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=pair draft=2619 prop=2619 pred gate=device Token # 1347: 114.774ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.889 next=draft=14087 prop=14087 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=3.8ms s1=190.9ms wait=0.1/47.9ms pred gate=device Token # 1348: 3.660ms; value: next_token_ids=tensor([14087], device='cuda:0') mtp accept=1 prop=14087 top1=14087 accp=1.000 next=pair draft=666 prop=666 pred gate=device Token # 1349: 114.594ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=768 prop=768 olap pair=109.4ms serial=194.4ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.7ms wait=0.1/48.0ms pred gate=device Token # 1350: 3.740ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=1275 prop=1275 pred gate=device Token # 1351: 114.373ms; value: next_token_ids=tensor([1275], device='cuda:0') mtp accept=1 prop=1275 top1=1275 accp=1.000 next=draft=8842 prop=8842 olap pair=109.1ms serial=193.8ms gain=84.7ms ratio=0.44 s0=3.6ms s1=190.2ms wait=0.1/48.0ms pred gate=device Token # 1352: 3.701ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=1.000 next=pair draft=2635 
prop=2635 pred gate=device Token # 1353: 114.709ms; value: next_token_ids=tensor([2635], device='cuda:0') mtp accept=1 prop=2635 top1=2635 accp=0.995 next=draft=2827 prop=2827 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=3.6ms s1=190.9ms wait=0.1/47.9ms pred gate=device Token # 1354: 3.701ms; value: next_token_ids=tensor([2827], device='cuda:0') mtp accept=1 prop=2827 top1=2827 accp=1.000 next=pair draft=2693 prop=2693 pred gate=device Token # 1355: 114.465ms; value: next_token_ids=tensor([2693], device='cuda:0') mtp accept=1 prop=2693 top1=2693 accp=0.613 next=draft=751 prop=751 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=3.6ms s1=190.5ms wait=0.1/47.9ms pred gate=device Token # 1356: 3.692ms; value: next_token_ids=tensor([751], device='cuda:0') mtp accept=1 prop=751 top1=751 accp=0.984 next=pair draft=621 prop=621 pred gate=device Token # 1357: 115.008ms; value: next_token_ids=tensor([621], device='cuda:0') mtp accept=1 prop=621 top1=621 accp=1.000 next=draft=3007 prop=3007 olap pair=109.8ms serial=195.1ms gain=85.3ms ratio=0.44 s0=3.6ms s1=191.5ms wait=0.1/48.0ms pred gate=device Token # 1358: 3.752ms; value: next_token_ids=tensor([13097], device='cuda:0') mtp accept=0 prop=3007 top1=13097 accp=0.350 next=pair draft=6034 prop=6034 pred gate=device Token # 1359: 114.859ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=draft=303 prop=303 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=3.6ms s1=191.0ms wait=0.1/48.0ms pred gate=device Token # 1360: 3.751ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=4889 prop=4889 pred gate=device Token # 1361: 115.327ms; value: next_token_ids=tensor([4889], device='cuda:0') mtp accept=1 prop=4889 top1=4889 accp=0.981 next=draft=34408 prop=34408 olap pair=110.1ms serial=195.2ms gain=85.1ms ratio=0.44 s0=4.4ms s1=190.9ms wait=0.1/46.7ms pred gate=device Token # 1362: 
3.804ms; value: next_token_ids=tensor([34408], device='cuda:0') mtp accept=1 prop=34408 top1=34408 accp=0.914 next=pair draft=1728 prop=1728 pred gate=device Token # 1363: 114.822ms; value: next_token_ids=tensor([1728], device='cuda:0') mtp accept=1 prop=1728 top1=1728 accp=1.000 next=draft=201 prop=201 olap pair=109.6ms serial=194.6ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.3ms wait=0.1/47.0ms pred gate=device Token # 1364: 3.767ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=0.990 next=pair draft=15 prop=15 pred gate=device Token # 1365: 114.622ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=2619 prop=2619 olap pair=109.5ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.3ms s1=190.0ms wait=0.1/47.1ms pred gate=device Token # 1366: 3.724ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.896 next=pair draft=16303 prop=16303 pred gate=device Token # 1367: 114.908ms; value: next_token_ids=tensor([16303], device='cuda:0') mtp accept=1 prop=16303 top1=16303 accp=0.999 next=draft=6508 prop=6508 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.6ms wait=0.1/46.8ms pred gate=device Token # 1368: 3.752ms; value: next_token_ids=tensor([6508], device='cuda:0') mtp accept=1 prop=6508 top1=6508 accp=1.000 next=pair draft=666 prop=666 pred gate=device Token # 1369: 114.777ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=768 prop=768 olap pair=109.5ms serial=194.6ms gain=85.1ms ratio=0.44 s0=3.9ms s1=190.7ms wait=0.1/47.7ms pred gate=device Token # 1370: 3.727ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=42338 prop=985 pred gate=device Token # 1371: 114.972ms; value: next_token_ids=tensor([42338], device='cuda:0') mtp accept=0 prop=985 top1=42338 accp=0.984 next=draft=6034 prop=6034 
olap pair=109.8ms serial=195.1ms gain=85.3ms ratio=0.44 s0=3.7ms s1=191.4ms wait=0.1/48.1ms pred gate=device Token # 1372: 115.115ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=0.977 next=draft=1052 prop=1052 olap pair=109.8ms serial=195.2ms gain=85.4ms ratio=0.44 s0=3.6ms s1=191.6ms wait=0.1/48.0ms pred gate=device Token # 1373: 3.676ms; value: next_token_ids=tensor([1052], device='cuda:0') mtp accept=1 prop=1052 top1=1052 accp=1.000 next=pair draft=18100 prop=18100 pred gate=device Token # 1374: 114.688ms; value: next_token_ids=tensor([18100], device='cuda:0') mtp accept=1 prop=18100 top1=18100 accp=0.742 next=draft=34864 prop=34864 olap pair=109.5ms serial=194.6ms gain=85.1ms ratio=0.44 s0=3.6ms s1=191.0ms wait=0.1/48.1ms pred gate=device Token # 1375: 3.704ms; value: next_token_ids=tensor([34864], device='cuda:0') mtp accept=1 prop=34864 top1=34864 accp=0.949 next=pair draft=2022 prop=2022 pred gate=device Token # 1376: 114.623ms; value: next_token_ids=tensor([2022], device='cuda:0') mtp accept=1 prop=2022 top1=2022 accp=0.778 next=draft=1237 prop=1237 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.4ms wait=0.1/47.8ms pred gate=device Token # 1377: 3.728ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=0.861 next=pair draft=68134 prop=974 pred gate=device Token # 1378: 115.251ms; value: next_token_ids=tensor([974], device='cuda:0') mtp accept=1 prop=974 top1=974 accp=0.285 next=draft=1427 prop=1427 olap pair=110.0ms serial=195.6ms gain=85.5ms ratio=0.44 s0=3.8ms s1=191.8ms wait=0.1/48.0ms pred gate=device Token # 1379: 3.705ms; value: next_token_ids=tensor([1427], device='cuda:0') mtp accept=1 prop=1427 top1=1427 accp=1.000 next=pair draft=1227 prop=1227 pred gate=device Token # 1380: 114.641ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=1.000 next=draft=548 prop=548 olap pair=109.4ms 
serial=194.3ms gain=84.9ms ratio=0.44 s0=3.7ms s1=190.7ms wait=0.1/48.0ms pred gate=device Token # 1381: 3.761ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=1 prop=548 top1=548 accp=1.000 next=pair draft=59563 prop=59563 pred gate=device Token # 1382: 114.673ms; value: next_token_ids=tensor([59563], device='cuda:0') mtp accept=1 prop=59563 top1=59563 accp=1.000 next=draft=1237 prop=1237 olap pair=109.5ms serial=194.5ms gain=85.1ms ratio=0.44 s0=3.6ms s1=190.9ms wait=0.1/48.1ms pred gate=device Token # 1383: 3.726ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=57168 prop=57168 pred gate=device Token # 1384: 114.867ms; value: next_token_ids=tensor([57168], device='cuda:0') mtp accept=1 prop=57168 top1=57168 accp=0.989 next=draft=7807 prop=7807 olap pair=109.7ms serial=194.8ms gain=85.2ms ratio=0.44 s0=3.6ms s1=191.2ms wait=0.1/48.2ms pred gate=device Token # 1385: 3.730ms; value: next_token_ids=tensor([7807], device='cuda:0') mtp accept=1 prop=7807 top1=7807 accp=1.000 next=pair draft=15 prop=15 pred gate=device Token # 1386: 115.258ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=2619 prop=2619 olap pair=110.1ms serial=195.7ms gain=85.6ms ratio=0.44 s0=3.6ms s1=192.1ms wait=0.1/48.2ms pred gate=device Token # 1387: 3.684ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.918 next=pair draft=36101 prop=36101 pred gate=device Token # 1388: 114.117ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=0.976 next=draft=8035 prop=8035 olap pair=109.0ms serial=193.5ms gain=84.5ms ratio=0.44 s0=3.7ms s1=189.9ms wait=0.1/48.2ms pred gate=device Token # 1389: 3.708ms; value: next_token_ids=tensor([8035], device='cuda:0') mtp accept=1 prop=8035 top1=15064 accp=0.492 next=pair draft=666 prop=666 pred gate=device Token # 1390: 114.850ms; 
value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=7524 prop=7524 olap pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=3.6ms s1=191.3ms wait=0.1/48.2ms pred gate=device Token # 1391: 3.721ms; value: next_token_ids=tensor([7524], device='cuda:0') mtp accept=1 prop=7524 top1=7524 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1392: 114.679ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.999 next=draft=565 prop=565 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=3.6ms s1=190.9ms wait=0.1/48.1ms pred gate=device Token # 1393: 3.758ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1394: 114.402ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.979 next=draft=2386 prop=2386 olap pair=109.2ms serial=193.6ms gain=84.5ms ratio=0.44 s0=4.2ms s1=189.4ms wait=0.1/47.0ms pred gate=device Token # 1395: 3.724ms; value: next_token_ids=tensor([2386], device='cuda:0') mtp accept=1 prop=2386 top1=2386 accp=0.999 next=pair draft=38186 prop=75777 pred gate=device Token # 1396: 115.473ms; value: next_token_ids=tensor([13097], device='cuda:0') mtp accept=0 prop=75777 top1=13097 accp=0.015 next=draft=56560 prop=56560 olap pair=110.3ms serial=195.4ms gain=85.1ms ratio=0.44 s0=4.4ms s1=191.1ms wait=0.1/47.2ms pred gate=device Token # 1397: 115.069ms; value: next_token_ids=tensor([25024], device='cuda:0') mtp accept=0 prop=56560 top1=25024 accp=0.116 next=draft=6206 prop=6206 olap pair=109.8ms serial=195.0ms gain=85.1ms ratio=0.44 s0=3.7ms s1=191.2ms wait=0.1/48.0ms pred gate=device Token # 1398: 114.380ms; value: next_token_ids=tensor([38186], device='cuda:0') mtp accept=0 prop=6206 top1=38186 accp=0.259 next=draft=34408 prop=34408 olap pair=109.1ms serial=193.6ms gain=84.5ms ratio=0.44 s0=4.1ms s1=189.5ms 
wait=0.1/47.3ms pred gate=device Token # 1399: 115.092ms; value: next_token_ids=tensor([34408], device='cuda:0') mtp accept=1 prop=34408 top1=34408 accp=1.000 next=draft=1728 prop=1728 olap pair=109.8ms serial=194.9ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.6ms wait=0.1/47.0ms pred gate=device Token # 1400: 3.709ms; value: next_token_ids=tensor([1728], device='cuda:0') mtp accept=1 prop=1728 top1=1728 accp=1.000 next=pair draft=6206 prop=6206 pred gate=device Token # 1401: 114.887ms; value: next_token_ids=tensor([6206], device='cuda:0') mtp accept=1 prop=6206 top1=6206 accp=0.969 next=draft=6772 prop=6772 olap pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=3.8ms s1=191.2ms wait=0.1/48.0ms pred gate=device Token # 1402: 3.676ms; value: next_token_ids=tensor([6772], device='cuda:0') mtp accept=1 prop=6772 top1=6772 accp=0.834 next=pair draft=937 prop=937 pred gate=device Token # 1403: 114.398ms; value: next_token_ids=tensor([937], device='cuda:0') mtp accept=1 prop=937 top1=937 accp=0.927 next=draft=63239 prop=63239 olap pair=109.3ms serial=193.9ms gain=84.7ms ratio=0.44 s0=3.8ms s1=190.2ms wait=0.1/47.9ms pred gate=device Token # 1404: 3.712ms; value: next_token_ids=tensor([63239], device='cuda:0') mtp accept=1 prop=63239 top1=63239 accp=0.997 next=pair draft=201 prop=201 pred gate=device Token # 1405: 114.919ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=0.997 next=draft=223 prop=223 olap pair=109.7ms serial=194.8ms gain=85.0ms ratio=0.44 s0=4.6ms s1=190.1ms wait=0.1/46.6ms pred gate=device Token # 1406: 3.752ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=565 prop=565 pred gate=device Token # 1407: 115.552ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=1.000 next=draft=2619 prop=2619 olap pair=110.3ms serial=195.3ms gain=85.0ms ratio=0.44 s0=4.5ms s1=190.8ms wait=0.1/46.9ms pred gate=device Token # 
1408: 3.694ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.797 next=pair draft=2382 prop=2382 pred gate=device Token # 1409: 115.233ms; value: next_token_ids=tensor([2382], device='cuda:0') mtp accept=1 prop=2382 top1=2382 accp=0.999 next=draft=92 prop=92 olap pair=110.1ms serial=193.2ms gain=83.1ms ratio=0.43 s0=4.5ms s1=188.7ms wait=0.1/47.3ms pred gate=device Token # 1410: 3.720ms; value: next_token_ids=tensor([92], device='cuda:0') mtp accept=1 prop=92 top1=92 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1411: 115.491ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=19 prop=19 olap pair=110.3ms serial=194.0ms gain=83.7ms ratio=0.43 s0=4.1ms s1=189.9ms wait=0.1/47.9ms pred gate=device Token # 1412: 3.699ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=625 prop=625 pred gate=device Token # 1413: 115.863ms; value: next_token_ids=tensor([625], device='cuda:0') mtp accept=1 prop=625 top1=625 accp=0.997 next=draft=666 prop=666 olap pair=109.9ms serial=193.6ms gain=83.7ms ratio=0.43 s0=5.8ms s1=187.8ms wait=0.2/45.8ms pred gate=device Token # 1414: 4.565ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=0.999 next=pair draft=768 prop=768 pred gate=device Token # 1415: 114.766ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=34408 prop=34408 olap pair=109.4ms serial=193.6ms gain=84.2ms ratio=0.44 s0=6.2ms s1=187.4ms wait=0.2/45.1ms pred gate=device Token # 1416: 3.693ms; value: next_token_ids=tensor([34408], device='cuda:0') mtp accept=1 prop=34408 top1=34408 accp=0.982 next=pair draft=1728 prop=1728 pred gate=device Token # 1417: 115.161ms; value: next_token_ids=tensor([1728], device='cuda:0') mtp accept=1 prop=1728 top1=1728 accp=1.000 next=draft=6525 prop=6525 olap pair=110.0ms 
serial=194.4ms gain=84.4ms ratio=0.43 s0=4.0ms s1=190.4ms wait=0.1/47.9ms pred gate=device Token # 1418: 3.716ms; value: next_token_ids=tensor([10730], device='cuda:0') mtp accept=0 prop=6525 top1=6525 accp=0.541 next=pair draft=2577 prop=2577 pred gate=device Token # 1419: 115.541ms; value: next_token_ids=tensor([2577], device='cuda:0') mtp accept=1 prop=2577 top1=2577 accp=0.587 next=draft=2467 prop=2467 olap pair=110.2ms serial=194.7ms gain=84.5ms ratio=0.43 s0=4.3ms s1=190.4ms wait=0.1/47.3ms pred gate=device Token # 1420: 3.709ms; value: next_token_ids=tensor([2467], device='cuda:0') mtp accept=1 prop=2467 top1=2467 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 1421: 114.631ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.874 next=draft=4162 prop=4162 olap pair=109.5ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.3ms s1=190.0ms wait=0.1/47.0ms pred gate=device Token # 1422: 3.720ms; value: next_token_ids=tensor([4162], device='cuda:0') mtp accept=1 prop=4162 top1=13529 accp=0.423 next=pair draft=11925 prop=11925 pred gate=device Token # 1423: 114.738ms; value: next_token_ids=tensor([11925], device='cuda:0') mtp accept=1 prop=11925 top1=11925 accp=0.899 next=draft=4 prop=4 olap pair=109.5ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.2ms s1=190.2ms wait=0.1/47.1ms pred gate=device Token # 1424: 3.720ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=0 prop=4 top1=75777 accp=0.247 next=pair draft=95427 prop=95427 pred gate=device Token # 1425: 114.833ms; value: next_token_ids=tensor([95427], device='cuda:0') mtp accept=1 prop=95427 top1=95427 accp=0.973 next=draft=271 prop=271 olap pair=109.6ms serial=194.5ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/47.0ms pred gate=device Token # 1426: 3.775ms; value: next_token_ids=tensor([2130], device='cuda:0') mtp accept=0 prop=271 top1=2130 accp=0.022 next=pair draft=271 prop=271 pred gate=device Token # 1427: 114.921ms; value: 
next_token_ids=tensor([1237], device='cuda:0') mtp accept=0 prop=271 top1=1237 accp=0.464 next=draft=75777 prop=68 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.2ms s1=190.6ms wait=0.1/47.3ms pred gate=device Token # 1428: 115.266ms; value: next_token_ids=tensor([68], device='cuda:0') mtp accept=1 prop=68 top1=75777 accp=0.875 next=draft=19474 prop=19474 olap pair=110.0ms serial=195.3ms gain=85.4ms ratio=0.44 s0=4.1ms s1=191.2ms wait=0.1/47.3ms pred gate=device Token # 1429: 3.704ms; value: next_token_ids=tensor([19474], device='cuda:0') mtp accept=1 prop=19474 top1=19474 accp=1.000 next=pair draft=6412 prop=6412 pred gate=device Token # 1430: 115.434ms; value: next_token_ids=tensor([6412], device='cuda:0') mtp accept=1 prop=6412 top1=6412 accp=0.942 next=draft=795 prop=795 olap pair=110.3ms serial=196.0ms gain=85.7ms ratio=0.44 s0=4.3ms s1=191.7ms wait=0.1/46.9ms pred gate=device Token # 1431: 3.719ms; value: next_token_ids=tensor([795], device='cuda:0') mtp accept=1 prop=795 top1=795 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1432: 114.754ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=22 prop=22 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.3ms wait=0.1/47.0ms pred gate=device Token # 1433: 3.767ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=pair draft=16 prop=16 pred gate=device Token # 1434: 114.932ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=2619 prop=2619 olap pair=109.8ms serial=194.8ms gain=85.0ms ratio=0.44 s0=4.2ms s1=190.6ms wait=0.1/47.1ms pred gate=device Token # 1435: 3.707ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=pair draft=10756 prop=10756 pred gate=device Token # 1436: 115.044ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=1 
prop=10756 top1=10756 accp=1.000 next=draft=66518 prop=66518 olap pair=109.8ms serial=194.1ms gain=84.3ms ratio=0.43 s0=4.2ms s1=189.9ms wait=0.1/47.4ms pred gate=device Token # 1437: 3.694ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=pair draft=1237 prop=1237 pred gate=device Token # 1438: 114.668ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=draft=45324 prop=45324 olap pair=109.5ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.1ms wait=0.1/47.2ms pred gate=device Token # 1439: 3.778ms; value: next_token_ids=tensor([45324], device='cuda:0') mtp accept=1 prop=45324 top1=45324 accp=1.000 next=pair draft=50294 prop=50294 pred gate=device Token # 1440: 115.295ms; value: next_token_ids=tensor([50294], device='cuda:0') mtp accept=1 prop=50294 top1=50294 accp=1.000 next=draft=1478 prop=1478 olap pair=110.1ms serial=195.2ms gain=85.1ms ratio=0.44 s0=4.4ms s1=190.8ms wait=0.1/47.0ms pred gate=device Token # 1441: 3.729ms; value: next_token_ids=tensor([1478], device='cuda:0') mtp accept=1 prop=1478 top1=1478 accp=1.000 next=pair draft=14 prop=14 pred gate=device Token # 1442: 115.027ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=0.999 next=draft=21425 prop=21425 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.5ms wait=0.1/47.2ms pred gate=device Token # 1443: 3.684ms; value: next_token_ids=tensor([21425], device='cuda:0') mtp accept=1 prop=21425 top1=21425 accp=1.000 next=pair draft=1227 prop=1227 pred gate=device Token # 1444: 114.794ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=1.000 next=draft=5866 prop=5866 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.0ms s1=190.6ms wait=0.1/47.9ms pred gate=device Token # 1445: 3.724ms; value: next_token_ids=tensor([5866], device='cuda:0') mtp accept=1 prop=5866 top1=5866 
accp=1.000 next=pair draft=15 prop=15 pred gate=device Token # 1446: 114.434ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=2619 prop=2619 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.4ms wait=0.1/48.5ms pred gate=device Token # 1447: 3.732ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.773 next=pair draft=14087 prop=14087 pred gate=device Token # 1448: 116.791ms; value: next_token_ids=tensor([14087], device='cuda:0') mtp accept=1 prop=14087 top1=14087 accp=1.000 next=draft=666 prop=666 olap pair=111.6ms serial=196.4ms gain=84.8ms ratio=0.43 s0=4.3ms s1=192.1ms wait=0.1/47.7ms pred gate=device Token # 1449: 3.716ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=pair draft=768 prop=768 pred gate=device Token # 1450: 114.396ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=15091 prop=15091 olap pair=109.2ms serial=193.8ms gain=84.6ms ratio=0.44 s0=3.7ms s1=190.0ms wait=0.1/48.5ms pred gate=device Token # 1451: 3.710ms; value: next_token_ids=tensor([2216], device='cuda:0') mtp accept=0 prop=15091 top1=2216 accp=0.390 next=pair draft=545 prop=545 pred gate=device Token # 1452: 114.397ms; value: next_token_ids=tensor([545], device='cuda:0') mtp accept=1 prop=545 top1=545 accp=0.998 next=draft=28608 prop=28608 olap pair=109.1ms serial=193.6ms gain=84.5ms ratio=0.44 s0=3.7ms s1=189.9ms wait=0.1/48.5ms pred gate=device Token # 1453: 3.824ms; value: next_token_ids=tensor([28608], device='cuda:0') mtp accept=1 prop=28608 top1=28608 accp=0.998 next=pair draft=39 prop=39 pred gate=device Token # 1454: 114.778ms; value: next_token_ids=tensor([39], device='cuda:0') mtp accept=1 prop=39 top1=39 accp=1.000 next=draft=1237 prop=1237 olap pair=109.6ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.1ms wait=0.1/47.2ms pred 
gate=device Token # 1455: 3.736ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=48076 prop=48076 pred gate=device Token # 1456: 115.553ms; value: next_token_ids=tensor([48076], device='cuda:0') mtp accept=1 prop=48076 top1=48076 accp=0.998 next=draft=10672 prop=10672 olap pair=110.2ms serial=195.7ms gain=85.5ms ratio=0.44 s0=4.3ms s1=191.4ms wait=0.1/47.2ms pred gate=device Token # 1457: 3.742ms; value: next_token_ids=tensor([10672], device='cuda:0') mtp accept=1 prop=10672 top1=10672 accp=1.000 next=pair draft=294 prop=294 pred gate=device Token # 1458: 115.066ms; value: next_token_ids=tensor([294], device='cuda:0') mtp accept=1 prop=294 top1=294 accp=0.993 next=draft=58124 prop=58124 olap pair=109.8ms serial=194.8ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.5ms wait=0.1/47.1ms pred gate=device Token # 1459: 3.725ms; value: next_token_ids=tensor([58124], device='cuda:0') mtp accept=1 prop=58124 top1=58124 accp=1.000 next=pair draft=1227 prop=1227 pred gate=device Token # 1460: 114.991ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=1.000 next=draft=8842 prop=8842 olap pair=109.8ms serial=194.9ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.6ms wait=0.1/47.1ms pred gate=device Token # 1461: 3.739ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=0.988 next=pair draft=3774 prop=3774 pred gate=device Token # 1462: 117.373ms; value: next_token_ids=tensor([3774], device='cuda:0') mtp accept=1 prop=3774 top1=3774 accp=1.000 next=draft=303 prop=303 olap pair=109.7ms serial=194.8ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.5ms wait=0.1/47.2ms pred gate=device Token # 1463: 3.721ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.997 next=pair draft=1275 prop=1275 pred gate=device Token # 1464: 114.542ms; value: next_token_ids=tensor([1275], device='cuda:0') mtp accept=1 prop=1275 
top1=1275 accp=0.937 next=draft=3007 prop=3007 olap pair=109.3ms serial=194.0ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.8ms wait=0.1/47.2ms pred gate=device Token # 1465: 3.760ms; value: next_token_ids=tensor([3007], device='cuda:0') mtp accept=1 prop=3007 top1=3007 accp=0.996 next=pair draft=10756 prop=10756 pred gate=device Token # 1466: 114.813ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=1 prop=10756 top1=10756 accp=1.000 next=draft=119545 prop=10780 olap pair=109.6ms serial=194.5ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/47.1ms pred gate=device Token # 1467: 3.745ms; value: next_token_ids=tensor([10780], device='cuda:0') mtp accept=1 prop=10780 top1=119545 accp=0.713 next=pair draft=621 prop=621 pred gate=device Token # 1468: 115.096ms; value: next_token_ids=tensor([621], device='cuda:0') mtp accept=1 prop=621 top1=621 accp=0.998 next=draft=3007 prop=3007 olap pair=109.8ms serial=195.0ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.8ms wait=0.1/47.2ms pred gate=device Token # 1469: 3.771ms; value: next_token_ids=tensor([3007], device='cuda:0') mtp accept=1 prop=3007 top1=3007 accp=1.000 next=pair draft=6034 prop=6034 pred gate=device Token # 1470: 114.963ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=1.000 next=draft=201 prop=201 olap pair=109.8ms serial=195.0ms gain=85.2ms ratio=0.44 s0=4.2ms s1=190.8ms wait=0.1/47.5ms pred gate=device Token # 1471: 3.820ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=1.000 next=pair draft=15 prop=15 pred gate=device Token # 1472: 115.203ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=2619 prop=201 olap pair=110.0ms serial=195.7ms gain=85.6ms ratio=0.44 s0=3.8ms s1=191.9ms wait=0.1/48.4ms pred gate=device Token # 1473: 3.716ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=0 prop=201 top1=2619 accp=0.266 next=pair draft=16303 
prop=16303 pred gate=device Token # 1474: 115.652ms; value: next_token_ids=tensor([16303], device='cuda:0') mtp accept=1 prop=16303 top1=16303 accp=1.000 next=draft=6508 prop=6508 olap pair=110.4ms serial=196.4ms gain=86.0ms ratio=0.44 s0=3.8ms s1=192.7ms wait=0.1/48.5ms pred gate=device Token # 1475: 3.675ms; value: next_token_ids=tensor([6508], device='cuda:0') mtp accept=1 prop=6508 top1=6508 accp=1.000 next=pair draft=666 prop=666 pred gate=device Token # 1476: 115.462ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=768 prop=768 olap pair=110.2ms serial=195.6ms gain=85.4ms ratio=0.44 s0=3.8ms s1=191.8ms wait=0.1/48.3ms pred gate=device Token # 1477: 3.669ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=3655 prop=2386 pred gate=device Token # 1478: 114.788ms; value: next_token_ids=tensor([2386], device='cuda:0') mtp accept=1 prop=2386 top1=2386 accp=0.169 next=draft=5480 prop=5480 olap pair=109.7ms serial=194.7ms gain=85.1ms ratio=0.44 s0=4.1ms s1=190.6ms wait=0.1/47.8ms pred gate=device Token # 1479: 3.712ms; value: next_token_ids=tensor([5480], device='cuda:0') mtp accept=1 prop=5480 top1=37209 accp=0.426 next=pair draft=6005 prop=6005 pred gate=device Token # 1480: 114.546ms; value: next_token_ids=tensor([6005], device='cuda:0') mtp accept=1 prop=6005 top1=6005 accp=0.995 next=draft=15 prop=15 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.4ms wait=0.1/48.5ms pred gate=device Token # 1481: 3.751ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=pair draft=5480 prop=5480 pred gate=device Token # 1482: 115.208ms; value: next_token_ids=tensor([5480], device='cuda:0') mtp accept=1 prop=5480 top1=5480 accp=1.000 next=draft=16303 prop=16303 olap pair=110.0ms serial=195.4ms gain=85.4ms ratio=0.44 s0=3.7ms s1=191.7ms wait=0.1/48.4ms pred gate=device Token # 1483: 
3.689ms; value: next_token_ids=tensor([16303], device='cuda:0') mtp accept=1 prop=16303 top1=16303 accp=1.000 next=pair draft=3461 prop=3461 pred gate=device Token # 1484: 114.795ms; value: next_token_ids=tensor([1824], device='cuda:0') mtp accept=0 prop=3461 top1=1824 accp=0.046 next=draft=37209 prop=37209 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=3.9ms s1=190.8ms wait=0.1/48.1ms pred gate=device Token # 1485: 115.624ms; value: next_token_ids=tensor([37209], device='cuda:0') mtp accept=1 prop=37209 top1=10756 accp=0.213 next=draft=3461 prop=3461 olap pair=110.4ms serial=196.1ms gain=85.7ms ratio=0.44 s0=4.3ms s1=191.7ms wait=0.1/47.0ms pred gate=device Token # 1486: 3.748ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=0 prop=3461 top1=201 accp=0.127 next=pair draft=15 prop=15 pred gate=device Token # 1487: 114.671ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=2619 prop=2619 olap pair=109.5ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.9ms wait=0.1/47.3ms pred gate=device Token # 1488: 3.720ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.770 next=pair draft=18580 prop=18580 pred gate=device Token # 1489: 114.579ms; value: next_token_ids=tensor([18580], device='cuda:0') mtp accept=1 prop=18580 top1=18580 accp=1.000 next=draft=17520 prop=17520 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.0ms wait=0.1/47.3ms pred gate=device Token # 1490: 3.702ms; value: next_token_ids=tensor([17520], device='cuda:0') mtp accept=1 prop=17520 top1=17520 accp=0.886 next=pair draft=666 prop=666 pred gate=device Token # 1491: 114.801ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=7524 prop=7524 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=4.2ms s1=190.5ms wait=0.1/47.2ms pred gate=device Token # 1492: 3.726ms; value: 
next_token_ids=tensor([7524], device='cuda:0') mtp accept=1 prop=7524 top1=7524 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1493: 115.051ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=565 prop=565 olap pair=109.9ms serial=195.2ms gain=85.3ms ratio=0.44 s0=4.3ms s1=190.8ms wait=0.1/47.3ms pred gate=device Token # 1494: 3.708ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1495: 114.571ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.998 next=draft=3440 prop=3440 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.9ms wait=0.1/47.2ms pred gate=device Token # 1496: 3.699ms; value: next_token_ids=tensor([3440], device='cuda:0') mtp accept=1 prop=3440 top1=3440 accp=0.999 next=pair draft=20668 prop=20668 pred gate=device Token # 1497: 114.982ms; value: next_token_ids=tensor([20668], device='cuda:0') mtp accept=1 prop=20668 top1=20668 accp=0.995 next=draft=28608 prop=28608 olap pair=109.8ms serial=194.9ms gain=85.1ms ratio=0.44 s0=4.2ms s1=190.7ms wait=0.1/47.4ms pred gate=device Token # 1498: 3.750ms; value: next_token_ids=tensor([28608], device='cuda:0') mtp accept=1 prop=28608 top1=28608 accp=1.000 next=pair draft=39 prop=39 pred gate=device Token # 1499: 114.333ms; value: next_token_ids=tensor([39], device='cuda:0') mtp accept=1 prop=39 top1=39 accp=1.000 next=draft=35991 prop=35991 olap pair=109.1ms serial=193.5ms gain=84.5ms ratio=0.44 s0=4.1ms s1=189.4ms wait=0.1/47.4ms pred gate=device Token # 1500: 3.709ms; value: next_token_ids=tensor([35991], device='cuda:0') mtp accept=1 prop=35991 top1=35991 accp=1.000 next=pair draft=301 prop=1237 pred gate=device Token # 1501: 114.637ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=0 prop=1237 top1=8842 accp=0.064 next=draft=1237 prop=1237 olap 
pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=3.9ms s1=190.3ms wait=0.1/48.4ms pred gate=device Token # 1502: 115.491ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=0.998 next=draft=883 prop=883 olap pair=110.2ms serial=195.8ms gain=85.6ms ratio=0.44 s0=3.8ms s1=192.0ms wait=0.1/48.4ms pred gate=device Token # 1503: 3.665ms; value: next_token_ids=tensor([883], device='cuda:0') mtp accept=1 prop=883 top1=883 accp=1.000 next=pair draft=48076 prop=48076 pred gate=device Token # 1504: 114.977ms; value: next_token_ids=tensor([57922], device='cuda:0') mtp accept=0 prop=48076 top1=57922 accp=0.448 next=draft=105668 prop=105668 olap pair=109.8ms serial=195.0ms gain=85.2ms ratio=0.44 s0=4.2ms s1=190.8ms wait=0.1/47.5ms pred gate=device Token # 1505: 114.951ms; value: next_token_ids=tensor([105668], device='cuda:0') mtp accept=1 prop=105668 top1=105668 accp=0.900 next=draft=410 prop=410 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=3.7ms s1=191.1ms wait=0.1/48.6ms pred gate=device Token # 1506: 3.656ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=0.937 next=pair draft=48076 prop=48076 pred gate=device Token # 1507: 114.971ms; value: next_token_ids=tensor([48076], device='cuda:0') mtp accept=1 prop=48076 top1=48076 accp=0.991 next=draft=829 prop=829 olap pair=109.8ms serial=195.1ms gain=85.3ms ratio=0.44 s0=3.7ms s1=191.3ms wait=0.1/48.4ms pred gate=device Token # 1508: 3.733ms; value: next_token_ids=tensor([829], device='cuda:0') mtp accept=1 prop=829 top1=829 accp=1.000 next=pair draft=1985 prop=1985 pred gate=device Token # 1509: 114.660ms; value: next_token_ids=tensor([1985], device='cuda:0') mtp accept=1 prop=1985 top1=1985 accp=1.000 next=draft=7807 prop=7807 olap pair=109.5ms serial=194.4ms gain=84.9ms ratio=0.44 s0=3.8ms s1=190.6ms wait=0.1/48.4ms pred gate=device Token # 1510: 3.754ms; value: next_token_ids=tensor([7807], device='cuda:0') mtp accept=1 
prop=7807 top1=7807 accp=0.759 next=pair draft=223 prop=223 pred gate=device Token # 1511: 115.010ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=565 prop=565 olap pair=109.8ms serial=195.1ms gain=85.3ms ratio=0.44 s0=3.7ms s1=191.4ms wait=0.1/48.7ms pred gate=device Token # 1512: 3.839ms; value: next_token_ids=tensor([565], device='cuda:0') mtp accept=1 prop=565 top1=565 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1513: 114.557ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.906 next=draft=7849 prop=7849 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.6ms s1=189.6ms wait=0.1/47.0ms pred gate=device Token # 1514: 3.714ms; value: next_token_ids=tensor([7849], device='cuda:0') mtp accept=1 prop=7849 top1=36101 accp=0.684 next=pair draft=10756 prop=10756 pred gate=device Token # 1515: 114.478ms; value: next_token_ids=tensor([33912], device='cuda:0') mtp accept=0 prop=10756 top1=33912 accp=0.022 next=draft=1299 prop=3461 olap pair=109.3ms serial=193.9ms gain=84.5ms ratio=0.44 s0=4.6ms s1=189.3ms wait=0.1/46.7ms pred gate=device Token # 1516: 114.769ms; value: next_token_ids=tensor([1299], device='cuda:0') mtp accept=0 prop=3461 top1=1299 accp=0.578 next=draft=34864 prop=34864 olap pair=109.5ms serial=194.2ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.9ms wait=0.1/47.1ms pred gate=device Token # 1517: 114.594ms; value: next_token_ids=tensor([34864], device='cuda:0') mtp accept=1 prop=34864 top1=34864 accp=1.000 next=draft=3343 prop=3343 olap pair=109.3ms serial=194.0ms gain=84.7ms ratio=0.44 s0=4.2ms s1=189.8ms wait=0.1/47.2ms pred gate=device Token # 1518: 3.757ms; value: next_token_ids=tensor([3343], device='cuda:0') mtp accept=1 prop=3343 top1=3343 accp=0.910 next=pair draft=10756 prop=10756 pred gate=device Token # 1519: 114.465ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=1 prop=10756 top1=10756 accp=1.000 
next=draft=303 prop=303 olap pair=109.2ms serial=193.4ms gain=84.2ms ratio=0.44 s0=4.5ms s1=189.0ms wait=0.1/47.1ms pred gate=device Token # 1520: 3.776ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.947 next=pair draft=16303 prop=16303 pred gate=device Token # 1521: 114.727ms; value: next_token_ids=tensor([653], device='cuda:0') mtp accept=0 prop=16303 top1=4339 accp=0.309 next=draft=8009 prop=22997 olap pair=109.5ms serial=194.1ms gain=84.6ms ratio=0.44 s0=4.4ms s1=189.7ms wait=0.1/47.1ms pred gate=device Token # 1522: 114.800ms; value: next_token_ids=tensor([39668], device='cuda:0') mtp accept=0 prop=22997 top1=8009 accp=0.451 next=draft=26127 prop=26127 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.1ms s1=190.4ms wait=0.1/47.7ms pred gate=device Token # 1523: 115.330ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=0 prop=26127 top1=4339 accp=0.285 next=draft=1121 prop=1121 olap pair=110.0ms serial=195.4ms gain=85.4ms ratio=0.44 s0=3.9ms s1=191.5ms wait=0.1/48.3ms pred gate=device Token # 1524: 115.061ms; value: next_token_ids=tensor([271], device='cuda:0') mtp accept=0 prop=1121 top1=271 accp=0.553 next=draft=372 prop=372 olap pair=109.8ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.5ms wait=0.1/47.3ms pred gate=device Token # 1525: 114.561ms; value: next_token_ids=tensor([372], device='cuda:0') mtp accept=1 prop=372 top1=372 accp=1.000 next=draft=223 prop=223 olap pair=109.3ms serial=194.1ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.8ms wait=0.1/47.3ms pred gate=device Token # 1526: 3.796ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=1304 prop=1304 pred gate=device Token # 1527: 115.017ms; value: next_token_ids=tensor([1304], device='cuda:0') mtp accept=1 prop=1304 top1=1304 accp=1.000 next=draft=410 prop=410 olap pair=109.7ms serial=194.3ms gain=84.5ms ratio=0.44 s0=4.4ms s1=189.8ms wait=0.1/47.1ms pred 
gate=device Token # 1528: 3.704ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=1.000 next=pair draft=2382 prop=2382 pred gate=device Token # 1529: 114.930ms; value: next_token_ids=tensor([2382], device='cuda:0') mtp accept=1 prop=2382 top1=2382 accp=0.998 next=draft=92 prop=92 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.4ms wait=0.1/47.2ms pred gate=device Token # 1530: 3.727ms; value: next_token_ids=tensor([92], device='cuda:0') mtp accept=1 prop=92 top1=92 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1531: 114.104ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=19 prop=19 olap pair=108.8ms serial=193.0ms gain=84.2ms ratio=0.44 s0=4.3ms s1=188.7ms wait=0.1/47.0ms pred gate=device Token # 1532: 3.710ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=36101 prop=36101 pred gate=device Token # 1533: 114.872ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=0.999 next=draft=17520 prop=17520 olap pair=109.7ms serial=194.6ms gain=84.9ms ratio=0.44 s0=4.4ms s1=190.2ms wait=0.1/46.9ms pred gate=device Token # 1534: 3.710ms; value: next_token_ids=tensor([17520], device='cuda:0') mtp accept=1 prop=17520 top1=17520 accp=0.906 next=pair draft=6985 prop=6985 pred gate=device Token # 1535: 114.525ms; value: next_token_ids=tensor([6985], device='cuda:0') mtp accept=1 prop=6985 top1=6985 accp=0.859 next=draft=18580 prop=18580 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=4.2ms s1=189.9ms wait=0.1/47.5ms pred gate=device Token # 1536: 3.733ms; value: next_token_ids=tensor([18580], device='cuda:0') mtp accept=1 prop=18580 top1=18580 accp=1.000 next=pair draft=946 prop=946 pred gate=device Token # 1537: 114.647ms; value: next_token_ids=tensor([946], device='cuda:0') mtp accept=1 prop=946 top1=946 accp=1.000 
next=draft=3803 prop=3803 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.5ms wait=0.1/48.6ms pred gate=device Token # 1538: 3.793ms; value: next_token_ids=tensor([3803], device='cuda:0') mtp accept=1 prop=3803 top1=3803 accp=1.000 next=pair draft=271 prop=271 pred gate=device Token # 1539: 114.408ms; value: next_token_ids=tensor([271], device='cuda:0') mtp accept=1 prop=271 top1=271 accp=1.000 next=draft=795 prop=795 olap pair=109.2ms serial=193.8ms gain=84.5ms ratio=0.44 s0=4.4ms s1=189.3ms wait=0.1/47.3ms pred gate=device Token # 1540: 3.721ms; value: next_token_ids=tensor([795], device='cuda:0') mtp accept=1 prop=795 top1=795 accp=0.975 next=pair draft=223 prop=223 pred gate=device Token # 1541: 114.733ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=0 prop=223 top1=2619 accp=0.168 next=draft=10242 prop=10242 olap pair=109.6ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.7ms s1=189.7ms wait=0.1/46.6ms pred gate=device Token # 1542: 114.970ms; value: next_token_ids=tensor([10242], device='cuda:0') mtp accept=1 prop=10242 top1=10242 accp=0.684 next=draft=7185 prop=7185 olap pair=109.7ms serial=195.0ms gain=85.3ms ratio=0.44 s0=3.9ms s1=191.1ms wait=0.1/48.2ms pred gate=device Token # 1543: 3.699ms; value: next_token_ids=tensor([7185], device='cuda:0') mtp accept=1 prop=7185 top1=13349 accp=0.497 next=pair draft=666 prop=666 pred gate=device Token # 1544: 114.804ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=768 accp=0.284 next=draft=7524 prop=7524 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=3.7ms s1=191.2ms wait=0.1/48.6ms pred gate=device Token # 1545: 3.710ms; value: next_token_ids=tensor([7524], device='cuda:0') mtp accept=1 prop=7524 top1=7524 accp=0.950 next=pair draft=9854 prop=28986 pred gate=device Token # 1546: 114.781ms; value: next_token_ids=tensor([28986], device='cuda:0') mtp accept=1 prop=28986 top1=28986 accp=0.362 next=draft=7861 prop=7861 olap 
pair=109.7ms serial=194.7ms gain=85.1ms ratio=0.44 s0=3.7ms s1=191.1ms wait=0.1/48.7ms pred gate=device Token # 1547: 3.739ms; value: next_token_ids=tensor([2382], device='cuda:0') mtp accept=0 prop=7861 top1=2382 accp=0.312 next=pair draft=92 prop=92 pred gate=device Token # 1548: 114.578ms; value: next_token_ids=tensor([92], device='cuda:0') mtp accept=1 prop=92 top1=92 accp=1.000 next=draft=31 prop=31 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=3.6ms s1=190.6ms wait=0.1/48.6ms pred gate=device Token # 1549: 3.745ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 1550: 114.745ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=36101 prop=6787 olap pair=109.6ms serial=194.8ms gain=85.2ms ratio=0.44 s0=3.6ms s1=191.1ms wait=0.1/48.7ms pred gate=device Token # 1551: 3.623ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=0 prop=6787 top1=36101 accp=0.885 next=pair draft=17520 prop=17520 pred gate=device Token # 1552: 114.264ms; value: next_token_ids=tensor([6787], device='cuda:0') mtp accept=0 prop=17520 top1=6787 accp=0.158 next=draft=223 prop=223 olap pair=109.1ms serial=193.7ms gain=84.6ms ratio=0.44 s0=3.7ms s1=190.0ms wait=0.1/48.6ms pred gate=device Token # 1553: 114.608ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.948 next=draft=20432 prop=20432 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=3.6ms s1=190.6ms wait=0.1/48.7ms pred gate=device Token # 1554: 3.713ms; value: next_token_ids=tensor([20432], device='cuda:0') mtp accept=1 prop=20432 top1=20432 accp=0.887 next=pair draft=7157 prop=7157 pred gate=device Token # 1555: 114.955ms; value: next_token_ids=tensor([7157], device='cuda:0') mtp accept=1 prop=7157 top1=7157 accp=0.989 next=draft=8835 prop=8835 olap pair=109.8ms serial=195.0ms gain=85.2ms ratio=0.44 
s0=3.8ms s1=191.2ms wait=0.1/48.5ms pred gate=device Token # 1556: 3.670ms; value: next_token_ids=tensor([8835], device='cuda:0') mtp accept=1 prop=8835 top1=8835 accp=0.949 next=pair draft=940 prop=940 pred gate=device Token # 1557: 114.679ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=940 top1=303 accp=0.107 next=draft=52940 prop=70196 olap pair=109.5ms serial=194.6ms gain=85.1ms ratio=0.44 s0=3.8ms s1=190.9ms wait=0.1/48.5ms pred gate=device Token # 1558: 115.429ms; value: next_token_ids=tensor([9209], device='cuda:0') mtp accept=0 prop=70196 top1=52940 accp=0.708 next=draft=9422 prop=9422 olap pair=110.2ms serial=195.8ms gain=85.6ms ratio=0.44 s0=3.7ms s1=192.1ms wait=0.1/48.6ms pred gate=device Token # 1559: 115.025ms; value: next_token_ids=tensor([10124], device='cuda:0') mtp accept=0 prop=9422 top1=10124 accp=0.086 next=draft=303 prop=548 olap pair=109.8ms serial=195.2ms gain=85.4ms ratio=0.44 s0=3.6ms s1=191.5ms wait=0.1/48.7ms pred gate=device Token # 1560: 115.343ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=548 top1=303 accp=0.738 next=draft=7157 prop=32012 olap pair=110.1ms serial=195.7ms gain=85.6ms ratio=0.44 s0=3.6ms s1=192.1ms wait=0.1/48.8ms pred gate=device Token # 1561: 115.531ms; value: next_token_ids=tensor([9422], device='cuda:0') mtp accept=0 prop=32012 top1=9422 accp=0.005 next=draft=30463 prop=100787 olap pair=110.3ms serial=195.8ms gain=85.5ms ratio=0.44 s0=4.7ms s1=191.0ms wait=0.1/46.6ms pred gate=device Token # 1562: 115.360ms; value: next_token_ids=tensor([422], device='cuda:0') mtp accept=0 prop=100787 top1=422 accp=0.333 next=draft=18580 prop=18580 olap pair=110.1ms serial=195.5ms gain=85.4ms ratio=0.44 s0=4.8ms s1=190.7ms wait=0.1/46.5ms pred gate=device Token # 1563: 115.247ms; value: next_token_ids=tensor([18580], device='cuda:0') mtp accept=1 prop=18580 top1=18580 accp=1.000 next=draft=303 prop=303 olap pair=110.0ms serial=195.3ms gain=85.3ms ratio=0.44 s0=4.4ms s1=190.9ms 
wait=0.1/47.2ms pred gate=device Token # 1564: 3.732ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=0 prop=303 top1=201 accp=0.295 next=pair draft=20759 prop=20759 pred gate=device Token # 1565: 115.249ms; value: next_token_ids=tensor([20759], device='cuda:0') mtp accept=1 prop=20759 top1=20759 accp=0.967 next=draft=795 prop=795 olap pair=110.1ms serial=195.5ms gain=85.4ms ratio=0.44 s0=4.3ms s1=191.2ms wait=0.1/47.2ms pred gate=device Token # 1566: 3.701ms; value: next_token_ids=tensor([795], device='cuda:0') mtp accept=1 prop=795 top1=795 accp=0.993 next=pair draft=2619 prop=2619 pred gate=device Token # 1567: 114.486ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.714 next=draft=12201 prop=12201 olap pair=109.3ms serial=193.9ms gain=84.6ms ratio=0.44 s0=4.2ms s1=189.6ms wait=0.1/47.2ms pred gate=device Token # 1568: 3.724ms; value: next_token_ids=tensor([12201], device='cuda:0') mtp accept=1 prop=12201 top1=12201 accp=0.857 next=pair draft=17395 prop=17395 pred gate=device Token # 1569: 115.086ms; value: next_token_ids=tensor([17395], device='cuda:0') mtp accept=1 prop=17395 top1=17395 accp=0.578 next=draft=666 prop=666 olap pair=109.8ms serial=194.3ms gain=84.5ms ratio=0.43 s0=4.3ms s1=190.0ms wait=0.1/47.5ms pred gate=device Token # 1570: 3.767ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=0.990 next=pair draft=5189 prop=5189 pred gate=device Token # 1571: 114.982ms; value: next_token_ids=tensor([5189], device='cuda:0') mtp accept=1 prop=5189 top1=5189 accp=0.999 next=draft=94 prop=94 olap pair=109.7ms serial=193.7ms gain=83.9ms ratio=0.43 s0=4.0ms s1=189.7ms wait=0.1/48.4ms pred gate=device Token # 1572: 3.725ms; value: next_token_ids=tensor([94], device='cuda:0') mtp accept=1 prop=94 top1=94 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1573: 113.977ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 
top1=223 accp=0.998 next=draft=66518 prop=66518 olap pair=108.7ms serial=192.7ms gain=84.0ms ratio=0.44 s0=3.7ms s1=189.0ms wait=0.1/48.5ms pred gate=device Token # 1574: 3.751ms; value: next_token_ids=tensor([66518], device='cuda:0') mtp accept=1 prop=66518 top1=66518 accp=1.000 next=pair draft=13349 prop=13349 pred gate=device Token # 1575: 115.005ms; value: next_token_ids=tensor([13349], device='cuda:0') mtp accept=1 prop=13349 top1=13349 accp=0.995 next=draft=369 prop=369 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=3.9ms s1=190.9ms wait=0.1/48.1ms pred gate=device Token # 1576: 3.722ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=1.000 next=pair draft=291 prop=291 pred gate=device Token # 1577: 116.008ms; value: next_token_ids=tensor([291], device='cuda:0') mtp accept=1 prop=291 top1=291 accp=0.999 next=draft=11528 prop=11528 olap pair=110.7ms serial=196.7ms gain=86.0ms ratio=0.44 s0=4.0ms s1=192.7ms wait=0.1/48.0ms pred gate=device Token # 1578: 3.721ms; value: next_token_ids=tensor([11528], device='cuda:0') mtp accept=1 prop=11528 top1=11528 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1579: 114.372ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=19 prop=19 olap pair=109.2ms serial=193.8ms gain=84.6ms ratio=0.44 s0=3.9ms s1=189.9ms wait=0.1/48.1ms pred gate=device Token # 1580: 3.693ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=18580 prop=18580 pred gate=device Token # 1581: 115.038ms; value: next_token_ids=tensor([18580], device='cuda:0') mtp accept=1 prop=18580 top1=18580 accp=0.608 next=draft=946 prop=946 olap pair=109.9ms serial=195.3ms gain=85.4ms ratio=0.44 s0=3.7ms s1=191.6ms wait=0.1/48.6ms pred gate=device Token # 1582: 3.722ms; value: next_token_ids=tensor([946], device='cuda:0') mtp accept=1 prop=946 top1=946 accp=1.000 next=pair draft=369 prop=369 
pred gate=device Token # 1583: 114.528ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=1.000 next=draft=223 prop=223 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.4ms wait=0.1/48.7ms pred gate=device Token # 1584: 3.759ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=24157 prop=5616 pred gate=device Token # 1585: 114.652ms; value: next_token_ids=tensor([5616], device='cuda:0') mtp accept=1 prop=5616 top1=24157 accp=0.810 next=draft=369 prop=369 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=3.6ms s1=190.7ms wait=0.1/48.6ms pred gate=device Token # 1586: 3.716ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=0.937 next=pair draft=223 prop=223 pred gate=device Token # 1587: 115.107ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=7383 prop=7383 olap pair=109.9ms serial=195.3ms gain=85.4ms ratio=0.44 s0=3.7ms s1=191.7ms wait=0.1/48.6ms pred gate=device Token # 1588: 3.698ms; value: next_token_ids=tensor([7383], device='cuda:0') mtp accept=1 prop=7383 top1=7383 accp=0.929 next=pair draft=12145 prop=12145 pred gate=device Token # 1589: 114.514ms; value: next_token_ids=tensor([7640], device='cuda:0') mtp accept=0 prop=12145 top1=7640 accp=0.002 next=draft=94 prop=94 olap pair=109.2ms serial=194.0ms gain=84.7ms ratio=0.44 s0=3.7ms s1=190.3ms wait=0.1/48.5ms pred gate=device Token # 1590: 115.397ms; value: next_token_ids=tensor([94], device='cuda:0') mtp accept=1 prop=94 top1=94 accp=1.000 next=draft=94916 prop=94916 olap pair=110.1ms serial=195.6ms gain=85.5ms ratio=0.44 s0=3.8ms s1=191.8ms wait=0.1/48.5ms pred gate=device Token # 1591: 3.765ms; value: next_token_ids=tensor([94916], device='cuda:0') mtp accept=1 prop=94916 top1=94916 accp=1.000 next=pair draft=94 prop=94 pred gate=device Token # 1592: 115.273ms; value: 
next_token_ids=tensor([94], device='cuda:0') mtp accept=1 prop=94 top1=94 accp=1.000 next=draft=25711 prop=25711 olap pair=109.9ms serial=193.4ms gain=83.5ms ratio=0.43 s0=4.2ms s1=189.2ms wait=0.1/48.0ms pred gate=device Token # 1593: 3.754ms; value: next_token_ids=tensor([25711], device='cuda:0') mtp accept=1 prop=25711 top1=25711 accp=0.998 next=pair draft=94 prop=94 pred gate=device Token # 1594: 114.847ms; value: next_token_ids=tensor([94], device='cuda:0') mtp accept=1 prop=94 top1=94 accp=1.000 next=draft=20004 prop=20004 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.9ms wait=0.1/48.7ms pred gate=device Token # 1595: 3.792ms; value: next_token_ids=tensor([20004], device='cuda:0') mtp accept=1 prop=20004 top1=20004 accp=1.000 next=pair draft=94 prop=94 pred gate=device Token # 1596: 114.770ms; value: next_token_ids=tensor([94], device='cuda:0') mtp accept=1 prop=94 top1=94 accp=1.000 next=draft=20004 prop=20004 olap pair=109.6ms serial=194.4ms gain=84.9ms ratio=0.44 s0=3.7ms s1=190.8ms wait=0.1/48.6ms pred gate=device Token # 1597: 3.729ms; value: next_token_ids=tensor([20004], device='cuda:0') mtp accept=1 prop=20004 top1=20004 accp=0.997 next=pair draft=22301 prop=22301 pred gate=device Token # 1598: 114.670ms; value: next_token_ids=tensor([22301], device='cuda:0') mtp accept=1 prop=22301 top1=22301 accp=1.000 next=draft=94 prop=94 olap pair=109.5ms serial=194.3ms gain=84.9ms ratio=0.44 s0=3.7ms s1=190.7ms wait=0.1/48.6ms pred gate=device Token # 1599: 3.706ms; value: next_token_ids=tensor([94], device='cuda:0') mtp accept=1 prop=94 top1=94 accp=1.000 next=pair draft=2619 prop=2619 pred gate=device Token # 1600: 115.229ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=draft=9422 prop=9422 olap pair=110.0ms serial=195.3ms gain=85.3ms ratio=0.44 s0=3.7ms s1=191.7ms wait=0.1/48.6ms pred gate=device Token # 1601: 3.636ms; value: next_token_ids=tensor([9422], device='cuda:0') mtp 
accept=1 prop=9422 top1=9422 accp=1.000 next=pair draft=666 prop=666 pred gate=device Token # 1602: 116.220ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=369 prop=369 olap pair=110.2ms serial=195.4ms gain=85.2ms ratio=0.44 s0=4.0ms s1=191.5ms wait=0.1/48.3ms pred gate=device Token # 1603: 4.693ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=1.000 next=pair draft=53341 prop=53341 pred gate=device Token # 1604: 114.761ms; value: next_token_ids=tensor([53341], device='cuda:0') mtp accept=1 prop=53341 top1=53341 accp=1.000 next=draft=237 prop=237 olap pair=109.3ms serial=193.0ms gain=83.8ms ratio=0.43 s0=7.9ms s1=185.1ms wait=0.2/43.4ms pred gate=device Token # 1605: 3.700ms; value: next_token_ids=tensor([237], device='cuda:0') mtp accept=1 prop=237 top1=237 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1606: 114.542ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.906 next=draft=82377 prop=422 olap pair=109.3ms serial=194.0ms gain=84.7ms ratio=0.44 s0=3.7ms s1=190.3ms wait=0.1/48.6ms pred gate=device Token # 1607: 3.791ms; value: next_token_ids=tensor([422], device='cuda:0') mtp accept=1 prop=422 top1=422 accp=0.314 next=pair draft=18580 prop=18580 pred gate=device Token # 1608: 114.756ms; value: next_token_ids=tensor([18580], device='cuda:0') mtp accept=1 prop=18580 top1=18580 accp=1.000 next=draft=369 prop=369 olap pair=109.3ms serial=194.0ms gain=84.7ms ratio=0.44 s0=3.7ms s1=190.3ms wait=0.1/48.6ms pred gate=device Token # 1609: 3.778ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1610: 114.771ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=6525 prop=6525 olap pair=109.5ms serial=194.3ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.6ms 
wait=0.1/48.6ms pred gate=device Token # 1611: 3.731ms; value: next_token_ids=tensor([6525], device='cuda:0') mtp accept=1 prop=6525 top1=6525 accp=0.811 next=pair draft=31446 prop=31446 pred gate=device Token # 1612: 114.331ms; value: next_token_ids=tensor([31446], device='cuda:0') mtp accept=1 prop=31446 top1=31446 accp=0.999 next=draft=45276 prop=45276 olap pair=109.1ms serial=193.7ms gain=84.6ms ratio=0.44 s0=3.7ms s1=190.1ms wait=0.1/48.7ms pred gate=device Token # 1613: 3.778ms; value: next_token_ids=tensor([45276], device='cuda:0') mtp accept=1 prop=45276 top1=45276 accp=0.999 next=pair draft=25024 prop=25024 pred gate=device Token # 1614: 115.241ms; value: next_token_ids=tensor([25024], device='cuda:0') mtp accept=1 prop=25024 top1=25024 accp=1.000 next=draft=369 prop=369 olap pair=110.1ms serial=194.9ms gain=84.8ms ratio=0.44 s0=4.0ms s1=190.8ms wait=0.1/48.4ms pred gate=device Token # 1615: 3.673ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=0.999 next=pair draft=223 prop=223 pred gate=device Token # 1616: 114.449ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.997 next=draft=4916 prop=9209 olap pair=109.3ms serial=194.0ms gain=84.7ms ratio=0.44 s0=3.7ms s1=190.3ms wait=0.1/48.7ms pred gate=device Token # 1617: 3.735ms; value: next_token_ids=tensor([30328], device='cuda:0') mtp accept=0 prop=9209 top1=4916 accp=0.674 next=pair draft=2541 prop=2541 pred gate=device Token # 1618: 114.806ms; value: next_token_ids=tensor([2541], device='cuda:0') mtp accept=1 prop=2541 top1=2541 accp=0.985 next=draft=7640 prop=7640 olap pair=109.5ms serial=194.6ms gain=85.1ms ratio=0.44 s0=3.6ms s1=191.0ms wait=0.1/48.7ms pred gate=device Token # 1619: 3.735ms; value: next_token_ids=tensor([7640], device='cuda:0') mtp accept=1 prop=7640 top1=7640 accp=1.000 next=pair draft=94 prop=94 pred gate=device Token # 1620: 115.467ms; value: next_token_ids=tensor([94], device='cuda:0') mtp 
accept=1 prop=94 top1=94 accp=1.000 next=draft=2619 prop=2619 olap pair=110.3ms serial=195.9ms gain=85.7ms ratio=0.44 s0=3.7ms s1=192.3ms wait=0.1/48.7ms pred gate=device Token # 1621: 3.750ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=pair draft=8835 prop=8835 pred gate=device Token # 1622: 114.814ms; value: next_token_ids=tensor([8835], device='cuda:0') mtp accept=1 prop=8835 top1=8835 accp=1.000 next=draft=666 prop=666 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.1ms s1=190.7ms wait=0.1/47.8ms pred gate=device Token # 1623: 3.752ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=pair draft=369 prop=369 pred gate=device Token # 1624: 115.083ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=1.000 next=draft=82981 prop=82981 olap pair=109.8ms serial=194.9ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.6ms wait=0.1/47.3ms pred gate=device Token # 1625: 3.742ms; value: next_token_ids=tensor([82981], device='cuda:0') mtp accept=1 prop=82981 top1=82981 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1626: 114.736ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.931 next=draft=18580 prop=18580 olap pair=109.6ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/47.3ms pred gate=device Token # 1627: 3.708ms; value: next_token_ids=tensor([18580], device='cuda:0') mtp accept=1 prop=18580 top1=18580 accp=0.996 next=pair draft=369 prop=369 pred gate=device Token # 1628: 114.883ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=1.000 next=draft=223 prop=223 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.5ms wait=0.1/47.1ms pred gate=device Token # 1629: 3.698ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.999 next=pair 
draft=653 prop=653 pred gate=device Token # 1630: 115.140ms; value: next_token_ids=tensor([653], device='cuda:0') mtp accept=1 prop=653 top1=653 accp=0.993 next=draft=31446 prop=31446 olap pair=110.0ms serial=195.2ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.9ms wait=0.1/47.2ms pred gate=device Token # 1631: 3.744ms; value: next_token_ids=tensor([2693], device='cuda:0') mtp accept=0 prop=31446 top1=2693 accp=0.299 next=pair draft=751 prop=751 pred gate=device Token # 1632: 114.816ms; value: next_token_ids=tensor([751], device='cuda:0') mtp accept=1 prop=751 top1=751 accp=1.000 next=draft=8842 prop=8842 olap pair=109.6ms serial=194.4ms gain=84.8ms ratio=0.44 s0=4.3ms s1=190.1ms wait=0.1/47.2ms pred gate=device Token # 1633: 3.682ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=0.994 next=pair draft=10602 prop=10602 pred gate=device Token # 1634: 115.084ms; value: next_token_ids=tensor([10602], device='cuda:0') mtp accept=1 prop=10602 top1=10602 accp=0.895 next=draft=303 prop=303 olap pair=109.9ms serial=195.2ms gain=85.3ms ratio=0.44 s0=4.3ms s1=190.9ms wait=0.1/47.3ms pred gate=device Token # 1635: 3.712ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.897 next=pair draft=8964 prop=1644 pred gate=device Token # 1636: 114.918ms; value: next_token_ids=tensor([8964], device='cuda:0') mtp accept=0 prop=1644 top1=8964 accp=0.782 next=draft=1644 prop=1644 olap pair=109.7ms serial=194.7ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.4ms wait=0.1/47.3ms pred gate=device Token # 1637: 115.058ms; value: next_token_ids=tensor([1644], device='cuda:0') mtp accept=1 prop=1644 top1=1644 accp=0.647 next=draft=6034 prop=4289 olap pair=109.8ms serial=194.9ms gain=85.1ms ratio=0.44 s0=4.7ms s1=190.3ms wait=0.1/46.7ms pred gate=device Token # 1638: 3.784ms; value: next_token_ids=tensor([4289], device='cuda:0') mtp accept=1 prop=4289 top1=4289 accp=0.380 next=pair draft=2664 prop=2664 pred gate=device 
Token # 1639: 114.608ms; value: next_token_ids=tensor([26127], device='cuda:0') mtp accept=0 prop=2664 top1=26127 accp=0.293 next=draft=6561 prop=6561 olap pair=109.4ms serial=194.1ms gain=84.7ms ratio=0.44 s0=4.7ms s1=189.4ms wait=0.1/46.6ms pred gate=device Token # 1640: 115.329ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=0 prop=6561 top1=369 accp=0.232 next=draft=223 prop=223 olap pair=110.1ms serial=195.4ms gain=85.4ms ratio=0.44 s0=4.5ms s1=191.0ms wait=0.1/46.9ms pred gate=device Token # 1641: 115.773ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=0 prop=223 top1=2619 accp=0.146 next=draft=10242 prop=10242 olap pair=110.5ms serial=196.0ms gain=85.5ms ratio=0.44 s0=4.8ms s1=191.2ms wait=0.1/46.6ms pred gate=device Token # 1642: 114.874ms; value: next_token_ids=tensor([10242], device='cuda:0') mtp accept=1 prop=10242 top1=71569 accp=0.613 next=draft=7185 prop=7185 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=3.8ms s1=190.9ms wait=0.1/48.5ms pred gate=device Token # 1643: 3.706ms; value: next_token_ids=tensor([2541], device='cuda:0') mtp accept=0 prop=7185 top1=2541 accp=0.103 next=pair draft=666 prop=666 pred gate=device Token # 1644: 114.590ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=7640 prop=7640 olap pair=109.3ms serial=194.0ms gain=84.7ms ratio=0.44 s0=3.8ms s1=190.2ms wait=0.1/48.6ms pred gate=device Token # 1645: 3.690ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=7640 top1=303 accp=0.348 next=pair draft=9158 prop=9158 pred gate=device Token # 1646: 114.782ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=0 prop=9158 top1=1207 accp=0.602 next=draft=1714 prop=1714 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.8ms wait=0.1/48.6ms pred gate=device Token # 1647: 114.594ms; value: next_token_ids=tensor([1714], device='cuda:0') mtp accept=1 prop=1714 top1=1714 
accp=0.930 next=draft=4754 prop=4754 olap pair=109.4ms serial=194.1ms gain=84.8ms ratio=0.44 s0=3.8ms s1=190.3ms wait=0.1/48.5ms pred gate=device Token # 1648: 3.761ms; value: next_token_ids=tensor([7157], device='cuda:0') mtp accept=0 prop=4754 top1=7157 accp=0.386 next=pair draft=16303 prop=16303 pred gate=device Token # 1649: 114.919ms; value: next_token_ids=tensor([16303], device='cuda:0') mtp accept=1 prop=16303 top1=16303 accp=0.999 next=draft=100642 prop=100642 olap pair=109.7ms serial=194.6ms gain=84.9ms ratio=0.44 s0=3.8ms s1=190.7ms wait=0.1/48.4ms pred gate=device Token # 1650: 3.734ms; value: next_token_ids=tensor([100642], device='cuda:0') mtp accept=1 prop=100642 top1=100642 accp=0.998 next=pair draft=7640 prop=7640 pred gate=device Token # 1651: 114.292ms; value: next_token_ids=tensor([7640], device='cuda:0') mtp accept=1 prop=7640 top1=7640 accp=1.000 next=draft=94 prop=94 olap pair=109.0ms serial=193.5ms gain=84.5ms ratio=0.44 s0=3.7ms s1=189.8ms wait=0.1/48.7ms pred gate=device Token # 1652: 3.705ms; value: next_token_ids=tensor([94], device='cuda:0') mtp accept=1 prop=94 top1=94 accp=1.000 next=pair draft=2619 prop=2619 pred gate=device Token # 1653: 114.696ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=draft=10124 prop=10124 olap pair=109.4ms serial=194.1ms gain=84.7ms ratio=0.44 s0=3.8ms s1=190.2ms wait=0.1/48.5ms pred gate=device Token # 1654: 3.709ms; value: next_token_ids=tensor([10124], device='cuda:0') mtp accept=1 prop=10124 top1=10124 accp=1.000 next=pair draft=666 prop=666 pred gate=device Token # 1655: 114.658ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=369 prop=369 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.3ms s1=190.0ms wait=0.1/47.9ms pred gate=device Token # 1656: 3.726ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=1.000 next=pair draft=1327 
prop=1327 pred gate=device Token # 1657: 114.883ms; value: next_token_ids=tensor([1327], device='cuda:0') mtp accept=1 prop=1327 top1=1327 accp=0.999 next=draft=251 prop=251 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.9ms wait=0.1/48.6ms pred gate=device Token # 1658: 3.742ms; value: next_token_ids=tensor([251], device='cuda:0') mtp accept=1 prop=251 top1=251 accp=1.000 next=pair draft=257 prop=257 pred gate=device Token # 1659: 114.276ms; value: next_token_ids=tensor([257], device='cuda:0') mtp accept=1 prop=257 top1=257 accp=1.000 next=draft=10759 prop=10759 olap pair=109.0ms serial=193.6ms gain=84.5ms ratio=0.44 s0=3.6ms s1=189.9ms wait=0.1/48.6ms pred gate=device Token # 1660: 3.660ms; value: next_token_ids=tensor([10759], device='cuda:0') mtp accept=1 prop=10759 top1=10759 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1661: 114.373ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=422 prop=422 olap pair=109.2ms serial=194.1ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.4ms wait=0.1/48.6ms pred gate=device Token # 1662: 3.736ms; value: next_token_ids=tensor([422], device='cuda:0') mtp accept=1 prop=422 top1=422 accp=0.978 next=pair draft=10242 prop=10242 pred gate=device Token # 1663: 114.275ms; value: next_token_ids=tensor([10242], device='cuda:0') mtp accept=1 prop=10242 top1=10242 accp=0.979 next=draft=369 prop=369 olap pair=109.1ms serial=193.8ms gain=84.7ms ratio=0.44 s0=3.7ms s1=190.2ms wait=0.1/48.7ms pred gate=device Token # 1664: 3.675ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1665: 114.424ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.999 next=draft=34408 prop=34408 olap pair=109.3ms serial=193.9ms gain=84.7ms ratio=0.44 s0=4.0ms s1=189.9ms wait=0.1/47.9ms pred gate=device Token # 1666: 3.770ms; 
value: next_token_ids=tensor([34408], device='cuda:0') mtp accept=1 prop=34408 top1=34408 accp=1.000 next=pair draft=1728 prop=1728 pred gate=device Token # 1667: 114.547ms; value: next_token_ids=tensor([1728], device='cuda:0') mtp accept=1 prop=1728 top1=1728 accp=1.000 next=draft=75777 prop=75777 olap pair=109.3ms serial=194.2ms gain=84.8ms ratio=0.44 s0=3.8ms s1=190.4ms wait=0.1/48.6ms pred gate=device Token # 1668: 3.692ms; value: next_token_ids=tensor([75777], device='cuda:0') mtp accept=1 prop=75777 top1=75777 accp=0.995 next=pair draft=8056 prop=8056 pred gate=device Token # 1669: 114.684ms; value: next_token_ids=tensor([8056], device='cuda:0') mtp accept=1 prop=8056 top1=8056 accp=0.984 next=draft=303 prop=303 olap pair=109.5ms serial=194.4ms gain=84.9ms ratio=0.44 s0=3.7ms s1=190.7ms wait=0.1/48.7ms pred gate=device Token # 1670: 3.711ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.971 next=pair draft=63239 prop=63239 pred gate=device Token # 1671: 114.875ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=0 prop=63239 top1=6034 accp=0.662 next=draft=63239 prop=63239 olap pair=109.6ms serial=194.3ms gain=84.7ms ratio=0.44 s0=4.3ms s1=190.0ms wait=0.1/47.6ms pred gate=device Token # 1672: 114.836ms; value: next_token_ids=tensor([63239], device='cuda:0') mtp accept=1 prop=63239 top1=63239 accp=1.000 next=draft=2467 prop=2467 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=3.8ms s1=190.3ms wait=0.1/48.4ms pred gate=device Token # 1673: 3.715ms; value: next_token_ids=tensor([2467], device='cuda:0') mtp accept=1 prop=2467 top1=2467 accp=1.000 next=pair draft=369 prop=369 pred gate=device Token # 1674: 114.967ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=1.000 next=draft=223 prop=223 olap pair=109.8ms serial=194.5ms gain=84.8ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/47.4ms pred gate=device Token # 1675: 3.691ms; value: 
next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=9209 prop=9209 pred gate=device Token # 1676: 114.821ms; value: next_token_ids=tensor([91447], device='cuda:0') mtp accept=0 prop=9209 top1=91447 accp=0.521 next=draft=2541 prop=303 olap pair=109.6ms serial=194.1ms gain=84.5ms ratio=0.44 s0=5.3ms s1=188.8ms wait=0.1/46.5ms pred gate=device Token # 1677: 114.714ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.260 next=draft=30775 prop=30775 olap pair=109.5ms serial=194.4ms gain=84.9ms ratio=0.44 s0=3.8ms s1=190.6ms wait=0.1/48.5ms pred gate=device Token # 1678: 3.697ms; value: next_token_ids=tensor([30775], device='cuda:0') mtp accept=1 prop=30775 top1=30775 accp=0.999 next=pair draft=8842 prop=8842 pred gate=device Token # 1679: 114.893ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=0.967 next=draft=125344 prop=19341 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=3.9ms s1=190.9ms wait=0.1/48.2ms pred gate=device Token # 1680: 3.736ms; value: next_token_ids=tensor([38763], device='cuda:0') mtp accept=0 prop=19341 top1=38763 accp=0.167 next=pair draft=7640 prop=7640 pred gate=device Token # 1681: 115.279ms; value: next_token_ids=tensor([7640], device='cuda:0') mtp accept=1 prop=7640 top1=7640 accp=0.702 next=draft=94 prop=94 olap pair=110.1ms serial=195.4ms gain=85.3ms ratio=0.44 s0=4.3ms s1=191.1ms wait=0.1/47.2ms pred gate=device Token # 1682: 3.686ms; value: next_token_ids=tensor([94], device='cuda:0') mtp accept=1 prop=94 top1=94 accp=1.000 next=pair draft=2619 prop=2619 pred gate=device Token # 1683: 114.846ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=draft=7989 prop=7989 olap pair=109.6ms serial=194.5ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/47.1ms pred gate=device Token # 1684: 3.713ms; value: next_token_ids=tensor([7989], 
device='cuda:0') mtp accept=1 prop=7989 top1=7989 accp=1.000 next=pair draft=666 prop=666 pred gate=device Token # 1685: 115.138ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=369 prop=369 olap pair=109.9ms serial=195.0ms gain=85.1ms ratio=0.44 s0=4.1ms s1=190.9ms wait=0.1/47.7ms pred gate=device Token # 1686: 3.768ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=1.000 next=pair draft=82981 prop=82981 pred gate=device Token # 1687: 114.862ms; value: next_token_ids=tensor([82981], device='cuda:0') mtp accept=1 prop=82981 top1=82981 accp=0.739 next=draft=223 prop=223 olap pair=109.7ms serial=194.5ms gain=84.8ms ratio=0.44 s0=3.8ms s1=190.6ms wait=0.1/48.3ms pred gate=device Token # 1688: 3.740ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.999 next=pair draft=4674 prop=4674 pred gate=device Token # 1689: 115.286ms; value: next_token_ids=tensor([4674], device='cuda:0') mtp accept=1 prop=4674 top1=4674 accp=0.822 next=draft=18580 prop=18580 olap pair=110.0ms serial=195.4ms gain=85.3ms ratio=0.44 s0=4.0ms s1=191.4ms wait=0.1/48.1ms pred gate=device Token # 1690: 3.736ms; value: next_token_ids=tensor([18580], device='cuda:0') mtp accept=1 prop=18580 top1=18580 accp=0.919 next=pair draft=369 prop=369 pred gate=device Token # 1691: 114.616ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=1.000 next=draft=223 prop=223 olap pair=109.4ms serial=194.4ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.7ms wait=0.1/48.7ms pred gate=device Token # 1692: 3.815ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.999 next=pair draft=3440 prop=3440 pred gate=device Token # 1693: 115.272ms; value: next_token_ids=tensor([3440], device='cuda:0') mtp accept=1 prop=3440 top1=3440 accp=1.000 next=draft=28608 prop=28608 olap pair=110.1ms serial=195.5ms gain=85.4ms 
ratio=0.44 s0=4.3ms s1=191.2ms wait=0.1/47.3ms pred gate=device Token # 1694: 3.786ms; value: next_token_ids=tensor([28608], device='cuda:0') mtp accept=1 prop=28608 top1=28608 accp=0.906 next=pair draft=39 prop=39 pred gate=device Token # 1695: 115.024ms; value: next_token_ids=tensor([39], device='cuda:0') mtp accept=1 prop=39 top1=39 accp=1.000 next=draft=8842 prop=8842 olap pair=109.8ms serial=194.9ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.6ms wait=0.1/47.1ms pred gate=device Token # 1696: 3.730ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=1.000 next=pair draft=5660 prop=5660 pred gate=device Token # 1697: 114.687ms; value: next_token_ids=tensor([21978], device='cuda:0') mtp accept=0 prop=5660 top1=21978 accp=0.256 next=draft=303 prop=303 olap pair=109.5ms serial=194.4ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.7ms wait=0.1/48.6ms pred gate=device Token # 1698: 115.199ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=0 prop=303 top1=303 accp=0.686 next=draft=223 prop=223 olap pair=110.0ms serial=195.3ms gain=85.3ms ratio=0.44 s0=3.9ms s1=191.3ms wait=0.1/48.1ms pred gate=device Token # 1699: 115.019ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=10192 accp=0.372 next=draft=24096 prop=24096 olap pair=109.8ms serial=195.0ms gain=85.3ms ratio=0.44 s0=3.7ms s1=191.4ms wait=0.1/48.7ms pred gate=device Token # 1700: 3.731ms; value: next_token_ids=tensor([24096], device='cuda:0') mtp accept=1 prop=24096 top1=24096 accp=0.856 next=pair draft=28608 prop=28608 pred gate=device Token # 1701: 114.834ms; value: next_token_ids=tensor([28608], device='cuda:0') mtp accept=1 prop=28608 top1=28608 accp=1.000 next=draft=39 prop=39 olap pair=109.6ms serial=194.6ms gain=85.1ms ratio=0.44 s0=3.7ms s1=190.9ms wait=0.1/48.6ms pred gate=device Token # 1702: 3.704ms; value: next_token_ids=tensor([39], device='cuda:0') mtp accept=1 prop=39 top1=39 accp=1.000 next=pair draft=8842 prop=8842 
pred gate=device Token # 1703: 114.218ms; value: next_token_ids=tensor([35991], device='cuda:0') mtp accept=0 prop=8842 top1=8842 accp=0.924 next=draft=303 prop=303 olap pair=109.0ms serial=193.5ms gain=84.4ms ratio=0.44 s0=3.9ms s1=189.6ms wait=0.1/48.0ms pred gate=device Token # 1704: 114.746ms; value: next_token_ids=tensor([2032], device='cuda:0') mtp accept=0 prop=303 top1=303 accp=0.688 next=draft=10242 prop=10242 olap pair=109.5ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.2ms s1=190.0ms wait=0.1/47.4ms pred gate=device Token # 1705: 114.910ms; value: next_token_ids=tensor([10242], device='cuda:0') mtp accept=1 prop=10242 top1=10242 accp=0.949 next=draft=25830 prop=25830 olap pair=109.6ms serial=194.5ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/47.1ms pred gate=device Token # 1706: 3.738ms; value: next_token_ids=tensor([25830], device='cuda:0') mtp accept=1 prop=25830 top1=25830 accp=0.997 next=pair draft=795 prop=795 pred gate=device Token # 1707: 115.231ms; value: next_token_ids=tensor([795], device='cuda:0') mtp accept=1 prop=795 top1=795 accp=0.848 next=draft=2619 prop=2619 olap pair=110.0ms serial=195.4ms gain=85.4ms ratio=0.44 s0=4.0ms s1=191.4ms wait=0.1/48.1ms pred gate=device Token # 1708: 3.692ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=pair draft=18172 prop=1131 pred gate=device Token # 1709: 114.847ms; value: next_token_ids=tensor([13276], device='cuda:0') mtp accept=0 prop=1131 top1=13276 accp=0.156 next=draft=7185 prop=7185 olap pair=109.7ms serial=194.7ms gain=85.0ms ratio=0.44 s0=4.0ms s1=190.7ms wait=0.1/47.9ms pred gate=device Token # 1710: 114.526ms; value: next_token_ids=tensor([13349], device='cuda:0') mtp accept=0 prop=7185 top1=13349 accp=0.038 next=draft=7383 prop=7383 olap pair=109.3ms serial=193.8ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.5ms wait=0.1/47.2ms pred gate=device Token # 1711: 114.882ms; value: next_token_ids=tensor([7383], device='cuda:0') mtp 
accept=1 prop=7383 top1=7383 accp=0.928 next=draft=666 prop=666 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.3ms wait=0.1/47.2ms pred gate=device Token # 1712: 3.702ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=pair draft=5189 prop=5189 pred gate=device Token # 1713: 114.455ms; value: next_token_ids=tensor([5189], device='cuda:0') mtp accept=1 prop=5189 top1=5189 accp=0.935 next=draft=19 prop=19 olap pair=109.3ms serial=193.8ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.5ms wait=0.1/47.2ms pred gate=device Token # 1714: 3.782ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=0.882 next=pair draft=16 prop=16 pred gate=device Token # 1715: 114.475ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=2619 prop=2619 olap pair=109.1ms serial=193.6ms gain=84.4ms ratio=0.44 s0=4.3ms s1=189.3ms wait=0.1/47.2ms pred gate=device Token # 1716: 3.690ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=pair draft=8162 prop=8162 pred gate=device Token # 1717: 114.341ms; value: next_token_ids=tensor([8162], device='cuda:0') mtp accept=1 prop=8162 top1=8162 accp=0.951 next=draft=8835 prop=8835 olap pair=109.1ms serial=193.6ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.3ms wait=0.1/47.2ms pred gate=device Token # 1718: 3.709ms; value: next_token_ids=tensor([8835], device='cuda:0') mtp accept=1 prop=8835 top1=8835 accp=0.995 next=pair draft=7185 prop=15133 pred gate=device Token # 1719: 114.539ms; value: next_token_ids=tensor([13349], device='cuda:0') mtp accept=0 prop=15133 top1=7185 accp=0.940 next=draft=666 prop=666 olap pair=109.3ms serial=193.9ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.6ms wait=0.1/47.3ms pred gate=device Token # 1720: 115.218ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=0.997 next=draft=1237 
prop=1237 olap pair=109.9ms serial=195.2ms gain=85.3ms ratio=0.44 s0=4.3ms s1=190.9ms wait=0.1/47.1ms pred gate=device Token # 1721: 3.716ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=0.999 next=pair draft=1103 prop=1103 pred gate=device Token # 1722: 115.064ms; value: next_token_ids=tensor([1103], device='cuda:0') mtp accept=1 prop=1103 top1=1103 accp=0.510 next=draft=23750 prop=23750 olap pair=109.9ms serial=195.1ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.8ms wait=0.1/47.2ms pred gate=device Token # 1723: 3.730ms; value: next_token_ids=tensor([10242], device='cuda:0') mtp accept=0 prop=23750 top1=23750 accp=0.829 next=pair draft=7807 prop=7807 pred gate=device Token # 1724: 114.963ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=0 prop=7807 top1=1227 accp=0.622 next=draft=7524 prop=7524 olap pair=109.7ms serial=194.7ms gain=85.0ms ratio=0.44 s0=4.4ms s1=190.4ms wait=0.1/47.1ms pred gate=device Token # 1725: 115.574ms; value: next_token_ids=tensor([7524], device='cuda:0') mtp accept=1 prop=7524 top1=7524 accp=1.000 next=draft=262 prop=262 olap pair=109.4ms serial=193.8ms gain=84.4ms ratio=0.44 s0=4.6ms s1=189.3ms wait=0.1/47.0ms pred gate=device Token # 1726: 4.609ms; value: next_token_ids=tensor([262], device='cuda:0') mtp accept=1 prop=262 top1=262 accp=0.998 next=pair draft=35739 prop=35739 pred gate=device Token # 1727: 114.873ms; value: next_token_ids=tensor([35739], device='cuda:0') mtp accept=1 prop=35739 top1=35739 accp=0.727 next=draft=36490 prop=36490 olap pair=109.5ms serial=193.7ms gain=84.2ms ratio=0.43 s0=8.2ms s1=185.4ms wait=0.2/42.7ms pred gate=device Token # 1728: 3.796ms; value: next_token_ids=tensor([36490], device='cuda:0') mtp accept=1 prop=36490 top1=36490 accp=0.998 next=pair draft=201 prop=201 pred gate=device Token # 1729: 114.472ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=1.000 next=draft=262 prop=262 olap pair=109.3ms 
serial=193.9ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.6ms wait=0.1/47.3ms pred gate=device Token # 1730: 3.792ms; value: next_token_ids=tensor([262], device='cuda:0') mtp accept=1 prop=262 top1=262 accp=0.999 next=pair draft=1823 prop=1823 pred gate=device Token # 1731: 114.770ms; value: next_token_ids=tensor([1823], device='cuda:0') mtp accept=1 prop=1823 top1=1823 accp=1.000 next=draft=223 prop=223 olap pair=109.6ms serial=194.5ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/47.3ms pred gate=device Token # 1732: 3.739ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.996 next=pair draft=8040 prop=8040 pred gate=device Token # 1733: 114.544ms; value: next_token_ids=tensor([8040], device='cuda:0') mtp accept=1 prop=8040 top1=8040 accp=0.852 next=draft=768 prop=768 olap pair=109.4ms serial=194.1ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.9ms wait=0.1/47.2ms pred gate=device Token # 1734: 3.695ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=0.852 next=pair draft=26 prop=26 pred gate=device Token # 1735: 114.085ms; value: next_token_ids=tensor([1275], device='cuda:0') mtp accept=0 prop=26 top1=1275 accp=0.017 next=draft=2122 prop=2122 olap pair=108.9ms serial=193.2ms gain=84.3ms ratio=0.44 s0=4.3ms s1=189.0ms wait=0.1/47.1ms pred gate=device Token # 1736: 114.738ms; value: next_token_ids=tensor([2122], device='cuda:0') mtp accept=1 prop=2122 top1=2122 accp=0.749 next=draft=36 prop=36 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.9ms wait=0.1/47.2ms pred gate=device Token # 1737: 3.727ms; value: next_token_ids=tensor([36], device='cuda:0') mtp accept=1 prop=36 top1=36 accp=1.000 next=pair draft=8842 prop=8842 pred gate=device Token # 1738: 114.219ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=0.958 next=draft=15133 prop=15133 olap pair=109.0ms serial=193.3ms gain=84.3ms ratio=0.44 s0=4.3ms s1=189.1ms 
wait=0.1/47.3ms pred gate=device Token # 1739: 3.729ms; value: next_token_ids=tensor([642], device='cuda:0') mtp accept=0 prop=15133 top1=642 accp=0.111 next=pair draft=8835 prop=8835 pred gate=device Token # 1740: 114.628ms; value: next_token_ids=tensor([8835], device='cuda:0') mtp accept=1 prop=8835 top1=8835 accp=0.716 next=draft=1824 prop=445 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.9ms wait=0.1/47.2ms pred gate=device Token # 1741: 3.768ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=0 prop=445 top1=31 accp=0.102 next=pair draft=26 prop=26 pred gate=device Token # 1742: 114.419ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=0.997 next=draft=68160 prop=68160 olap pair=109.2ms serial=193.7ms gain=84.5ms ratio=0.44 s0=4.2ms s1=189.5ms wait=0.1/47.1ms pred gate=device Token # 1743: 3.728ms; value: next_token_ids=tensor([15133], device='cuda:0') mtp accept=0 prop=68160 top1=15133 accp=0.008 next=pair draft=445 prop=445 pred gate=device Token # 1744: 114.773ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=1 prop=445 top1=445 accp=0.931 next=draft=26 prop=26 olap pair=109.4ms serial=194.1ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.9ms wait=0.1/47.3ms pred gate=device Token # 1745: 3.982ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=pair draft=2353 prop=2353 pred gate=device Token # 1746: 114.908ms; value: next_token_ids=tensor([2353], device='cuda:0') mtp accept=1 prop=2353 top1=2353 accp=1.000 next=draft=35 prop=90738 olap pair=109.5ms serial=194.1ms gain=84.6ms ratio=0.44 s0=4.6ms s1=189.5ms wait=0.1/47.0ms pred gate=device Token # 1747: 3.680ms; value: next_token_ids=tensor([35], device='cuda:0') mtp accept=0 prop=90738 top1=35 accp=0.396 next=pair draft=1457 prop=1457 pred gate=device Token # 1748: 115.010ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 
accp=0.998 next=draft=21975 prop=21975 olap pair=109.8ms serial=195.0ms gain=85.2ms ratio=0.44 s0=3.9ms s1=191.0ms wait=0.1/48.1ms pred gate=device Token # 1749: 3.786ms; value: next_token_ids=tensor([572], device='cuda:0') mtp accept=0 prop=21975 top1=572 accp=0.236 next=pair draft=201 prop=201 pred gate=device Token # 1750: 115.989ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=1.000 next=draft=262 prop=262 olap pair=109.9ms serial=194.3ms gain=84.4ms ratio=0.43 s0=6.6ms s1=187.7ms wait=0.2/44.7ms pred gate=device Token # 1751: 4.565ms; value: next_token_ids=tensor([262], device='cuda:0') mtp accept=1 prop=262 top1=262 accp=1.000 next=pair draft=1823 prop=1823 pred gate=device Token # 1752: 116.063ms; value: next_token_ids=tensor([1823], device='cuda:0') mtp accept=1 prop=1823 top1=1823 accp=1.000 next=draft=223 prop=223 olap pair=109.9ms serial=193.6ms gain=83.7ms ratio=0.43 s0=8.9ms s1=184.8ms wait=0.2/42.1ms pred gate=device Token # 1753: 4.645ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.992 next=pair draft=1833 prop=1833 pred gate=device Token # 1754: 114.618ms; value: next_token_ids=tensor([1833], device='cuda:0') mtp accept=1 prop=1833 top1=1833 accp=0.704 next=draft=2353 prop=2353 olap pair=109.4ms serial=194.2ms gain=84.9ms ratio=0.44 s0=3.7ms s1=190.5ms wait=0.1/48.6ms pred gate=device Token # 1755: 3.767ms; value: next_token_ids=tensor([2353], device='cuda:0') mtp accept=1 prop=2353 top1=2353 accp=0.919 next=pair draft=4289 prop=4289 pred gate=device Token # 1756: 114.501ms; value: next_token_ids=tensor([4289], device='cuda:0') mtp accept=1 prop=4289 top1=4289 accp=1.000 next=draft=36893 prop=36893 olap pair=109.2ms serial=193.8ms gain=84.6ms ratio=0.44 s0=3.8ms s1=190.0ms wait=0.1/48.4ms pred gate=device Token # 1757: 3.711ms; value: next_token_ids=tensor([36893], device='cuda:0') mtp accept=1 prop=36893 top1=36893 accp=0.587 next=pair draft=3021 prop=3021 
pred gate=device Token # 1758: 114.767ms; value: next_token_ids=tensor([15030], device='cuda:0') mtp accept=0 prop=3021 top1=15030 accp=0.056 next=draft=3021 prop=3021 olap pair=109.6ms serial=194.6ms gain=85.1ms ratio=0.44 s0=3.8ms s1=190.9ms wait=0.1/48.5ms pred gate=device Token # 1759: 115.324ms; value: next_token_ids=tensor([3021], device='cuda:0') mtp accept=1 prop=3021 top1=3021 accp=0.999 next=draft=26 prop=26 olap pair=110.1ms serial=195.7ms gain=85.7ms ratio=0.44 s0=3.8ms s1=192.0ms wait=0.1/48.4ms pred gate=device Token # 1760: 3.674ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=0.913 next=pair draft=16 prop=16 pred gate=device Token # 1761: 114.745ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=2402 prop=2402 olap pair=109.5ms serial=194.2ms gain=84.7ms ratio=0.44 s0=3.8ms s1=190.4ms wait=0.1/48.5ms pred gate=device Token # 1762: 3.793ms; value: next_token_ids=tensor([2402], device='cuda:0') mtp accept=1 prop=2402 top1=2402 accp=1.000 next=pair draft=36 prop=36 pred gate=device Token # 1763: 114.398ms; value: next_token_ids=tensor([36], device='cuda:0') mtp accept=1 prop=36 top1=36 accp=1.000 next=draft=10602 prop=10602 olap pair=109.1ms serial=193.8ms gain=84.7ms ratio=0.44 s0=3.7ms s1=190.2ms wait=0.1/48.6ms pred gate=device Token # 1764: 3.681ms; value: next_token_ids=tensor([10602], device='cuda:0') mtp accept=1 prop=10602 top1=10602 accp=0.996 next=pair draft=201 prop=201 pred gate=device Token # 1765: 114.668ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=0.997 next=draft=262 prop=262 olap pair=109.5ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.2ms s1=190.3ms wait=0.1/47.5ms pred gate=device Token # 1766: 3.686ms; value: next_token_ids=tensor([262], device='cuda:0') mtp accept=1 prop=262 top1=262 accp=1.000 next=pair draft=84154 prop=84154 pred gate=device Token # 1767: 114.889ms; value: 
next_token_ids=tensor([84154], device='cuda:0') mtp accept=1 prop=84154 top1=84154 accp=0.842 next=draft=20 prop=20 olap pair=109.7ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.3ms wait=0.1/47.2ms pred gate=device Token # 1768: 3.750ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=16 prop=16 pred gate=device Token # 1769: 115.063ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=2619 prop=2619 olap pair=109.8ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.6ms wait=0.1/47.4ms pred gate=device Token # 1770: 3.683ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=pair draft=8835 prop=8835 pred gate=device Token # 1771: 114.476ms; value: next_token_ids=tensor([8835], device='cuda:0') mtp accept=1 prop=8835 top1=8835 accp=0.996 next=draft=13 prop=940 olap pair=109.3ms serial=193.9ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.6ms wait=0.1/47.4ms pred gate=device Token # 1772: 3.677ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=0 prop=940 top1=13 accp=0.643 next=pair draft=10124 prop=39732 pred gate=device Token # 1773: 114.771ms; value: next_token_ids=tensor([7989], device='cuda:0') mtp accept=0 prop=39732 top1=10124 accp=0.833 next=draft=14769 prop=14769 olap pair=109.6ms serial=194.3ms gain=84.7ms ratio=0.44 s0=3.8ms s1=190.5ms wait=0.1/48.4ms pred gate=device Token # 1774: 115.018ms; value: next_token_ids=tensor([13276], device='cuda:0') mtp accept=0 prop=14769 top1=13276 accp=0.164 next=draft=666 prop=666 olap pair=109.8ms serial=194.9ms gain=85.2ms ratio=0.44 s0=3.8ms s1=191.1ms wait=0.1/48.5ms pred gate=device Token # 1775: 114.942ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=1237 prop=1237 olap pair=109.6ms serial=194.5ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/47.3ms pred 
gate=device Token # 1776: 3.721ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=28608 prop=28608 pred gate=device Token # 1777: 114.533ms; value: next_token_ids=tensor([28608], device='cuda:0') mtp accept=1 prop=28608 top1=9721 accp=0.517 next=draft=39 prop=39 olap pair=109.2ms serial=193.6ms gain=84.3ms ratio=0.44 s0=4.0ms s1=189.6ms wait=0.1/48.2ms pred gate=device Token # 1778: 3.735ms; value: next_token_ids=tensor([39], device='cuda:0') mtp accept=1 prop=39 top1=39 accp=1.000 next=pair draft=8842 prop=8842 pred gate=device Token # 1779: 114.659ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=1.000 next=draft=1227 prop=1227 olap pair=109.4ms serial=193.0ms gain=83.6ms ratio=0.43 s0=4.5ms s1=188.5ms wait=0.1/47.2ms pred gate=device Token # 1780: 3.806ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=0.987 next=pair draft=7524 prop=7524 pred gate=device Token # 1781: 114.671ms; value: next_token_ids=tensor([7524], device='cuda:0') mtp accept=1 prop=7524 top1=7524 accp=1.000 next=draft=262 prop=262 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.9ms wait=0.1/47.3ms pred gate=device Token # 1782: 3.747ms; value: next_token_ids=tensor([262], device='cuda:0') mtp accept=1 prop=262 top1=262 accp=1.000 next=pair draft=35739 prop=35739 pred gate=device Token # 1783: 115.124ms; value: next_token_ids=tensor([35739], device='cuda:0') mtp accept=1 prop=35739 top1=35739 accp=0.998 next=draft=36490 prop=36490 olap pair=109.9ms serial=195.1ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.8ms wait=0.1/47.3ms pred gate=device Token # 1784: 3.724ms; value: next_token_ids=tensor([36490], device='cuda:0') mtp accept=1 prop=36490 top1=36490 accp=1.000 next=pair draft=201 prop=201 pred gate=device Token # 1785: 114.838ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 
accp=1.000 next=draft=262 prop=262 olap pair=109.7ms serial=194.7ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.4ms wait=0.1/47.2ms pred gate=device Token # 1786: 3.744ms; value: next_token_ids=tensor([262], device='cuda:0') mtp accept=1 prop=262 top1=262 accp=1.000 next=pair draft=1823 prop=1823 pred gate=device Token # 1787: 114.824ms; value: next_token_ids=tensor([1823], device='cuda:0') mtp accept=1 prop=1823 top1=1823 accp=1.000 next=draft=223 prop=223 olap pair=109.6ms serial=194.6ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.3ms wait=0.1/47.2ms pred gate=device Token # 1788: 3.755ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.960 next=pair draft=8040 prop=8040 pred gate=device Token # 1789: 114.817ms; value: next_token_ids=tensor([8040], device='cuda:0') mtp accept=1 prop=8040 top1=8040 accp=1.000 next=draft=768 prop=768 olap pair=109.6ms serial=194.5ms gain=84.9ms ratio=0.44 s0=4.2ms s1=190.3ms wait=0.1/47.7ms pred gate=device Token # 1790: 3.665ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=48076 prop=48076 pred gate=device Token # 1791: 114.182ms; value: next_token_ids=tensor([48076], device='cuda:0') mtp accept=1 prop=48076 top1=48076 accp=0.996 next=draft=829 prop=829 olap pair=109.0ms serial=193.3ms gain=84.3ms ratio=0.44 s0=4.3ms s1=189.0ms wait=0.1/47.4ms pred gate=device Token # 1792: 3.767ms; value: next_token_ids=tensor([829], device='cuda:0') mtp accept=1 prop=829 top1=829 accp=1.000 next=pair draft=1985 prop=1985 pred gate=device Token # 1793: 114.339ms; value: next_token_ids=tensor([1985], device='cuda:0') mtp accept=1 prop=1985 top1=1985 accp=1.000 next=draft=223 prop=223 olap pair=109.2ms serial=193.7ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.4ms wait=0.1/47.3ms pred gate=device Token # 1794: 3.741ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.999 next=pair draft=26 prop=26 pred gate=device 
Token # 1795: 114.394ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=draft=90 prop=90 olap pair=109.2ms serial=193.7ms gain=84.5ms ratio=0.44 s0=4.2ms s1=189.5ms wait=0.1/47.4ms pred gate=device Token # 1796: 3.769ms; value: next_token_ids=tensor([90], device='cuda:0') mtp accept=1 prop=90 top1=90 accp=1.000 next=pair draft=25 prop=25 pred gate=device Token # 1797: 114.572ms; value: next_token_ids=tensor([25], device='cuda:0') mtp accept=1 prop=25 top1=25 accp=1.000 next=draft=36 prop=36 olap pair=109.3ms serial=194.1ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.8ms wait=0.1/47.2ms pred gate=device Token # 1798: 3.720ms; value: next_token_ids=tensor([36], device='cuda:0') mtp accept=1 prop=36 top1=36 accp=1.000 next=pair draft=8842 prop=8842 pred gate=device Token # 1799: 114.649ms; value: next_token_ids=tensor([2541], device='cuda:0') mtp accept=0 prop=8842 top1=2541 accp=0.444 next=draft=7989 prop=7989 olap pair=109.5ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.1ms wait=0.1/47.3ms pred gate=device Token # 1800: 116.644ms; value: next_token_ids=tensor([7989], device='cuda:0') mtp accept=1 prop=7989 top1=7989 accp=0.786 next=draft=31 prop=31 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.2ms s1=190.0ms wait=0.1/47.3ms pred gate=device Token # 1801: 3.726ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=26 prop=26 pred gate=device Token # 1802: 115.280ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=draft=1237 prop=14 olap pair=110.1ms serial=194.7ms gain=84.6ms ratio=0.43 s0=4.9ms s1=189.8ms wait=0.1/46.5ms pred gate=device Token # 1803: 3.728ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=303 accp=0.232 next=pair draft=48159 prop=48159 pred gate=device Token # 1804: 118.196ms; value: next_token_ids=tensor([48159], device='cuda:0') mtp accept=1 
prop=48159 top1=48159 accp=1.000 next=draft=31 prop=31 olap pair=109.6ms serial=193.8ms gain=84.2ms ratio=0.43 s0=6.2ms s1=187.6ms wait=0.2/45.1ms pred gate=device Token # 1805: 3.905ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 1806: 114.857ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=0.851 next=draft=201 prop=201 olap pair=109.7ms serial=194.7ms gain=85.0ms ratio=0.44 s0=3.9ms s1=190.7ms wait=0.1/48.3ms pred gate=device Token # 1807: 3.708ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=0.982 next=pair draft=262 prop=262 pred gate=device Token # 1808: 114.593ms; value: next_token_ids=tensor([262], device='cuda:0') mtp accept=1 prop=262 top1=262 accp=1.000 next=draft=1823 prop=1823 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.9ms wait=0.1/47.3ms pred gate=device Token # 1809: 3.742ms; value: next_token_ids=tensor([1823], device='cuda:0') mtp accept=1 prop=1823 top1=1823 accp=0.996 next=pair draft=223 prop=223 pred gate=device Token # 1810: 114.735ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=7849 prop=7849 olap pair=109.6ms serial=194.4ms gain=84.8ms ratio=0.44 s0=4.2ms s1=190.2ms wait=0.1/47.5ms pred gate=device Token # 1811: 3.700ms; value: next_token_ids=tensor([7849], device='cuda:0') mtp accept=1 prop=7849 top1=7849 accp=0.978 next=pair draft=33912 prop=33912 pred gate=device Token # 1812: 114.502ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=0 prop=33912 top1=10756 accp=0.135 next=draft=12072 prop=119545 olap pair=109.4ms serial=194.0ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.7ms wait=0.1/47.4ms pred gate=device Token # 1813: 114.678ms; value: next_token_ids=tensor([68160], device='cuda:0') mtp accept=0 prop=119545 top1=68160 accp=0.300 next=draft=3007 
prop=3007 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.0ms s1=190.3ms wait=0.1/47.9ms pred gate=device Token # 1814: 114.790ms; value: next_token_ids=tensor([3007], device='cuda:0') mtp accept=1 prop=3007 top1=3007 accp=0.876 next=draft=6034 prop=6034 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.8ms wait=0.1/48.6ms pred gate=device Token # 1815: 3.703ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=0.993 next=pair draft=303 prop=303 pred gate=device Token # 1816: 114.434ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=8835 prop=8835 olap pair=109.3ms serial=194.0ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.4ms wait=0.1/48.7ms pred gate=device Token # 1817: 3.711ms; value: next_token_ids=tensor([1833], device='cuda:0') mtp accept=0 prop=8835 top1=7849 accp=0.013 next=pair draft=2827 prop=2827 pred gate=device Token # 1818: 114.417ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=0 prop=2827 top1=6034 accp=0.036 next=draft=1176 prop=1176 olap pair=109.2ms serial=193.9ms gain=84.7ms ratio=0.44 s0=3.7ms s1=190.2ms wait=0.1/48.7ms pred gate=device Token # 1819: 115.193ms; value: next_token_ids=tensor([642], device='cuda:0') mtp accept=0 prop=1176 top1=642 accp=0.078 next=draft=8835 prop=8835 olap pair=109.9ms serial=195.5ms gain=85.6ms ratio=0.44 s0=3.7ms s1=191.8ms wait=0.1/48.6ms pred gate=device Token # 1820: 114.549ms; value: next_token_ids=tensor([8835], device='cuda:0') mtp accept=1 prop=8835 top1=8835 accp=1.000 next=draft=6410 prop=18617 olap pair=109.2ms serial=193.9ms gain=84.7ms ratio=0.44 s0=3.7ms s1=190.2ms wait=0.1/48.6ms pred gate=device Token # 1821: 3.683ms; value: next_token_ids=tensor([6410], device='cuda:0') mtp accept=0 prop=18617 top1=6410 accp=0.826 next=pair draft=2693 prop=2693 pred gate=device Token # 1822: 115.373ms; value: next_token_ids=tensor([2693], device='cuda:0') mtp 
accept=1 prop=2693 top1=2693 accp=0.987 next=draft=751 prop=751 olap pair=110.0ms serial=195.2ms gain=85.2ms ratio=0.44 s0=4.5ms s1=190.7ms wait=0.1/47.5ms pred gate=device Token # 1823: 3.770ms; value: next_token_ids=tensor([751], device='cuda:0') mtp accept=1 prop=751 top1=751 accp=1.000 next=pair draft=201 prop=201 pred gate=device Token # 1824: 115.076ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=1.000 next=draft=262 prop=262 olap pair=109.9ms serial=195.1ms gain=85.2ms ratio=0.44 s0=3.7ms s1=191.4ms wait=0.1/48.6ms pred gate=device Token # 1825: 3.776ms; value: next_token_ids=tensor([262], device='cuda:0') mtp accept=1 prop=262 top1=262 accp=1.000 next=pair draft=84154 prop=84154 pred gate=device Token # 1826: 115.725ms; value: next_token_ids=tensor([84154], device='cuda:0') mtp accept=1 prop=84154 top1=84154 accp=1.000 next=draft=21 prop=21 olap pair=109.7ms serial=193.8ms gain=84.0ms ratio=0.43 s0=8.8ms s1=185.0ms wait=0.2/42.3ms pred gate=device Token # 1827: 4.695ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=0.999 next=pair draft=16 prop=16 pred gate=device Token # 1828: 115.354ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=2619 prop=2619 olap pair=110.0ms serial=194.7ms gain=84.7ms ratio=0.44 s0=6.9ms s1=187.8ms wait=0.2/44.7ms pred gate=device Token # 1829: 3.732ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=pair draft=8835 prop=8835 pred gate=device Token # 1830: 115.024ms; value: next_token_ids=tensor([8835], device='cuda:0') mtp accept=1 prop=8835 top1=8835 accp=0.831 next=draft=13 prop=13 olap pair=109.9ms serial=195.2ms gain=85.4ms ratio=0.44 s0=3.7ms s1=191.5ms wait=0.1/48.7ms pred gate=device Token # 1831: 3.712ms; value: next_token_ids=tensor([13], device='cuda:0') mtp accept=1 prop=13 top1=13 accp=1.000 next=pair draft=39732 prop=10124 
pred gate=device Token # 1832: 114.558ms; value: next_token_ids=tensor([10124], device='cuda:0') mtp accept=1 prop=10124 top1=10124 accp=0.254 next=draft=13276 prop=13276 olap pair=109.4ms serial=194.4ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.7ms wait=0.1/48.6ms pred gate=device Token # 1833: 3.691ms; value: next_token_ids=tensor([13276], device='cuda:0') mtp accept=1 prop=13276 top1=13276 accp=0.529 next=pair draft=666 prop=666 pred gate=device Token # 1834: 114.682ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=1237 prop=1237 olap pair=109.4ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.1ms s1=190.2ms wait=0.1/47.9ms pred gate=device Token # 1835: 3.731ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=40160 prop=40160 pred gate=device Token # 1836: 114.780ms; value: next_token_ids=tensor([40160], device='cuda:0') mtp accept=1 prop=40160 top1=3185 accp=0.458 next=draft=2782 prop=2782 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.9ms wait=0.1/48.8ms pred gate=device Token # 1837: 3.790ms; value: next_token_ids=tensor([547], device='cuda:0') mtp accept=0 prop=2782 top1=547 accp=0.007 next=pair draft=8842 prop=8842 pred gate=device Token # 1838: 116.049ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=1.000 next=draft=1227 prop=1227 olap pair=109.9ms serial=194.7ms gain=84.7ms ratio=0.44 s0=6.0ms s1=188.7ms wait=0.2/45.8ms pred gate=device Token # 1839: 3.949ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=1.000 next=pair draft=7524 prop=7524 pred gate=device Token # 1840: 114.961ms; value: next_token_ids=tensor([7524], device='cuda:0') mtp accept=1 prop=7524 top1=7524 accp=1.000 next=draft=262 prop=262 olap pair=109.8ms serial=194.8ms gain=85.0ms ratio=0.44 s0=4.4ms s1=190.4ms wait=0.2/47.0ms pred gate=device Token # 1841: 
3.727ms; value: next_token_ids=tensor([262], device='cuda:0') mtp accept=1 prop=262 top1=262 accp=1.000 next=pair draft=35739 prop=35739 pred gate=device Token # 1842: 114.691ms; value: next_token_ids=tensor([35739], device='cuda:0') mtp accept=1 prop=35739 top1=35739 accp=1.000 next=draft=36490 prop=36490 olap pair=109.5ms serial=194.4ms gain=84.8ms ratio=0.44 s0=4.3ms s1=190.0ms wait=0.1/47.1ms pred gate=device Token # 1843: 3.725ms; value: next_token_ids=tensor([36490], device='cuda:0') mtp accept=1 prop=36490 top1=36490 accp=1.000 next=pair draft=201 prop=201 pred gate=device Token # 1844: 114.661ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=1.000 next=draft=262 prop=262 olap pair=109.5ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.2ms s1=190.1ms wait=0.1/47.5ms pred gate=device Token # 1845: 3.702ms; value: next_token_ids=tensor([262], device='cuda:0') mtp accept=1 prop=262 top1=262 accp=1.000 next=pair draft=1823 prop=1823 pred gate=device Token # 1846: 114.999ms; value: next_token_ids=tensor([1823], device='cuda:0') mtp accept=1 prop=1823 top1=1823 accp=1.000 next=draft=223 prop=223 olap pair=109.9ms serial=195.0ms gain=85.1ms ratio=0.44 s0=4.2ms s1=190.8ms wait=0.1/47.4ms pred gate=device Token # 1847: 3.721ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=3440 prop=98181 pred gate=device Token # 1848: 114.784ms; value: next_token_ids=tensor([98181], device='cuda:0') mtp accept=1 prop=98181 top1=3440 accp=0.630 next=draft=2382 prop=2382 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=3.9ms s1=190.8ms wait=0.1/48.3ms pred gate=device Token # 1849: 3.707ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=0 prop=2382 top1=8842 accp=0.005 next=pair draft=6525 prop=6525 pred gate=device Token # 1850: 114.576ms; value: next_token_ids=tensor([35987], device='cuda:0') mtp accept=0 prop=6525 top1=38763 accp=0.347 next=draft=6525 
prop=6525 olap pair=109.4ms serial=194.1ms gain=84.7ms ratio=0.44 s0=4.4ms s1=189.8ms wait=0.1/47.3ms pred gate=device Token # 1851: 114.985ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=6525 top1=303 accp=0.266 next=draft=8835 prop=8835 olap pair=109.8ms serial=194.9ms gain=85.1ms ratio=0.44 s0=4.8ms s1=190.1ms wait=0.1/46.7ms pred gate=device Token # 1852: 114.797ms; value: next_token_ids=tensor([8835], device='cuda:0') mtp accept=1 prop=8835 top1=8835 accp=0.982 next=draft=6525 prop=6525 olap pair=109.6ms serial=194.4ms gain=84.8ms ratio=0.44 s0=4.7ms s1=189.7ms wait=0.1/46.7ms pred gate=device Token # 1853: 3.673ms; value: next_token_ids=tensor([1746], device='cuda:0') mtp accept=0 prop=6525 top1=1746 accp=0.151 next=pair draft=40871 prop=40871 pred gate=device Token # 1854: 114.407ms; value: next_token_ids=tensor([40871], device='cuda:0') mtp accept=1 prop=40871 top1=40871 accp=0.683 next=draft=1644 prop=1644 olap pair=109.1ms serial=193.6ms gain=84.4ms ratio=0.44 s0=4.7ms s1=188.9ms wait=0.1/46.8ms pred gate=device Token # 1855: 3.699ms; value: next_token_ids=tensor([1644], device='cuda:0') mtp accept=1 prop=1644 top1=1644 accp=0.997 next=pair draft=4289 prop=4289 pred gate=device Token # 1856: 114.697ms; value: next_token_ids=tensor([2827], device='cuda:0') mtp accept=0 prop=4289 top1=2827 accp=0.000 next=draft=26127 prop=26127 olap pair=109.5ms serial=194.4ms gain=84.8ms ratio=0.44 s0=4.7ms s1=189.7ms wait=0.1/46.6ms pred gate=device Token # 1857: 115.063ms; value: next_token_ids=tensor([26127], device='cuda:0') mtp accept=1 prop=26127 top1=26127 accp=0.873 next=draft=625 prop=625 olap pair=109.8ms serial=194.9ms gain=85.1ms ratio=0.44 s0=4.7ms s1=190.2ms wait=0.1/46.6ms pred gate=device Token # 1858: 3.681ms; value: next_token_ids=tensor([625], device='cuda:0') mtp accept=1 prop=625 top1=625 accp=0.618 next=pair draft=2541 prop=2541 pred gate=device Token # 1859: 115.138ms; value: next_token_ids=tensor([2541], device='cuda:0') 
mtp accept=1 prop=2541 top1=2541 accp=0.695 next=draft=201 prop=201 olap pair=109.9ms serial=195.2ms gain=85.3ms ratio=0.44 s0=4.4ms s1=190.9ms wait=0.1/47.3ms pred gate=device Token # 1860: 3.697ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=1.000 next=pair draft=262 prop=262 pred gate=device Token # 1861: 114.650ms; value: next_token_ids=tensor([262], device='cuda:0') mtp accept=1 prop=262 top1=262 accp=1.000 next=draft=1823 prop=1823 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=3.7ms s1=190.8ms wait=0.1/48.7ms pred gate=device Token # 1862: 3.677ms; value: next_token_ids=tensor([1823], device='cuda:0') mtp accept=1 prop=1823 top1=1823 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1863: 114.895ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=8040 prop=8040 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=3.7ms s1=191.1ms wait=0.1/48.7ms pred gate=device Token # 1864: 3.711ms; value: next_token_ids=tensor([8040], device='cuda:0') mtp accept=1 prop=8040 top1=8040 accp=0.824 next=pair draft=768 prop=768 pred gate=device Token # 1865: 114.983ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=20323 prop=38523 olap pair=109.8ms serial=194.9ms gain=85.1ms ratio=0.44 s0=4.7ms s1=190.2ms wait=0.1/46.6ms pred gate=device Token # 1866: 3.786ms; value: next_token_ids=tensor([20323], device='cuda:0') mtp accept=0 prop=38523 top1=20323 accp=0.782 next=pair draft=36 prop=36 pred gate=device Token # 1867: 114.760ms; value: next_token_ids=tensor([36], device='cuda:0') mtp accept=1 prop=36 top1=36 accp=1.000 next=draft=8842 prop=8842 olap pair=109.5ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.6ms s1=189.7ms wait=0.1/46.8ms pred gate=device Token # 1868: 3.700ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=1.000 next=pair 
draft=303 prop=642 pred gate=device Token # 1869: 114.596ms; value: next_token_ids=tensor([642], device='cuda:0') mtp accept=1 prop=642 top1=642 accp=0.575 next=draft=8835 prop=48159 olap pair=109.4ms serial=194.2ms gain=84.7ms ratio=0.44 s0=4.8ms s1=189.4ms wait=0.1/46.5ms pred gate=device Token # 1870: 3.772ms; value: next_token_ids=tensor([10124], device='cuda:0') mtp accept=0 prop=48159 top1=8835 accp=0.658 next=pair draft=31 prop=31 pred gate=device Token # 1871: 114.681ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=20 prop=20 olap pair=109.5ms serial=194.2ms gain=84.7ms ratio=0.44 s0=4.7ms s1=189.5ms wait=0.1/46.6ms pred gate=device Token # 1872: 3.716ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=0 prop=20 top1=22 accp=0.595 next=pair draft=14 prop=14 pred gate=device Token # 1873: 115.225ms; value: next_token_ids=tensor([14], device='cuda:0') mtp accept=1 prop=14 top1=14 accp=1.000 next=draft=48159 prop=48159 olap pair=109.9ms serial=195.4ms gain=85.5ms ratio=0.44 s0=3.9ms s1=191.5ms wait=0.1/48.3ms pred gate=device Token # 1874: 3.724ms; value: next_token_ids=tensor([48159], device='cuda:0') mtp accept=1 prop=48159 top1=48159 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1875: 114.678ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=26 prop=26 olap pair=109.5ms serial=194.6ms gain=85.1ms ratio=0.44 s0=3.7ms s1=190.9ms wait=0.1/48.7ms pred gate=device Token # 1876: 3.706ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=pair draft=201 prop=201 pred gate=device Token # 1877: 115.076ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=0.992 next=draft=262 prop=262 olap pair=109.9ms serial=195.0ms gain=85.1ms ratio=0.44 s0=4.8ms s1=190.2ms wait=0.1/46.6ms pred gate=device Token # 1878: 3.710ms; value: 
next_token_ids=tensor([262], device='cuda:0') mtp accept=1 prop=262 top1=262 accp=1.000 next=pair draft=1823 prop=1823 pred gate=device Token # 1879: 114.735ms; value: next_token_ids=tensor([1823], device='cuda:0') mtp accept=1 prop=1823 top1=1823 accp=0.781 next=draft=223 prop=223 olap pair=109.5ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.7ms s1=189.6ms wait=0.1/46.5ms pred gate=device Token # 1880: 3.705ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=1207 prop=1207 pred gate=device Token # 1881: 115.210ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=1 prop=1207 top1=1207 accp=0.978 next=draft=2382 prop=2382 olap pair=110.0ms serial=194.8ms gain=84.8ms ratio=0.44 s0=4.6ms s1=190.2ms wait=0.1/47.0ms pred gate=device Token # 1882: 3.782ms; value: next_token_ids=tensor([2382], device='cuda:0') mtp accept=1 prop=2382 top1=2382 accp=0.821 next=pair draft=92 prop=92 pred gate=device Token # 1883: 114.326ms; value: next_token_ids=tensor([92], device='cuda:0') mtp accept=1 prop=92 top1=92 accp=1.000 next=draft=31 prop=31 olap pair=109.1ms serial=193.8ms gain=84.7ms ratio=0.44 s0=4.0ms s1=189.8ms wait=0.1/48.1ms pred gate=device Token # 1884: 3.754ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 1885: 114.854ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=625 prop=625 olap pair=109.7ms serial=194.7ms gain=85.0ms ratio=0.44 s0=3.8ms s1=190.8ms wait=0.1/48.4ms pred gate=device Token # 1886: 3.699ms; value: next_token_ids=tensor([625], device='cuda:0') mtp accept=1 prop=625 top1=625 accp=1.000 next=pair draft=10124 prop=10124 pred gate=device Token # 1887: 115.091ms; value: next_token_ids=tensor([10124], device='cuda:0') mtp accept=1 prop=10124 top1=10124 accp=0.986 next=draft=10730 prop=10730 olap pair=109.9ms serial=194.8ms 
gain=84.9ms ratio=0.44 s0=5.3ms s1=189.5ms wait=0.2/46.6ms pred gate=device Token # 1888: 3.749ms; value: next_token_ids=tensor([10730], device='cuda:0') mtp accept=1 prop=10730 top1=10730 accp=0.925 next=pair draft=2577 prop=2577 pred gate=device Token # 1889: 115.819ms; value: next_token_ids=tensor([2056], device='cuda:0') mtp accept=0 prop=2577 top1=2056 accp=0.074 next=draft=13207 prop=1714 olap pair=109.9ms serial=194.7ms gain=84.8ms ratio=0.44 s0=6.0ms s1=188.7ms wait=0.2/45.8ms pred gate=device Token # 1890: 115.148ms; value: next_token_ids=tensor([127314], device='cuda:0') mtp accept=0 prop=1714 top1=127314 accp=0.355 next=draft=7157 prop=7157 olap pair=109.7ms serial=194.7ms gain=85.0ms ratio=0.44 s0=4.5ms s1=190.1ms wait=0.1/47.0ms pred gate=device Token # 1891: 115.299ms; value: next_token_ids=tensor([7157], device='cuda:0') mtp accept=1 prop=7157 top1=7157 accp=0.673 next=draft=201 prop=201 olap pair=110.1ms serial=195.4ms gain=85.4ms ratio=0.44 s0=4.2ms s1=191.3ms wait=0.1/47.8ms pred gate=device Token # 1892: 3.849ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=1.000 next=pair draft=262 prop=262 pred gate=device Token # 1893: 114.566ms; value: next_token_ids=tensor([262], device='cuda:0') mtp accept=1 prop=262 top1=262 accp=1.000 next=draft=84154 prop=84154 olap pair=109.4ms serial=194.0ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.7ms wait=0.1/47.6ms pred gate=device Token # 1894: 3.684ms; value: next_token_ids=tensor([84154], device='cuda:0') mtp accept=1 prop=84154 top1=84154 accp=1.000 next=pair draft=372 prop=372 pred gate=device Token # 1895: 114.402ms; value: next_token_ids=tensor([372], device='cuda:0') mtp accept=1 prop=372 top1=372 accp=0.995 next=draft=223 prop=223 olap pair=109.2ms serial=193.7ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.4ms wait=0.1/47.3ms pred gate=device Token # 1896: 3.842ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.996 next=pair 
draft=1131 prop=1131 pred gate=device Token # 1897: 114.485ms; value: next_token_ids=tensor([1131], device='cuda:0') mtp accept=1 prop=1131 top1=1131 accp=1.000 next=draft=410 prop=410 olap pair=109.3ms serial=193.8ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.4ms wait=0.1/47.4ms pred gate=device Token # 1898: 3.702ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=1.000 next=pair draft=4899 prop=4899 pred gate=device Token # 1899: 114.799ms; value: next_token_ids=tensor([4899], device='cuda:0') mtp accept=1 prop=4899 top1=2382 accp=0.427 next=draft=15133 prop=15133 olap pair=109.6ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/47.3ms pred gate=device Token # 1900: 3.714ms; value: next_token_ids=tensor([15133], device='cuda:0') mtp accept=1 prop=15133 top1=15133 accp=1.000 next=pair draft=7157 prop=78938 pred gate=device Token # 1901: 115.327ms; value: next_token_ids=tensor([78938], device='cuda:0') mtp accept=1 prop=78938 top1=7157 accp=0.814 next=draft=271 prop=271 olap pair=110.1ms serial=195.3ms gain=85.2ms ratio=0.44 s0=4.4ms s1=190.9ms wait=0.1/47.0ms pred gate=device Token # 1902: 3.801ms; value: next_token_ids=tensor([7137], device='cuda:0') mtp accept=0 prop=271 top1=271 accp=0.549 next=pair draft=271 prop=271 pred gate=device Token # 1903: 114.946ms; value: next_token_ids=tensor([271], device='cuda:0') mtp accept=1 prop=271 top1=271 accp=1.000 next=draft=795 prop=795 olap pair=109.7ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.3ms wait=0.1/47.2ms pred gate=device Token # 1904: 3.751ms; value: next_token_ids=tensor([795], device='cuda:0') mtp accept=1 prop=795 top1=795 accp=1.000 next=pair draft=2619 prop=2619 pred gate=device Token # 1905: 115.019ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.992 next=draft=16303 prop=26127 olap pair=109.8ms serial=195.0ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.7ms wait=0.1/47.3ms pred gate=device 
Token # 1906: 3.688ms; value: next_token_ids=tensor([26127], device='cuda:0') mtp accept=1 prop=26127 top1=16303 accp=0.584 next=pair draft=42829 prop=947 pred gate=device Token # 1907: 115.124ms; value: next_token_ids=tensor([3803], device='cuda:0') mtp accept=0 prop=947 top1=8062 accp=0.205 next=draft=666 prop=666 olap pair=110.0ms serial=194.6ms gain=84.6ms ratio=0.43 s0=4.4ms s1=190.1ms wait=0.1/47.3ms pred gate=device Token # 1908: 114.735ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=1237 prop=1237 olap pair=109.5ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.3ms s1=190.0ms wait=0.1/47.4ms pred gate=device Token # 1909: 3.739ms; value: next_token_ids=tensor([7524], device='cuda:0') mtp accept=0 prop=1237 top1=7524 accp=0.336 next=pair draft=15 prop=15 pred gate=device Token # 1910: 114.762ms; value: next_token_ids=tensor([28986], device='cuda:0') mtp accept=0 prop=15 top1=28986 accp=0.147 next=draft=2382 prop=2382 olap pair=109.6ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/47.2ms pred gate=device Token # 1911: 115.679ms; value: next_token_ids=tensor([2382], device='cuda:0') mtp accept=1 prop=2382 top1=1860 accp=0.207 next=draft=92 prop=92 olap pair=110.4ms serial=195.4ms gain=85.0ms ratio=0.43 s0=4.4ms s1=191.0ms wait=0.1/47.3ms pred gate=device Token # 1912: 3.665ms; value: next_token_ids=tensor([92], device='cuda:0') mtp accept=1 prop=92 top1=92 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1913: 115.274ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=19 prop=19 olap pair=110.0ms serial=195.2ms gain=85.3ms ratio=0.44 s0=4.2ms s1=191.0ms wait=0.1/47.5ms pred gate=device Token # 1914: 3.731ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=36101 prop=36101 pred gate=device Token # 1915: 114.584ms; value: next_token_ids=tensor([36101], 
device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=0.907 next=draft=625 prop=625 olap pair=109.5ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.1ms wait=0.1/47.4ms pred gate=device Token # 1916: 3.717ms; value: next_token_ids=tensor([26127], device='cuda:0') mtp accept=0 prop=625 top1=26127 accp=0.036 next=pair draft=6561 prop=6561 pred gate=device Token # 1917: 114.575ms; value: next_token_ids=tensor([35015], device='cuda:0') mtp accept=0 prop=6561 top1=35015 accp=0.006 next=draft=223 prop=223 olap pair=109.3ms serial=193.8ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.5ms wait=0.1/47.3ms pred gate=device Token # 1918: 115.080ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=8842 prop=8842 olap pair=109.8ms serial=194.8ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.5ms wait=0.1/47.4ms pred gate=device Token # 1919: 3.700ms; value: next_token_ids=tensor([10602], device='cuda:0') mtp accept=0 prop=8842 top1=10602 accp=0.035 next=pair draft=26127 prop=26127 pred gate=device Token # 1920: 114.584ms; value: next_token_ids=tensor([26127], device='cuda:0') mtp accept=1 prop=26127 top1=26127 accp=0.996 next=draft=940 prop=940 olap pair=109.3ms serial=193.8ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.5ms wait=0.1/47.3ms pred gate=device Token # 1921: 3.692ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1922: 115.016ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.988 next=draft=34864 prop=34864 olap pair=109.8ms serial=195.0ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.7ms wait=0.1/47.5ms pred gate=device Token # 1923: 3.728ms; value: next_token_ids=tensor([34864], device='cuda:0') mtp accept=1 prop=34864 top1=34864 accp=1.000 next=pair draft=2022 prop=2022 pred gate=device Token # 1924: 115.538ms; value: next_token_ids=tensor([2022], device='cuda:0') mtp accept=1 
prop=2022 top1=2022 accp=0.894 next=draft=26127 prop=26127 olap pair=110.2ms serial=195.8ms gain=85.6ms ratio=0.44 s0=4.3ms s1=191.6ms wait=0.1/47.4ms pred gate=device Token # 1925: 3.633ms; value: next_token_ids=tensor([26127], device='cuda:0') mtp accept=1 prop=26127 top1=26127 accp=1.000 next=pair draft=940 prop=940 pred gate=device Token # 1926: 115.083ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=0.942 next=draft=126664 prop=126664 olap pair=109.9ms serial=195.1ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.8ms wait=0.1/47.4ms pred gate=device Token # 1927: 3.740ms; value: next_token_ids=tensor([126664], device='cuda:0') mtp accept=1 prop=126664 top1=126664 accp=0.837 next=pair draft=58603 prop=58603 pred gate=device Token # 1928: 114.940ms; value: next_token_ids=tensor([58603], device='cuda:0') mtp accept=1 prop=58603 top1=58603 accp=0.965 next=draft=201 prop=201 olap pair=109.7ms serial=194.7ms gain=85.1ms ratio=0.44 s0=3.9ms s1=190.8ms wait=0.1/48.3ms pred gate=device Token # 1929: 3.702ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=0.999 next=pair draft=20759 prop=15 pred gate=device Token # 1930: 114.633ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=0 prop=15 top1=4569 accp=0.233 next=draft=16 prop=16 olap pair=109.4ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.0ms s1=190.3ms wait=0.1/48.3ms pred gate=device Token # 1931: 115.062ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=48159 prop=48159 olap pair=109.8ms serial=195.0ms gain=85.2ms ratio=0.44 s0=4.2ms s1=190.8ms wait=0.1/47.2ms pred gate=device Token # 1932: 3.715ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=48159 top1=223 accp=0.088 next=pair draft=10602 prop=10602 pred gate=device Token # 1933: 114.949ms; value: next_token_ids=tensor([10602], device='cuda:0') mtp accept=1 prop=10602 top1=10602 accp=0.999 
next=draft=26127 prop=26127 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.5ms wait=0.1/47.3ms pred gate=device Token # 1934: 3.703ms; value: next_token_ids=tensor([26127], device='cuda:0') mtp accept=1 prop=26127 top1=26127 accp=1.000 next=pair draft=768 prop=768 pred gate=device Token # 1935: 114.252ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=51799 prop=2490 olap pair=109.1ms serial=193.6ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.3ms wait=0.1/47.3ms pred gate=device Token # 1936: 3.685ms; value: next_token_ids=tensor([51799], device='cuda:0') mtp accept=0 prop=2490 top1=51799 accp=0.683 next=pair draft=8835 prop=8835 pred gate=device Token # 1937: 114.580ms; value: next_token_ids=tensor([8835], device='cuda:0') mtp accept=1 prop=8835 top1=8835 accp=0.890 next=draft=410 prop=410 olap pair=109.4ms serial=194.0ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.8ms wait=0.1/47.2ms pred gate=device Token # 1938: 3.685ms; value: next_token_ids=tensor([17], device='cuda:0') mtp accept=0 prop=410 top1=8009 accp=0.167 next=pair draft=10124 prop=8835 pred gate=device Token # 1939: 114.503ms; value: next_token_ids=tensor([10124], device='cuda:0') mtp accept=0 prop=8835 top1=10124 accp=0.601 next=draft=8009 prop=8964 olap pair=109.2ms serial=193.9ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.6ms wait=0.1/47.2ms pred gate=device Token # 1940: 114.782ms; value: next_token_ids=tensor([8009], device='cuda:0') mtp accept=0 prop=8964 top1=8009 accp=0.553 next=draft=201 prop=201 olap pair=109.5ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.2ms s1=190.1ms wait=0.1/47.3ms pred gate=device Token # 1941: 115.098ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=1.000 next=draft=20 prop=20 olap pair=109.9ms serial=195.0ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.8ms wait=0.1/47.3ms pred gate=device Token # 1942: 3.767ms; value: next_token_ids=tensor([20], device='cuda:0') 
mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=16 prop=16 pred gate=device Token # 1943: 114.993ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=126664 prop=126664 olap pair=109.8ms serial=194.9ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.6ms wait=0.1/47.4ms pred gate=device Token # 1944: 3.706ms; value: next_token_ids=tensor([126664], device='cuda:0') mtp accept=1 prop=126664 top1=126664 accp=0.894 next=pair draft=58603 prop=58603 pred gate=device Token # 1945: 114.347ms; value: next_token_ids=tensor([58603], device='cuda:0') mtp accept=1 prop=58603 top1=58603 accp=1.000 next=draft=768 prop=768 olap pair=109.1ms serial=193.7ms gain=84.5ms ratio=0.44 s0=4.2ms s1=189.4ms wait=0.1/47.4ms pred gate=device Token # 1946: 3.667ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=2382 prop=2382 pred gate=device Token # 1947: 114.902ms; value: next_token_ids=tensor([2382], device='cuda:0') mtp accept=1 prop=2382 top1=2382 accp=0.997 next=draft=92 prop=92 olap pair=109.7ms serial=194.7ms gain=85.0ms ratio=0.44 s0=4.2ms s1=190.5ms wait=0.1/47.6ms pred gate=device Token # 1948: 3.677ms; value: next_token_ids=tensor([92], device='cuda:0') mtp accept=1 prop=92 top1=92 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1949: 115.084ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=0.997 next=draft=19 prop=19 olap pair=109.8ms serial=195.0ms gain=85.1ms ratio=0.44 s0=4.2ms s1=190.7ms wait=0.1/47.4ms pred gate=device Token # 1950: 3.714ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=625 prop=625 pred gate=device Token # 1951: 114.703ms; value: next_token_ids=tensor([625], device='cuda:0') mtp accept=1 prop=625 top1=625 accp=1.000 next=draft=39133 prop=39133 olap pair=109.6ms serial=194.5ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.2ms 
wait=0.1/47.4ms pred gate=device Token # 1952: 3.700ms; value: next_token_ids=tensor([39133], device='cuda:0') mtp accept=1 prop=39133 top1=39133 accp=0.995 next=pair draft=303 prop=303 pred gate=device Token # 1953: 114.442ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=1207 prop=1207 olap pair=109.3ms serial=194.0ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.7ms wait=0.1/47.2ms pred gate=device Token # 1954: 3.670ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=1 prop=1207 top1=1207 accp=0.972 next=pair draft=1714 prop=1714 pred gate=device Token # 1955: 114.840ms; value: next_token_ids=tensor([22089], device='cuda:0') mtp accept=0 prop=1714 top1=22089 accp=0.023 next=draft=15991 prop=15991 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.5ms wait=0.1/47.4ms pred gate=device Token # 1956: 115.272ms; value: next_token_ids=tensor([15991], device='cuda:0') mtp accept=1 prop=15991 top1=15991 accp=0.543 next=draft=3412 prop=3412 olap pair=110.1ms serial=195.5ms gain=85.4ms ratio=0.44 s0=4.3ms s1=191.2ms wait=0.1/47.2ms pred gate=device Token # 1957: 3.681ms; value: next_token_ids=tensor([3412], device='cuda:0') mtp accept=1 prop=3412 top1=3412 accp=0.864 next=pair draft=547 prop=547 pred gate=device Token # 1958: 115.082ms; value: next_token_ids=tensor([547], device='cuda:0') mtp accept=1 prop=547 top1=547 accp=0.981 next=draft=201 prop=201 olap pair=109.9ms serial=195.1ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.9ms wait=0.1/47.3ms pred gate=device Token # 1959: 3.699ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=1.000 next=pair draft=21 prop=21 pred gate=device Token # 1960: 115.279ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=0.996 next=draft=16 prop=16 olap pair=110.0ms serial=195.4ms gain=85.4ms ratio=0.44 s0=4.2ms s1=191.2ms wait=0.1/47.3ms pred gate=device Token # 1961: 
3.721ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 1962: 114.660ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=34864 prop=34864 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.2ms s1=190.0ms wait=0.1/47.3ms pred gate=device Token # 1963: 3.734ms; value: next_token_ids=tensor([34864], device='cuda:0') mtp accept=1 prop=34864 top1=34864 accp=1.000 next=pair draft=2022 prop=2022 pred gate=device Token # 1964: 114.962ms; value: next_token_ids=tensor([2022], device='cuda:0') mtp accept=1 prop=2022 top1=2022 accp=1.000 next=draft=768 prop=768 olap pair=109.7ms serial=194.7ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.5ms wait=0.1/47.5ms pred gate=device Token # 1965: 3.696ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=2382 prop=2382 pred gate=device Token # 1966: 115.258ms; value: next_token_ids=tensor([2382], device='cuda:0') mtp accept=1 prop=2382 top1=2382 accp=0.967 next=draft=92 prop=92 olap pair=110.1ms serial=195.6ms gain=85.5ms ratio=0.44 s0=3.9ms s1=191.7ms wait=0.1/48.5ms pred gate=device Token # 1967: 3.691ms; value: next_token_ids=tensor([92], device='cuda:0') mtp accept=1 prop=92 top1=92 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1968: 114.686ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=19 prop=19 olap pair=109.5ms serial=194.6ms gain=85.1ms ratio=0.44 s0=3.7ms s1=190.9ms wait=0.1/48.7ms pred gate=device Token # 1969: 3.684ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=625 prop=625 pred gate=device Token # 1970: 114.537ms; value: next_token_ids=tensor([625], device='cuda:0') mtp accept=1 prop=625 top1=625 accp=1.000 next=draft=8725 prop=8725 olap pair=109.4ms serial=194.3ms 
gain=84.9ms ratio=0.44 s0=3.6ms s1=190.7ms wait=0.1/48.8ms pred gate=device Token # 1971: 3.732ms; value: next_token_ids=tensor([8725], device='cuda:0') mtp accept=1 prop=8725 top1=8725 accp=0.641 next=pair draft=39133 prop=39133 pred gate=device Token # 1972: 114.857ms; value: next_token_ids=tensor([39133], device='cuda:0') mtp accept=1 prop=39133 top1=39133 accp=0.988 next=draft=201 prop=201 olap pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=3.6ms s1=191.3ms wait=0.1/48.7ms pred gate=device Token # 1973: 3.754ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=1.000 next=pair draft=20759 prop=20759 pred gate=device Token # 1974: 115.467ms; value: next_token_ids=tensor([20759], device='cuda:0') mtp accept=1 prop=20759 top1=20759 accp=1.000 next=draft=795 prop=795 olap pair=110.3ms serial=196.0ms gain=85.7ms ratio=0.44 s0=4.0ms s1=192.0ms wait=0.1/47.9ms pred gate=device Token # 1975: 3.730ms; value: next_token_ids=tensor([795], device='cuda:0') mtp accept=1 prop=795 top1=795 accp=1.000 next=pair draft=2619 prop=2619 pred gate=device Token # 1976: 114.980ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=draft=16303 prop=16303 olap pair=109.8ms serial=194.9ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.6ms wait=0.1/47.3ms pred gate=device Token # 1977: 3.702ms; value: next_token_ids=tensor([45045], device='cuda:0') mtp accept=0 prop=16303 top1=45045 accp=0.145 next=pair draft=13208 prop=13208 pred gate=device Token # 1978: 114.451ms; value: next_token_ids=tensor([30047], device='cuda:0') mtp accept=0 prop=13208 top1=30047 accp=0.013 next=draft=90974 prop=90974 olap pair=109.2ms serial=193.8ms gain=84.5ms ratio=0.44 s0=4.2ms s1=189.5ms wait=0.1/47.3ms pred gate=device Token # 1979: 115.030ms; value: next_token_ids=tensor([90974], device='cuda:0') mtp accept=1 prop=90974 top1=90974 accp=1.000 next=draft=666 prop=666 olap pair=109.7ms serial=194.6ms gain=84.8ms 
ratio=0.44 s0=4.4ms s1=190.2ms wait=0.1/47.2ms pred gate=device Token # 1980: 3.738ms; value: next_token_ids=tensor([1121], device='cuda:0') mtp accept=0 prop=666 top1=1121 accp=0.124 next=pair draft=666 prop=666 pred gate=device Token # 1981: 114.792ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=0.997 next=draft=7524 prop=7524 olap pair=109.6ms serial=194.5ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/47.4ms pred gate=device Token # 1982: 3.686ms; value: next_token_ids=tensor([7524], device='cuda:0') mtp accept=1 prop=7524 top1=7524 accp=1.000 next=pair draft=15 prop=15 pred gate=device Token # 1983: 114.474ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=201 prop=201 olap pair=109.3ms serial=193.9ms gain=84.6ms ratio=0.44 s0=4.3ms s1=189.6ms wait=0.1/47.4ms pred gate=device Token # 1984: 3.670ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=0 prop=201 top1=2619 accp=0.040 next=pair draft=2382 prop=2382 pred gate=device Token # 1985: 114.840ms; value: next_token_ids=tensor([2382], device='cuda:0') mtp accept=1 prop=2382 top1=2382 accp=0.966 next=draft=92 prop=92 olap pair=109.6ms serial=194.5ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/47.2ms pred gate=device Token # 1986: 3.683ms; value: next_token_ids=tensor([92], device='cuda:0') mtp accept=1 prop=92 top1=92 accp=1.000 next=pair draft=31 prop=31 pred gate=device Token # 1987: 114.574ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=draft=19 prop=19 olap pair=109.3ms serial=194.1ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.8ms wait=0.1/47.4ms pred gate=device Token # 1988: 3.692ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=36101 prop=36101 pred gate=device Token # 1989: 114.538ms; value: next_token_ids=tensor([7758], device='cuda:0') mtp accept=0 prop=36101 
top1=7758 accp=0.014 next=draft=45045 prop=45045 olap pair=109.4ms serial=194.1ms gain=84.7ms ratio=0.44 s0=4.3ms s1=189.8ms wait=0.1/47.2ms pred gate=device Token # 1990: 115.382ms; value: next_token_ids=tensor([45045], device='cuda:0') mtp accept=1 prop=45045 top1=45045 accp=0.942 next=draft=666 prop=666 olap pair=110.1ms serial=195.5ms gain=85.3ms ratio=0.44 s0=4.3ms s1=191.2ms wait=0.1/47.1ms pred gate=device Token # 1991: 3.702ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=pair draft=768 prop=768 pred gate=device Token # 1992: 114.596ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=8835 prop=8835 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.3ms s1=189.9ms wait=0.1/47.2ms pred gate=device Token # 1993: 3.689ms; value: next_token_ids=tensor([8835], device='cuda:0') mtp accept=1 prop=8835 top1=8835 accp=0.920 next=pair draft=1103 prop=1103 pred gate=device Token # 1994: 115.134ms; value: next_token_ids=tensor([5133], device='cuda:0') mtp accept=0 prop=1103 top1=5133 accp=0.004 next=draft=16303 prop=16303 olap pair=109.9ms serial=195.1ms gain=85.1ms ratio=0.44 s0=4.5ms s1=190.6ms wait=0.1/46.8ms pred gate=device Token # 1995: 115.612ms; value: next_token_ids=tensor([16303], device='cuda:0') mtp accept=1 prop=16303 top1=16303 accp=0.703 next=draft=100642 prop=100642 olap pair=110.3ms serial=195.4ms gain=85.2ms ratio=0.44 s0=4.9ms s1=190.6ms wait=0.1/46.4ms pred gate=device Token # 1996: 3.716ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=100642 top1=100642 accp=0.720 next=pair draft=2431 prop=2431 pred gate=device Token # 1997: 114.819ms; value: next_token_ids=tensor([2431], device='cuda:0') mtp accept=1 prop=2431 top1=2431 accp=0.998 next=draft=5133 prop=5133 olap pair=109.6ms serial=194.5ms gain=84.9ms ratio=0.44 s0=4.7ms s1=189.8ms wait=0.1/46.7ms pred gate=device Token # 1998: 3.713ms; value: 
next_token_ids=tensor([5133], device='cuda:0') mtp accept=1 prop=5133 top1=5133 accp=0.930 next=pair draft=45045 prop=45045 pred gate=device Token # 1999: 115.266ms; value: next_token_ids=tensor([45045], device='cuda:0') mtp accept=1 prop=45045 top1=45045 accp=0.999 next=draft=201 prop=201 olap pair=110.0ms serial=195.4ms gain=85.4ms ratio=0.44 s0=4.7ms s1=190.7ms wait=0.2/46.6ms pred gate=device Token # 2000: 3.707ms; value: next_token_ids=tensor([201], device='cuda:0') mtp accept=1 prop=201 top1=201 accp=1.000 next=pair draft=15 prop=15 pred gate=device Token # 2001: 114.521ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=2619 prop=2619 olap pair=109.3ms serial=194.0ms gain=84.7ms ratio=0.44 s0=4.7ms s1=189.3ms wait=0.1/46.7ms pred gate=device Token # 2002: 3.688ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.799 next=pair draft=2386 prop=2386 pred gate=device Token # 2003: 114.596ms; value: next_token_ids=tensor([2386], device='cuda:0') mtp accept=1 prop=2386 top1=2386 accp=0.958 next=draft=13208 prop=13208 olap pair=109.4ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.8ms s1=189.5ms wait=0.1/46.5ms pred gate=device Token # 2004: 3.686ms; value: next_token_ids=tensor([91868], device='cuda:0') mtp accept=0 prop=13208 top1=91868 accp=0.315 next=pair draft=666 prop=666 pred gate=device Token # 2005: 114.101ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=0.983 next=draft=768 prop=768 olap pair=108.9ms serial=193.2ms gain=84.3ms ratio=0.44 s0=4.7ms s1=188.4ms wait=0.1/46.5ms pred gate=device Token # 2006: 3.678ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=pair draft=3007 prop=3007 pred gate=device Token # 2007: 114.189ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=0 prop=3007 top1=8835 accp=0.374 next=draft=6034 prop=6034 olap pair=109.0ms 
serial=193.6ms gain=84.5ms ratio=0.44 s0=4.0ms s1=189.6ms wait=0.1/48.1ms pred gate=device Token # 2008: 115.060ms; value: next_token_ids=tensor([5078], device='cuda:0') mtp accept=0 prop=6034 top1=5078 accp=0.002 next=draft=28057 prop=28057 olap pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=3.7ms s1=191.2ms wait=0.1/48.6ms pred gate=device Token # 2009: 115.150ms; value: next_token_ids=tensor([28057], device='cuda:0') mtp accept=1 prop=28057 top1=28057 accp=0.968 next=draft=572 prop=572 olap pair=109.8ms serial=195.0ms gain=85.2ms ratio=0.44 s0=3.8ms s1=191.2ms wait=0.1/48.6ms pred gate=device Token # 2010: 3.690ms; value: next_token_ids=tensor([572], device='cuda:0') mtp accept=1 prop=572 top1=572 accp=1.000 next=pair draft=10251 prop=10251 pred gate=device Token # 2011: 114.663ms; value: next_token_ids=tensor([10251], device='cuda:0') mtp accept=1 prop=10251 top1=10251 accp=0.999 next=draft=3007 prop=8835 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=3.7ms s1=190.6ms wait=0.1/48.6ms pred gate=device Token # 2012: 3.694ms; value: next_token_ids=tensor([8835], device='cuda:0') mtp accept=1 prop=8835 top1=8835 accp=0.167 next=pair draft=13 prop=13 pred gate=device