[2026-04-08 08:57:30.386253 INFO duck_llm] This is an info log message
[2026-04-08 08:57:30.386283 WARN duck_llm] This is a warning log message
[2026-04-08 08:57:30.386285 ERROR duck_llm] This is an error log message
[2026-04-08 08:57:30.386481 INFO utils] Selected DPDK lcores: master=0, workers=[2, 4, 6, 8], all_performance_core_representatives=[0, 2, 4, 6, 8, 10, 12, 14]
EAL: Detected CPU lcores: 32
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
EAL: Using IOMMU type 1 (Type 1)
ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode)
ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode)
ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode)
ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode)
[2026-04-08 08:57:32.441441 INFO dpdk_workers] DPDK initialized successfully. Found 4 ports.
[2026-04-08 08:57:32.441457 INFO dpdk_workers] Port 0 device name: 0000:01:00.0
[2026-04-08 08:57:32.441460 INFO dpdk_workers] Port 0 IP address: 10.21.1.1
[2026-04-08 08:57:32.441462 INFO dpdk_workers] Port 0 Broadcast address: 10.21.1.255
[2026-04-08 08:57:32.441464 INFO dpdk_workers] Port 1 device name: 0000:01:00.1
[2026-04-08 08:57:32.441465 INFO dpdk_workers] Port 1 IP address: 10.21.2.1
[2026-04-08 08:57:32.441467 INFO dpdk_workers] Port 1 Broadcast address: 10.21.2.255
[2026-04-08 08:57:32.441468 INFO dpdk_workers] Port 2 device name: 0000:01:00.2
[2026-04-08 08:57:32.441470 INFO dpdk_workers] Port 2 IP address: 10.21.3.1
[2026-04-08 08:57:32.441471 INFO dpdk_workers] Port 2 Broadcast address: 10.21.3.255
[2026-04-08 08:57:32.441473 INFO dpdk_workers] Port 3 device name: 0000:01:00.3
[2026-04-08 08:57:32.441475 INFO dpdk_workers] Port 3 IP address: 10.21.4.1
[2026-04-08 08:57:32.441476 INFO dpdk_workers] Port 3 Broadcast address: 10.21.4.255
[2026-04-08 08:57:32.441478 INFO dpdk_workers] Available netifs list: [(10.21.1.255, 0, 10.21.1.1), (10.21.2.255, 1, 10.21.2.1), (10.21.3.255, 2, 10.21.3.1), (10.21.4.255, 3, 10.21.4.1)]
[2026-04-08 08:57:32.441486 INFO dpdk_workers] Starting worker #0: (bcast_ip: 10.21.1.255, port_id: 0, lcore_id: 2, host_ip: 10.21.1.1)
[2026-04-08 08:57:32.441515 INFO dpdk_workers] Starting worker #1: (bcast_ip: 10.21.2.255, port_id: 1, lcore_id: 4, host_ip: 10.21.2.1)
[2026-04-08 08:57:32.441541 INFO dpdk_workers] Initializing worker port 0 on lcore 2...
[2026-04-08 08:57:32.443883 INFO dpdk_workers] Starting worker #2: (bcast_ip: 10.21.3.255, port_id: 2, lcore_id: 6, host_ip: 10.21.3.1)
[2026-04-08 08:57:32.443908 INFO dpdk_workers] Starting worker #3: (bcast_ip: 10.21.4.255, port_id: 3, lcore_id: 8, host_ip: 10.21.4.1)
[2026-04-08 08:57:32.443944 INFO dpdk_workers] Initializing worker port 1 on lcore 4...
[2026-04-08 08:57:32.445804 INFO dpdk_workers] Initializing worker port 2 on lcore 6...
[2026-04-08 08:57:32.447795 INFO dpdk_workers] Initializing worker port 3 on lcore 8...
ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 1).
ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 0).
ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 3).
ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 2).
[2026-04-08 08:57:35.586755 INFO dpdk_workers] Worker port 2 initialized successfully.
[2026-04-08 08:57:35.618907 INFO dpdk_workers] Worker port 0 initialized successfully.
[2026-04-08 08:57:36.446556 INFO dpdk_workers] Worker port 1 initialized successfully.
[2026-04-08 08:57:36.450130 INFO dpdk_workers] Worker port 3 initialized successfully.
[2026-04-08 08:57:36.450160 INFO dpdk_workers] Workers initialized successfully. 4 workers running.
[2026-04-08 08:57:36.450432 INFO utils] Binding master thread to cores (excluding workers): [0, 1, 3, 5, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
[2026-04-08 08:57:36.450442 INFO utils] set_thread_affinity(tid 1370268, cores [0, 1, 3, 5, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]): 0
[2026-04-08 08:57:36.451206 INFO dpdk_workers] Run command Ping all time: send 1.4 us, recv 755.2 us
[2026-04-08 08:57:36.501265 INFO dpdk_workers] Run command Ping all time: send 0.3 us, recv 0.4 us
[2026-04-08 08:57:36.551322 INFO dpdk_workers] Run command Ping all time: send 0.2 us, recv 0.6 us
[2026-04-08 08:57:36.601378 INFO dpdk_workers] Run command Ping all time: send 0.3 us, recv 0.4 us
[2026-04-08 08:57:36.651434 INFO dpdk_workers] Run command Ping all time: send 0.5 us, recv 0.4 us
[2026-04-08 08:57:36.701490 INFO dpdk_workers] Run command Ping all time: send 0.3 us, recv 0.4 us
[2026-04-08 08:57:36.751547 INFO dpdk_workers] Run command Ping all time: send 0.3 us, recv 0.6 us
[2026-04-08 08:57:36.801604 INFO dpdk_workers] Run command Ping all time: send 0.2 us, recv 0.5 us
[2026-04-08 08:57:36.851670 INFO dpdk_workers] Run command Ping all time: send 1.0 us, recv 1.3 us
[2026-04-08 08:57:36.901749 INFO dpdk_workers] Run command Ping all time: send 1.4 us, recv 1.1 us
[2026-04-08 08:57:36.951842 INFO dpdk_workers] Found 32 ducks in duck-ips-multi-netifs.txt
[2026-04-08 08:57:36.951846 INFO dpdk_workers] Duck #0: 10.21.1.101 (bcast_ip: 10.21.1.255)
[2026-04-08 08:57:36.951848 INFO dpdk_workers] Duck #1: 10.21.1.102 (bcast_ip: 10.21.1.255)
[2026-04-08 08:57:36.951851 INFO dpdk_workers] Duck #2: 10.21.1.103 (bcast_ip: 10.21.1.255)
[2026-04-08 08:57:36.951852 INFO dpdk_workers] Duck #3: 10.21.1.104 (bcast_ip: 10.21.1.255)
[2026-04-08 08:57:36.951855 INFO dpdk_workers] Duck #4: 10.21.1.105 (bcast_ip: 10.21.1.255)
[2026-04-08 08:57:36.951856 INFO dpdk_workers] Duck #5: 10.21.1.106 (bcast_ip: 10.21.1.255)
[2026-04-08 08:57:36.951859 INFO dpdk_workers] Duck #6: 10.21.1.107 (bcast_ip: 10.21.1.255)
[2026-04-08 08:57:36.951861 INFO dpdk_workers] Duck #7: 10.21.1.108 (bcast_ip: 10.21.1.255)
[2026-04-08 08:57:36.951863 INFO dpdk_workers] Duck #8: 10.21.2.101 (bcast_ip: 10.21.2.255)
[2026-04-08 08:57:36.951865 INFO dpdk_workers] Duck #9: 10.21.2.102 (bcast_ip: 10.21.2.255)
[2026-04-08 08:57:36.951867 INFO dpdk_workers] Duck #10: 10.21.2.103 (bcast_ip: 10.21.2.255)
[2026-04-08 08:57:36.951869 INFO dpdk_workers] Duck #11: 10.21.2.104 (bcast_ip: 10.21.2.255)
[2026-04-08 08:57:36.951870 INFO dpdk_workers] Duck #12: 10.21.2.105 (bcast_ip: 10.21.2.255)
[2026-04-08 08:57:36.951872 INFO dpdk_workers] Duck #13: 10.21.2.106 (bcast_ip: 10.21.2.255)
[2026-04-08 08:57:36.951874 INFO dpdk_workers] Duck #14: 10.21.2.107 (bcast_ip: 10.21.2.255)
[2026-04-08 08:57:36.951876 INFO dpdk_workers] Duck #15: 10.21.2.108 (bcast_ip: 10.21.2.255)
[2026-04-08 08:57:36.951878 INFO dpdk_workers] Duck #16: 10.21.3.101 (bcast_ip: 10.21.3.255)
[2026-04-08 08:57:36.951880 INFO dpdk_workers] Duck #17: 10.21.3.102 (bcast_ip: 10.21.3.255)
[2026-04-08 08:57:36.951882 INFO dpdk_workers] Duck #18: 10.21.3.103 (bcast_ip: 10.21.3.255)
[2026-04-08 08:57:36.951883 INFO dpdk_workers] Duck #19: 10.21.3.104 (bcast_ip: 10.21.3.255)
[2026-04-08 08:57:36.951885 INFO dpdk_workers] Duck #20: 10.21.3.105 (bcast_ip: 10.21.3.255)
[2026-04-08 08:57:36.951887 INFO dpdk_workers] Duck #21: 10.21.3.106 (bcast_ip: 10.21.3.255)
[2026-04-08 08:57:36.951889 INFO dpdk_workers] Duck #22: 10.21.3.107 (bcast_ip: 10.21.3.255)
[2026-04-08 08:57:36.951891 INFO dpdk_workers] Duck #23: 10.21.3.108 (bcast_ip: 10.21.3.255)
[2026-04-08 08:57:36.951893 INFO dpdk_workers] Duck #24: 10.21.4.101 (bcast_ip: 10.21.4.255)
[2026-04-08 08:57:36.951895 INFO dpdk_workers] Duck #25: 10.21.4.102 (bcast_ip: 10.21.4.255)
[2026-04-08 08:57:36.951896 INFO dpdk_workers] Duck #26: 10.21.4.103 (bcast_ip: 10.21.4.255)
[2026-04-08 08:57:36.951898 INFO dpdk_workers] Duck #27: 10.21.4.104 (bcast_ip: 10.21.4.255)
[2026-04-08 08:57:36.951900 INFO dpdk_workers] Duck #28: 10.21.4.105 (bcast_ip: 10.21.4.255)
[2026-04-08 08:57:36.951902 INFO dpdk_workers] Duck #29: 10.21.4.106 (bcast_ip: 10.21.4.255)
[2026-04-08 08:57:36.951904 INFO dpdk_workers] Duck #30: 10.21.4.107 (bcast_ip: 10.21.4.255)
[2026-04-08 08:57:36.951908 INFO dpdk_workers] Duck #31: 10.21.4.108 (bcast_ip: 10.21.4.255)
[2026-04-08 08:57:36.953675 INFO dpdk_workers] [Worker 0]: 10.21.1.101
[2026-04-08 08:57:36.953691 INFO dpdk_workers] [Worker 0]: 10.21.1.102
[2026-04-08 08:57:36.953702 INFO dpdk_workers] [Worker 0]: 10.21.1.103
[2026-04-08 08:57:36.953713 INFO dpdk_workers] [Worker 0]: 10.21.1.104
[2026-04-08 08:57:36.953716 INFO dpdk_workers] [Worker 0]: 10.21.1.105
[2026-04-08 08:57:36.953718 INFO dpdk_workers] [Worker 0]: 10.21.1.106
[2026-04-08 08:57:36.953719 INFO dpdk_workers] [Worker 0]: 10.21.1.107
[2026-04-08 08:57:36.953721 INFO dpdk_workers] [Worker 0]: 10.21.1.108
[2026-04-08 08:57:36.953724 INFO dpdk_workers] [Worker 1]: 10.21.2.101
[2026-04-08 08:57:36.953725 INFO dpdk_workers] [Worker 1]: 10.21.2.102
[2026-04-08 08:57:36.953727 INFO dpdk_workers] [Worker 1]: 10.21.2.103
[2026-04-08 08:57:36.953728 INFO dpdk_workers] [Worker 1]: 10.21.2.104
[2026-04-08 08:57:36.953729 INFO dpdk_workers] [Worker 1]: 10.21.2.105
[2026-04-08 08:57:36.953731 INFO dpdk_workers] [Worker 1]: 10.21.2.106
[2026-04-08 08:57:36.953732 INFO dpdk_workers] [Worker 1]: 10.21.2.107
[2026-04-08 08:57:36.953734 INFO dpdk_workers] [Worker 1]: 10.21.2.108
[2026-04-08 08:57:36.953737 INFO dpdk_workers] [Worker 2]: 10.21.3.101
[2026-04-08 08:57:36.953738 INFO dpdk_workers] [Worker 2]: 10.21.3.102
[2026-04-08 08:57:36.953740 INFO dpdk_workers] [Worker 2]: 10.21.3.103
[2026-04-08 08:57:36.953741 INFO dpdk_workers] [Worker 2]: 10.21.3.104
[2026-04-08 08:57:36.953743 INFO dpdk_workers] [Worker 2]: 10.21.3.105
[2026-04-08 08:57:36.953744 INFO dpdk_workers] [Worker 2]: 10.21.3.106
[2026-04-08 08:57:36.953746 INFO dpdk_workers] [Worker 2]: 10.21.3.107
[2026-04-08 08:57:36.953748 INFO dpdk_workers] [Worker 2]: 10.21.3.108
[2026-04-08 08:57:36.953962 INFO dpdk_workers] [Worker 3]: 10.21.4.101
[2026-04-08 08:57:36.953964 INFO dpdk_workers] [Worker 3]: 10.21.4.102
[2026-04-08 08:57:36.953966 INFO dpdk_workers] [Worker 3]: 10.21.4.103
[2026-04-08 08:57:36.953967 INFO dpdk_workers] [Worker 3]: 10.21.4.104
[2026-04-08 08:57:36.953969 INFO dpdk_workers] [Worker 3]: 10.21.4.105
[2026-04-08 08:57:36.953970 INFO dpdk_workers] [Worker 3]: 10.21.4.106
[2026-04-08 08:57:36.953972 INFO dpdk_workers] [Worker 3]: 10.21.4.107
[2026-04-08 08:57:36.953973 INFO dpdk_workers] [Worker 3]: 10.21.4.108
[2026-04-08 08:57:36.953975 INFO dpdk_workers] init_ducks done
[2026-04-08 08:57:36.954083 INFO dpdk_ducks] Initialized 4 DPDK duck workers
[2026-04-08 08:57:36.954086 INFO dpdk_ducks] DPDK duck worker 0: DpdkDuckWorker { worker_idx: 0, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (0, 8) }
[2026-04-08 08:57:36.954090 INFO dpdk_ducks] DPDK duck worker 1: DpdkDuckWorker { worker_idx: 1, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (8, 16) }
[2026-04-08 08:57:36.954093 INFO dpdk_ducks] DPDK duck worker 2: DpdkDuckWorker { worker_idx: 2, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (16, 24) }
[2026-04-08 08:57:36.954095 INFO dpdk_ducks] DPDK duck worker 3: DpdkDuckWorker { worker_idx: 3, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (24, 32) }
[2026-04-08 08:57:36.954100 INFO buffer_manager] Initializing buffer manager
[2026-04-08 08:57:36.954102 INFO buffer_manager] Buffer manager initialized: ELF BufferAllocator { begin: 0, end: 10485760, current: 0 }, input BufferAllocator { begin: 10485760, end: 104857600, current: 10485760 }, weights BufferAllocator { begin: 104923136, end: 32212254720, current: 104923136 }
[2026-04-08 08:57:36.954106 INFO fp8_dpdk_common] fp9 persistent judge enabled by default; set DUCK_FP9_PERSISTENT_JUDGE=0 to disable
[2026-04-08 08:57:36.954520 INFO buffer_manager] Added kernel fp9_kernels at (0, 91664)
[2026-04-08 08:57:36.954555 INFO fp8_dpdk_common] fp9 persistent judge: opened 32 sessions
[2026-04-08 08:57:36.954558 INFO fp8_dpdk_common] fp9 persistent judge: force-opened 32 fresh sessions for new init
[2026-04-08 08:57:36.954560 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init(tp_size=32)
[2026-04-08 08:57:36.954561 INFO fp8_moe_dpdk] fp8_moe_dpdk: init(tp_size=32)
[2026-04-08 08:57:37.340666 INFO weight_cache] weight_cache: header hit tp_size=32 num_slots=62 finished_slots=62
[2026-04-08 08:57:37.667804 INFO buffer_manager] Allocated weights buffer at (104923136, 0)
[2026-04-08 08:57:37.667830 INFO buffer_manager] Allocated weights buffer at (104923136, 4128768)
[2026-04-08 08:57:37.667832 INFO buffer_manager] Allocated weights buffer at (109051904, 516096)
[2026-04-08 08:57:37.667834 INFO buffer_manager] Allocated weights buffer at (109568000, 2016)
[2026-04-08 08:57:37.667836 INFO buffer_manager] Allocated weights buffer at (109572096, 4128768)
[2026-04-08 08:57:37.667837 INFO buffer_manager] Allocated weights buffer at (113700864, 516096)
[2026-04-08 08:57:37.667839 INFO buffer_manager] Allocated weights buffer at (114216960, 2016)
[2026-04-08 08:57:37.667840 INFO buffer_manager] Allocated weights buffer at (114221056, 4128768)
[2026-04-08 08:57:37.667842 INFO buffer_manager] Allocated weights buffer at (118349824, 516096)
[2026-04-08 08:57:37.667843 INFO buffer_manager] Allocated weights buffer at (118865920, 2016)
[2026-04-08 08:57:37.667845 INFO buffer_manager] Allocated weights buffer at (118870016, 0)
[2026-04-08 08:57:37.667847 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init_layer_cached(layer_idx=0, cache_slot=0) planned desc only
[2026-04-08 08:57:37.761059 INFO buffer_manager] Allocated weights buffer at (118870016, 0)
[2026-04-08 08:57:37.761083 INFO buffer_manager] Allocated weights buffer at (118870016, 4128768)
[2026-04-08 08:57:37.761085 INFO buffer_manager] Allocated weights buffer at (122998784, 516096)
[2026-04-08 08:57:37.761087 INFO buffer_manager] Allocated weights buffer at (123514880, 2016)
[2026-04-08 08:57:37.761089 INFO buffer_manager] Allocated weights buffer at (123518976, 4128768)
[2026-04-08 08:57:37.761090 INFO buffer_manager] Allocated weights buffer at (127647744, 516096)
[2026-04-08 08:57:37.761092 INFO buffer_manager] Allocated weights buffer at (128163840, 2016)
[2026-04-08 08:57:37.761093 INFO buffer_manager] Allocated weights buffer at (128167936, 4128768)
[2026-04-08 08:57:37.761095 INFO buffer_manager] Allocated weights buffer at (132296704, 516096)
[2026-04-08 08:57:37.761097 INFO buffer_manager] Allocated weights buffer at (132812800, 2016)
[2026-04-08 08:57:37.761098 INFO buffer_manager] Allocated weights buffer at (132816896, 0)
[2026-04-08 08:57:37.761099 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init_layer_cached(layer_idx=1, cache_slot=1) planned desc only
[2026-04-08 08:57:37.847803 INFO buffer_manager] Allocated weights buffer at (132816896, 0)
[2026-04-08 08:57:37.847822 INFO buffer_manager] Allocated weights buffer at (132816896, 4128768)
[2026-04-08 08:57:37.847824 INFO buffer_manager] Allocated weights buffer at (136945664, 516096)
[2026-04-08 08:57:37.847826 INFO buffer_manager] Allocated weights buffer at (137461760, 2016)
[2026-04-08 08:57:37.847830 INFO buffer_manager] Allocated weights buffer at (137465856, 4128768)
[2026-04-08 08:57:37.847832 INFO buffer_manager] Allocated weights buffer at (141594624, 516096)
[2026-04-08 08:57:37.847833 INFO buffer_manager] Allocated weights buffer at (142110720, 2016)
[2026-04-08 08:57:37.847835 INFO buffer_manager] Allocated weights buffer at (142114816, 4128768)
[2026-04-08 08:57:37.847836 INFO buffer_manager] Allocated weights buffer at (146243584, 516096)
[2026-04-08 08:57:37.847838 INFO buffer_manager] Allocated weights buffer at (146759680, 2016)
[2026-04-08 08:57:37.847839 INFO buffer_manager] Allocated weights buffer at (146763776, 0)
[2026-04-08 08:57:37.847841 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init_layer_cached(layer_idx=2, cache_slot=2) planned desc only
[2026-04-08 08:57:37.876379 INFO buffer_manager] Allocated weights buffer at (146763776, 0)
[2026-04-08 08:57:37.876394 INFO buffer_manager] Allocated weights buffer at (146763776, 132120576)
[2026-04-08 08:57:37.876396 INFO buffer_manager] Allocated weights buffer at (278884352, 57344)
[2026-04-08 08:57:37.876397 INFO buffer_manager] Allocated weights buffer at (278941696, 132120576)
[2026-04-08 08:57:37.876399 INFO buffer_manager] Allocated weights buffer at (411062272, 57344)
[2026-04-08 08:57:37.876400 INFO buffer_manager] Allocated weights buffer at (411119616, 132120576)
[2026-04-08 08:57:37.876402 INFO buffer_manager] Allocated weights buffer at (543240192, 57344)
[2026-04-08 08:57:37.876403 INFO buffer_manager] Allocated weights buffer at (543297536, 0)
[2026-04-08 08:57:37.876405 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=3, cache_slot=3) planned desc only
[2026-04-08 08:57:37.912803 INFO buffer_manager] Allocated weights buffer at (543297536, 0)
[2026-04-08 08:57:37.912817 INFO buffer_manager] Allocated weights buffer at (543297536, 132120576)
[2026-04-08 08:57:37.912819 INFO buffer_manager] Allocated weights buffer at (675418112, 57344)
[2026-04-08 08:57:37.912821 INFO buffer_manager] Allocated weights buffer at (675475456, 132120576)
[2026-04-08 08:57:37.912822 INFO buffer_manager] Allocated weights buffer at (807596032, 57344)
[2026-04-08 08:57:37.912824 INFO buffer_manager] Allocated weights buffer at (807653376, 132120576)
[2026-04-08 08:57:37.912825 INFO buffer_manager] Allocated weights buffer at (939773952, 57344)
[2026-04-08 08:57:37.912827 INFO buffer_manager] Allocated weights buffer at (939831296, 0)
[2026-04-08 08:57:37.912828 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=4, cache_slot=4) planned desc only
[2026-04-08 08:57:37.949060 INFO buffer_manager] Allocated weights buffer at (939831296, 0)
[2026-04-08 08:57:37.949075 INFO buffer_manager] Allocated weights buffer at (939831296, 132120576)
[2026-04-08 08:57:37.949077 INFO buffer_manager] Allocated weights buffer at (1071951872, 57344)
[2026-04-08 08:57:37.949079 INFO buffer_manager] Allocated weights buffer at (1072009216, 132120576)
[2026-04-08 08:57:37.949080 INFO buffer_manager] Allocated weights buffer at (1204129792, 57344)
[2026-04-08 08:57:37.949082 INFO buffer_manager] Allocated weights buffer at (1204187136, 132120576)
[2026-04-08 08:57:37.949083 INFO buffer_manager] Allocated weights buffer at (1336307712, 57344)
[2026-04-08 08:57:37.949085 INFO buffer_manager] Allocated weights buffer at (1336365056, 0)
[2026-04-08 08:57:37.949086 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=5, cache_slot=5) planned desc only
[2026-04-08 08:57:37.985410 INFO buffer_manager] Allocated weights buffer at (1336365056, 0)
[2026-04-08 08:57:37.985424 INFO buffer_manager] Allocated weights buffer at (1336365056, 132120576)
[2026-04-08 08:57:37.985426 INFO buffer_manager] Allocated weights buffer at (1468485632, 57344)
[2026-04-08 08:57:37.985428 INFO buffer_manager] Allocated weights buffer at (1468542976, 132120576)
[2026-04-08 08:57:37.985429 INFO buffer_manager] Allocated weights buffer at (1600663552, 57344)
[2026-04-08 08:57:37.985431 INFO buffer_manager] Allocated weights buffer at (1600720896, 132120576)
[2026-04-08 08:57:37.985436 INFO buffer_manager] Allocated weights buffer at (1732841472, 57344)
[2026-04-08 08:57:37.985437 INFO buffer_manager] Allocated weights buffer at (1732898816, 0)
[2026-04-08 08:57:37.985439 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=6, cache_slot=6) planned desc only
[2026-04-08 08:57:38.021750 INFO buffer_manager] Allocated weights buffer at (1732898816, 0)
[2026-04-08 08:57:38.021763 INFO buffer_manager] Allocated weights buffer at (1732898816, 132120576)
[2026-04-08 08:57:38.021765 INFO buffer_manager] Allocated weights buffer at (1865019392, 57344)
[2026-04-08 08:57:38.021766 INFO buffer_manager] Allocated weights buffer at (1865076736, 132120576)
[2026-04-08 08:57:38.021768 INFO buffer_manager] Allocated weights buffer at (1997197312, 57344)
[2026-04-08 08:57:38.021769 INFO buffer_manager] Allocated weights buffer at (1997254656, 132120576)
[2026-04-08 08:57:38.021771 INFO buffer_manager] Allocated weights buffer at (2129375232, 57344)
[2026-04-08 08:57:38.021772 INFO buffer_manager] Allocated weights buffer at (2129432576, 0)
[2026-04-08 08:57:38.021774 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=7, cache_slot=7) planned desc only
[2026-04-08 08:57:38.058056 INFO buffer_manager] Allocated weights buffer at (2129432576, 0)
[2026-04-08 08:57:38.058070 INFO buffer_manager] Allocated weights buffer at (2129432576, 132120576)
[2026-04-08 08:57:38.058072 INFO buffer_manager] Allocated weights buffer at (2261553152, 57344)
[2026-04-08 08:57:38.058073 INFO buffer_manager] Allocated weights buffer at (2261610496, 132120576)
[2026-04-08 08:57:38.058075 INFO buffer_manager] Allocated weights buffer at (2393731072, 57344)
[2026-04-08 08:57:38.058076 INFO buffer_manager] Allocated weights buffer at (2393788416, 132120576)
[2026-04-08 08:57:38.058078 INFO buffer_manager] Allocated weights buffer at (2525908992, 57344)
[2026-04-08 08:57:38.058083 INFO buffer_manager] Allocated weights buffer at (2525966336, 0)
[2026-04-08 08:57:38.058086 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=8, cache_slot=8) planned desc only
[2026-04-08 08:57:38.094340 INFO buffer_manager] Allocated weights buffer at (2525966336, 0)
[2026-04-08 08:57:38.094355 INFO buffer_manager] Allocated weights buffer at (2525966336, 132120576)
[2026-04-08 08:57:38.094357 INFO buffer_manager] Allocated weights buffer at (2658086912, 57344)
[2026-04-08 08:57:38.094358 INFO buffer_manager] Allocated weights buffer at (2658144256, 132120576)
[2026-04-08 08:57:38.094360 INFO buffer_manager] Allocated weights buffer at (2790264832, 57344)
[2026-04-08 08:57:38.094361 INFO buffer_manager] Allocated weights buffer at (2790322176, 132120576)
[2026-04-08 08:57:38.094363 INFO buffer_manager] Allocated weights buffer at (2922442752, 57344)
[2026-04-08 08:57:38.094364 INFO buffer_manager] Allocated weights buffer at (2922500096, 0)
[2026-04-08 08:57:38.094366 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=9, cache_slot=9) planned desc only
[2026-04-08 08:57:38.130541 INFO buffer_manager] Allocated weights buffer at (2922500096, 0)
[2026-04-08 08:57:38.130556 INFO buffer_manager] Allocated weights buffer at (2922500096, 132120576)
[2026-04-08 08:57:38.130558 INFO buffer_manager] Allocated weights buffer at (3054620672, 57344)
[2026-04-08 08:57:38.130560 INFO buffer_manager] Allocated weights buffer at (3054678016, 132120576)
[2026-04-08 08:57:38.130561 INFO buffer_manager] Allocated weights buffer at (3186798592, 57344)
[2026-04-08 08:57:38.130563 INFO buffer_manager] Allocated weights buffer at (3186855936, 132120576)
[2026-04-08 08:57:38.130564 INFO buffer_manager] Allocated weights buffer at (3318976512, 57344)
[2026-04-08 08:57:38.130565 INFO buffer_manager] Allocated weights buffer at (3319033856, 0)
[2026-04-08 08:57:38.130567 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=10, cache_slot=10) planned desc only
[2026-04-08 08:57:38.166857 INFO buffer_manager] Allocated weights buffer at (3319033856, 0)
[2026-04-08 08:57:38.166873 INFO buffer_manager] Allocated weights buffer at (3319033856, 132120576)
[2026-04-08 08:57:38.166878 INFO buffer_manager] Allocated weights buffer at (3451154432, 57344)
[2026-04-08 08:57:38.166880 INFO buffer_manager] Allocated weights buffer at (3451211776, 132120576)
[2026-04-08 08:57:38.166882 INFO buffer_manager] Allocated weights buffer at (3583332352, 57344)
[2026-04-08 08:57:38.166883 INFO buffer_manager] Allocated weights buffer at (3583389696, 132120576)
[2026-04-08 08:57:38.166885 INFO buffer_manager] Allocated weights buffer at (3715510272, 57344)
[2026-04-08 08:57:38.166886 INFO buffer_manager] Allocated weights buffer at (3715567616, 0)
[2026-04-08 08:57:38.166888 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=11, cache_slot=11) planned desc only
[2026-04-08 08:57:38.203077 INFO buffer_manager] Allocated weights buffer at (3715567616, 0)
[2026-04-08 08:57:38.203092 INFO buffer_manager] Allocated weights buffer at (3715567616, 132120576)
[2026-04-08 08:57:38.203094 INFO buffer_manager] Allocated weights buffer at (3847688192, 57344)
[2026-04-08 08:57:38.203096 INFO buffer_manager] Allocated weights buffer at (3847745536, 132120576)
[2026-04-08 08:57:38.203098 INFO buffer_manager] Allocated weights buffer at (3979866112, 57344)
[2026-04-08 08:57:38.203099 INFO buffer_manager] Allocated weights buffer at (3979923456, 132120576)
[2026-04-08 08:57:38.203101 INFO buffer_manager] Allocated weights buffer at (4112044032, 57344)
[2026-04-08 08:57:38.203102 INFO buffer_manager] Allocated weights buffer at (4112101376, 0)
[2026-04-08 08:57:38.203104 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=12, cache_slot=12) planned desc only
[2026-04-08 08:57:38.239255 INFO buffer_manager] Allocated weights buffer at (4112101376, 0)
[2026-04-08 08:57:38.239269 INFO buffer_manager] Allocated weights buffer at (4112101376, 132120576)
[2026-04-08 08:57:38.239271 INFO buffer_manager] Allocated weights buffer at (4244221952, 57344)
[2026-04-08 08:57:38.239273 INFO buffer_manager] Allocated weights buffer at (4244279296, 132120576)
[2026-04-08 08:57:38.239274 INFO buffer_manager] Allocated weights buffer at (4376399872, 57344)
[2026-04-08 08:57:38.239276 INFO buffer_manager] Allocated weights buffer at (4376457216, 132120576)
[2026-04-08 08:57:38.239277 INFO buffer_manager] Allocated weights buffer at (4508577792, 57344)
[2026-04-08 08:57:38.239279 INFO buffer_manager] Allocated weights buffer at (4508635136, 0)
[2026-04-08 08:57:38.239280 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=13, cache_slot=13) planned desc only
[2026-04-08 08:57:38.275415 INFO buffer_manager] Allocated weights buffer at (4508635136, 0)
[2026-04-08 08:57:38.275430 INFO buffer_manager] Allocated weights buffer at (4508635136, 132120576)
[2026-04-08 08:57:38.275432 INFO buffer_manager] Allocated weights buffer at (4640755712, 57344)
[2026-04-08 08:57:38.275434 INFO buffer_manager] Allocated weights buffer at (4640813056, 132120576)
[2026-04-08 08:57:38.275435 INFO buffer_manager] Allocated weights buffer at (4772933632, 57344)
[2026-04-08 08:57:38.275436 INFO buffer_manager] Allocated weights buffer at (4772990976, 132120576)
[2026-04-08 08:57:38.275438 INFO buffer_manager] Allocated weights buffer at (4905111552, 57344)
[2026-04-08 08:57:38.275439 INFO buffer_manager] Allocated weights buffer at (4905168896, 0)
[2026-04-08 08:57:38.275441 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=14, cache_slot=14) planned desc only
[2026-04-08 08:57:38.311638 INFO buffer_manager] Allocated weights buffer at (4905168896, 0)
[2026-04-08 08:57:38.311654 INFO buffer_manager] Allocated weights buffer at (4905168896, 132120576)
[2026-04-08 08:57:38.311657 INFO buffer_manager] Allocated weights buffer at (5037289472, 57344)
[2026-04-08 08:57:38.311658 INFO buffer_manager] Allocated weights buffer at (5037346816, 132120576)
[2026-04-08 08:57:38.311660 INFO buffer_manager] Allocated weights buffer at (5169467392, 57344)
[2026-04-08 08:57:38.311661 INFO buffer_manager] Allocated weights buffer at (5169524736, 132120576)
[2026-04-08 08:57:38.311663 INFO buffer_manager] Allocated weights buffer at (5301645312, 57344)
[2026-04-08 08:57:38.311669 INFO buffer_manager] Allocated weights buffer at (5301702656, 0)
[2026-04-08 08:57:38.311670 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=15, cache_slot=15) planned desc only
[2026-04-08 08:57:38.347921 INFO buffer_manager] Allocated weights buffer at (5301702656, 0)
[2026-04-08 08:57:38.347936 INFO buffer_manager] Allocated weights buffer at (5301702656, 132120576)
[2026-04-08 08:57:38.347938 INFO buffer_manager] Allocated weights buffer at (5433823232, 57344)
[2026-04-08 08:57:38.347939 INFO buffer_manager] Allocated weights buffer at (5433880576, 132120576)
[2026-04-08 08:57:38.347941 INFO buffer_manager] Allocated weights buffer at (5566001152, 57344)
[2026-04-08 08:57:38.347942 INFO buffer_manager] Allocated weights buffer at (5566058496, 132120576)
[2026-04-08 08:57:38.347944 INFO buffer_manager] Allocated weights buffer at (5698179072, 57344)
[2026-04-08 08:57:38.347945 INFO buffer_manager] Allocated weights buffer at (5698236416, 0)
[2026-04-08 08:57:38.347947 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=16, cache_slot=16) planned desc only
[2026-04-08 08:57:38.384108 INFO buffer_manager] Allocated weights buffer at (5698236416, 0)
[2026-04-08 08:57:38.384123 INFO buffer_manager] Allocated weights buffer at (5698236416, 132120576)
[2026-04-08 08:57:38.384125 INFO buffer_manager] Allocated weights buffer at (5830356992, 57344)
[2026-04-08 08:57:38.384127 INFO buffer_manager] Allocated weights buffer at (5830414336, 132120576)
[2026-04-08 08:57:38.384128 INFO buffer_manager] Allocated weights buffer at (5962534912, 57344)
[2026-04-08 08:57:38.384130 INFO buffer_manager] Allocated weights buffer at (5962592256, 132120576)
[2026-04-08 08:57:38.384132 INFO buffer_manager] Allocated weights buffer at (6094712832, 57344)
[2026-04-08 08:57:38.384133 INFO buffer_manager] Allocated weights buffer at (6094770176, 0)
[2026-04-08 08:57:38.384135 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=17, cache_slot=17) planned desc only
[2026-04-08 08:57:38.420276 INFO buffer_manager] Allocated weights buffer at (6094770176, 0)
[2026-04-08 08:57:38.420292 INFO buffer_manager] Allocated weights buffer at (6094770176, 132120576)
[2026-04-08 08:57:38.420294 INFO buffer_manager] Allocated weights buffer at (6226890752, 57344)
[2026-04-08 08:57:38.420296 INFO buffer_manager] Allocated weights buffer at (6226948096, 132120576)
[2026-04-08 08:57:38.420297 INFO buffer_manager] Allocated weights buffer at (6359068672, 57344)
[2026-04-08 08:57:38.420299 INFO buffer_manager] Allocated weights buffer at (6359126016, 132120576)
[2026-04-08 08:57:38.420300 INFO buffer_manager] Allocated weights buffer at (6491246592, 57344)
[2026-04-08 08:57:38.420302 INFO buffer_manager] Allocated weights buffer at (6491303936, 0)
[2026-04-08 08:57:38.420303 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=18, cache_slot=18) planned desc only
[2026-04-08 08:57:38.456544 INFO buffer_manager] Allocated weights buffer at (6491303936, 0)
[2026-04-08 08:57:38.456559 INFO buffer_manager] Allocated weights buffer at (6491303936, 132120576)
[2026-04-08 08:57:38.456561 INFO buffer_manager] Allocated weights buffer at (6623424512, 57344)
[2026-04-08 08:57:38.456563 INFO buffer_manager] Allocated weights buffer at (6623481856, 132120576)
[2026-04-08 08:57:38.456565 INFO buffer_manager] Allocated weights buffer at (6755602432, 57344)
[2026-04-08 08:57:38.456566 INFO buffer_manager] Allocated weights buffer at (6755659776, 132120576)
[2026-04-08 08:57:38.456568 INFO buffer_manager] Allocated weights buffer at (6887780352, 57344)
[2026-04-08 08:57:38.456569 INFO buffer_manager] Allocated weights buffer at (6887837696, 0)
[2026-04-08 08:57:38.456571 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=19, cache_slot=19) planned desc only
[2026-04-08 08:57:38.492718 INFO buffer_manager] Allocated weights buffer at (6887837696, 0)
[2026-04-08 08:57:38.492732 INFO buffer_manager] Allocated weights buffer at (6887837696, 132120576)
[2026-04-08 08:57:38.492741 INFO buffer_manager] Allocated weights buffer at (7019958272, 57344)
[2026-04-08 08:57:38.492743 INFO buffer_manager] Allocated weights buffer at (7020015616, 132120576)
[2026-04-08 08:57:38.492745 INFO buffer_manager] Allocated weights buffer at (7152136192, 57344)
[2026-04-08 08:57:38.492746 INFO buffer_manager] Allocated weights buffer at (7152193536, 132120576)
[2026-04-08 08:57:38.492748 INFO buffer_manager] Allocated weights buffer at (7284314112, 57344)
[2026-04-08 08:57:38.492749 INFO buffer_manager] Allocated weights buffer at (7284371456, 0)
[2026-04-08 08:57:38.492751 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=20, cache_slot=20) planned desc only
[2026-04-08 08:57:38.528927 INFO buffer_manager] Allocated weights buffer at (7284371456, 0)
[2026-04-08 08:57:38.528941 INFO buffer_manager] Allocated weights buffer at (7284371456, 132120576)
[2026-04-08 08:57:38.528943 INFO buffer_manager] Allocated weights buffer at (7416492032, 57344)
[2026-04-08 08:57:38.528945 INFO buffer_manager] Allocated weights buffer at (7416549376, 132120576)
[2026-04-08 08:57:38.528946 INFO buffer_manager] Allocated weights buffer at (7548669952, 57344)
[2026-04-08 08:57:38.528948 INFO buffer_manager] Allocated weights buffer at (7548727296, 132120576)
[2026-04-08 08:57:38.528949 INFO buffer_manager] Allocated weights buffer at (7680847872, 57344)
[2026-04-08 08:57:38.528951 INFO buffer_manager] Allocated weights buffer at (7680905216, 0)
[2026-04-08 08:57:38.528952 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=21, cache_slot=21) planned desc only
[2026-04-08 08:57:38.565051 INFO buffer_manager] Allocated weights buffer at (7680905216, 0)
[2026-04-08 08:57:38.565065 INFO buffer_manager] Allocated weights buffer at (7680905216, 132120576)
[2026-04-08 08:57:38.565067 INFO buffer_manager] Allocated weights buffer at (7813025792, 57344)
[2026-04-08 08:57:38.565069 INFO buffer_manager] Allocated weights buffer at (7813083136, 132120576)
[2026-04-08 08:57:38.565071 INFO buffer_manager] Allocated weights buffer at (7945203712, 57344)
[2026-04-08 08:57:38.565072 INFO buffer_manager] Allocated weights buffer at (7945261056,
132120576) [2026-04-08 08:57:38.565074 INFO buffer_manager] Allocated weights buffer at (8077381632, 57344) [2026-04-08 08:57:38.565075 INFO buffer_manager] Allocated weights buffer at (8077438976, 0) [2026-04-08 08:57:38.565077 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=22, cache_slot=22) planned desc only [2026-04-08 08:57:38.601352 INFO buffer_manager] Allocated weights buffer at (8077438976, 0) [2026-04-08 08:57:38.601366 INFO buffer_manager] Allocated weights buffer at (8077438976, 132120576) [2026-04-08 08:57:38.601369 INFO buffer_manager] Allocated weights buffer at (8209559552, 57344) [2026-04-08 08:57:38.601375 INFO buffer_manager] Allocated weights buffer at (8209616896, 132120576) [2026-04-08 08:57:38.601377 INFO buffer_manager] Allocated weights buffer at (8341737472, 57344) [2026-04-08 08:57:38.601378 INFO buffer_manager] Allocated weights buffer at (8341794816, 132120576) [2026-04-08 08:57:38.601380 INFO buffer_manager] Allocated weights buffer at (8473915392, 57344) [2026-04-08 08:57:38.601381 INFO buffer_manager] Allocated weights buffer at (8473972736, 0) [2026-04-08 08:57:38.601383 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=23, cache_slot=23) planned desc only [2026-04-08 08:57:38.637589 INFO buffer_manager] Allocated weights buffer at (8473972736, 0) [2026-04-08 08:57:38.637603 INFO buffer_manager] Allocated weights buffer at (8473972736, 132120576) [2026-04-08 08:57:38.637605 INFO buffer_manager] Allocated weights buffer at (8606093312, 57344) [2026-04-08 08:57:38.637607 INFO buffer_manager] Allocated weights buffer at (8606150656, 132120576) [2026-04-08 08:57:38.637608 INFO buffer_manager] Allocated weights buffer at (8738271232, 57344) [2026-04-08 08:57:38.637610 INFO buffer_manager] Allocated weights buffer at (8738328576, 132120576) [2026-04-08 08:57:38.637611 INFO buffer_manager] Allocated weights buffer at (8870449152, 57344) [2026-04-08 08:57:38.637617 INFO buffer_manager] Allocated weights buffer at 
(8870506496, 0) [2026-04-08 08:57:38.637618 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=24, cache_slot=24) planned desc only [2026-04-08 08:57:38.673771 INFO buffer_manager] Allocated weights buffer at (8870506496, 0) [2026-04-08 08:57:38.673795 INFO buffer_manager] Allocated weights buffer at (8870506496, 132120576) [2026-04-08 08:57:38.673802 INFO buffer_manager] Allocated weights buffer at (9002627072, 57344) [2026-04-08 08:57:38.673803 INFO buffer_manager] Allocated weights buffer at (9002684416, 132120576) [2026-04-08 08:57:38.673805 INFO buffer_manager] Allocated weights buffer at (9134804992, 57344) [2026-04-08 08:57:38.673806 INFO buffer_manager] Allocated weights buffer at (9134862336, 132120576) [2026-04-08 08:57:38.673808 INFO buffer_manager] Allocated weights buffer at (9266982912, 57344) [2026-04-08 08:57:38.673809 INFO buffer_manager] Allocated weights buffer at (9267040256, 0) [2026-04-08 08:57:38.673811 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=25, cache_slot=25) planned desc only [2026-04-08 08:57:38.709910 INFO buffer_manager] Allocated weights buffer at (9267040256, 0) [2026-04-08 08:57:38.709925 INFO buffer_manager] Allocated weights buffer at (9267040256, 132120576) [2026-04-08 08:57:38.709927 INFO buffer_manager] Allocated weights buffer at (9399160832, 57344) [2026-04-08 08:57:38.709928 INFO buffer_manager] Allocated weights buffer at (9399218176, 132120576) [2026-04-08 08:57:38.709930 INFO buffer_manager] Allocated weights buffer at (9531338752, 57344) [2026-04-08 08:57:38.709931 INFO buffer_manager] Allocated weights buffer at (9531396096, 132120576) [2026-04-08 08:57:38.709933 INFO buffer_manager] Allocated weights buffer at (9663516672, 57344) [2026-04-08 08:57:38.709934 INFO buffer_manager] Allocated weights buffer at (9663574016, 0) [2026-04-08 08:57:38.709936 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=26, cache_slot=26) planned desc only [2026-04-08 08:57:38.746160 INFO 
buffer_manager] Allocated weights buffer at (9663574016, 0) [2026-04-08 08:57:38.746174 INFO buffer_manager] Allocated weights buffer at (9663574016, 132120576) [2026-04-08 08:57:38.746176 INFO buffer_manager] Allocated weights buffer at (9795694592, 57344) [2026-04-08 08:57:38.746178 INFO buffer_manager] Allocated weights buffer at (9795751936, 132120576) [2026-04-08 08:57:38.746180 INFO buffer_manager] Allocated weights buffer at (9927872512, 57344) [2026-04-08 08:57:38.746181 INFO buffer_manager] Allocated weights buffer at (9927929856, 132120576) [2026-04-08 08:57:38.746183 INFO buffer_manager] Allocated weights buffer at (10060050432, 57344) [2026-04-08 08:57:38.746184 INFO buffer_manager] Allocated weights buffer at (10060107776, 0) [2026-04-08 08:57:38.746186 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=27, cache_slot=27) planned desc only [2026-04-08 08:57:38.782352 INFO buffer_manager] Allocated weights buffer at (10060107776, 0) [2026-04-08 08:57:38.782365 INFO buffer_manager] Allocated weights buffer at (10060107776, 132120576) [2026-04-08 08:57:38.782368 INFO buffer_manager] Allocated weights buffer at (10192228352, 57344) [2026-04-08 08:57:38.782369 INFO buffer_manager] Allocated weights buffer at (10192285696, 132120576) [2026-04-08 08:57:38.782371 INFO buffer_manager] Allocated weights buffer at (10324406272, 57344) [2026-04-08 08:57:38.782372 INFO buffer_manager] Allocated weights buffer at (10324463616, 132120576) [2026-04-08 08:57:38.782374 INFO buffer_manager] Allocated weights buffer at (10456584192, 57344) [2026-04-08 08:57:38.782375 INFO buffer_manager] Allocated weights buffer at (10456641536, 0) [2026-04-08 08:57:38.782377 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=28, cache_slot=28) planned desc only [2026-04-08 08:57:38.818497 INFO buffer_manager] Allocated weights buffer at (10456641536, 0) [2026-04-08 08:57:38.818511 INFO buffer_manager] Allocated weights buffer at (10456641536, 132120576) [2026-04-08 
08:57:38.818517 INFO buffer_manager] Allocated weights buffer at (10588762112, 57344) [2026-04-08 08:57:38.818519 INFO buffer_manager] Allocated weights buffer at (10588819456, 132120576) [2026-04-08 08:57:38.818521 INFO buffer_manager] Allocated weights buffer at (10720940032, 57344) [2026-04-08 08:57:38.818522 INFO buffer_manager] Allocated weights buffer at (10720997376, 132120576) [2026-04-08 08:57:38.818524 INFO buffer_manager] Allocated weights buffer at (10853117952, 57344) [2026-04-08 08:57:38.818525 INFO buffer_manager] Allocated weights buffer at (10853175296, 0) [2026-04-08 08:57:38.818527 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=29, cache_slot=29) planned desc only [2026-04-08 08:57:38.854607 INFO buffer_manager] Allocated weights buffer at (10853175296, 0) [2026-04-08 08:57:38.854622 INFO buffer_manager] Allocated weights buffer at (10853175296, 132120576) [2026-04-08 08:57:38.854624 INFO buffer_manager] Allocated weights buffer at (10985295872, 57344) [2026-04-08 08:57:38.854625 INFO buffer_manager] Allocated weights buffer at (10985353216, 132120576) [2026-04-08 08:57:38.854627 INFO buffer_manager] Allocated weights buffer at (11117473792, 57344) [2026-04-08 08:57:38.854628 INFO buffer_manager] Allocated weights buffer at (11117531136, 132120576) [2026-04-08 08:57:38.854630 INFO buffer_manager] Allocated weights buffer at (11249651712, 57344) [2026-04-08 08:57:38.854631 INFO buffer_manager] Allocated weights buffer at (11249709056, 0) [2026-04-08 08:57:38.854633 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=30, cache_slot=30) planned desc only [2026-04-08 08:57:38.890817 INFO buffer_manager] Allocated weights buffer at (11249709056, 0) [2026-04-08 08:57:38.890832 INFO buffer_manager] Allocated weights buffer at (11249709056, 132120576) [2026-04-08 08:57:38.890834 INFO buffer_manager] Allocated weights buffer at (11381829632, 57344) [2026-04-08 08:57:38.890835 INFO buffer_manager] Allocated weights buffer at 
(11381886976, 132120576) [2026-04-08 08:57:38.890837 INFO buffer_manager] Allocated weights buffer at (11514007552, 57344) [2026-04-08 08:57:38.890839 INFO buffer_manager] Allocated weights buffer at (11514064896, 132120576) [2026-04-08 08:57:38.890840 INFO buffer_manager] Allocated weights buffer at (11646185472, 57344) [2026-04-08 08:57:38.890842 INFO buffer_manager] Allocated weights buffer at (11646242816, 0) [2026-04-08 08:57:38.890844 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=31, cache_slot=31) planned desc only [2026-04-08 08:57:38.927028 INFO buffer_manager] Allocated weights buffer at (11646242816, 0) [2026-04-08 08:57:38.927043 INFO buffer_manager] Allocated weights buffer at (11646242816, 132120576) [2026-04-08 08:57:38.927045 INFO buffer_manager] Allocated weights buffer at (11778363392, 57344) [2026-04-08 08:57:38.927047 INFO buffer_manager] Allocated weights buffer at (11778420736, 132120576) [2026-04-08 08:57:38.927048 INFO buffer_manager] Allocated weights buffer at (11910541312, 57344) [2026-04-08 08:57:38.927050 INFO buffer_manager] Allocated weights buffer at (11910598656, 132120576) [2026-04-08 08:57:38.927051 INFO buffer_manager] Allocated weights buffer at (12042719232, 57344) [2026-04-08 08:57:38.927053 INFO buffer_manager] Allocated weights buffer at (12042776576, 0) [2026-04-08 08:57:38.927054 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=32, cache_slot=32) planned desc only [2026-04-08 08:57:38.963260 INFO buffer_manager] Allocated weights buffer at (12042776576, 0) [2026-04-08 08:57:38.963275 INFO buffer_manager] Allocated weights buffer at (12042776576, 132120576) [2026-04-08 08:57:38.963278 INFO buffer_manager] Allocated weights buffer at (12174897152, 57344) [2026-04-08 08:57:38.963280 INFO buffer_manager] Allocated weights buffer at (12174954496, 132120576) [2026-04-08 08:57:38.963283 INFO buffer_manager] Allocated weights buffer at (12307075072, 57344) [2026-04-08 08:57:38.963286 INFO buffer_manager] 
Allocated weights buffer at (12307132416, 132120576) [2026-04-08 08:57:38.963288 INFO buffer_manager] Allocated weights buffer at (12439252992, 57344) [2026-04-08 08:57:38.963294 INFO buffer_manager] Allocated weights buffer at (12439310336, 0) [2026-04-08 08:57:38.963296 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=33, cache_slot=33) planned desc only [2026-04-08 08:57:38.999415 INFO buffer_manager] Allocated weights buffer at (12439310336, 0) [2026-04-08 08:57:38.999428 INFO buffer_manager] Allocated weights buffer at (12439310336, 132120576) [2026-04-08 08:57:38.999430 INFO buffer_manager] Allocated weights buffer at (12571430912, 57344) [2026-04-08 08:57:38.999432 INFO buffer_manager] Allocated weights buffer at (12571488256, 132120576) [2026-04-08 08:57:38.999434 INFO buffer_manager] Allocated weights buffer at (12703608832, 57344) [2026-04-08 08:57:38.999440 INFO buffer_manager] Allocated weights buffer at (12703666176, 132120576) [2026-04-08 08:57:38.999442 INFO buffer_manager] Allocated weights buffer at (12835786752, 57344) [2026-04-08 08:57:38.999443 INFO buffer_manager] Allocated weights buffer at (12835844096, 0) [2026-04-08 08:57:38.999445 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=34, cache_slot=34) planned desc only [2026-04-08 08:57:39.035645 INFO buffer_manager] Allocated weights buffer at (12835844096, 0) [2026-04-08 08:57:39.035660 INFO buffer_manager] Allocated weights buffer at (12835844096, 132120576) [2026-04-08 08:57:39.035662 INFO buffer_manager] Allocated weights buffer at (12967964672, 57344) [2026-04-08 08:57:39.035664 INFO buffer_manager] Allocated weights buffer at (12968022016, 132120576) [2026-04-08 08:57:39.035665 INFO buffer_manager] Allocated weights buffer at (13100142592, 57344) [2026-04-08 08:57:39.035668 INFO buffer_manager] Allocated weights buffer at (13100199936, 132120576) [2026-04-08 08:57:39.035670 INFO buffer_manager] Allocated weights buffer at (13232320512, 57344) [2026-04-08 
08:57:39.035672 INFO buffer_manager] Allocated weights buffer at (13232377856, 0) [2026-04-08 08:57:39.035673 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=35, cache_slot=35) planned desc only [2026-04-08 08:57:39.071840 INFO buffer_manager] Allocated weights buffer at (13232377856, 0) [2026-04-08 08:57:39.071853 INFO buffer_manager] Allocated weights buffer at (13232377856, 132120576) [2026-04-08 08:57:39.071855 INFO buffer_manager] Allocated weights buffer at (13364498432, 57344) [2026-04-08 08:57:39.071857 INFO buffer_manager] Allocated weights buffer at (13364555776, 132120576) [2026-04-08 08:57:39.071858 INFO buffer_manager] Allocated weights buffer at (13496676352, 57344) [2026-04-08 08:57:39.071860 INFO buffer_manager] Allocated weights buffer at (13496733696, 132120576) [2026-04-08 08:57:39.071862 INFO buffer_manager] Allocated weights buffer at (13628854272, 57344) [2026-04-08 08:57:39.071863 INFO buffer_manager] Allocated weights buffer at (13628911616, 0) [2026-04-08 08:57:39.071865 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=36, cache_slot=36) planned desc only [2026-04-08 08:57:39.108027 INFO buffer_manager] Allocated weights buffer at (13628911616, 0) [2026-04-08 08:57:39.108042 INFO buffer_manager] Allocated weights buffer at (13628911616, 132120576) [2026-04-08 08:57:39.108044 INFO buffer_manager] Allocated weights buffer at (13761032192, 57344) [2026-04-08 08:57:39.108046 INFO buffer_manager] Allocated weights buffer at (13761089536, 132120576) [2026-04-08 08:57:39.108051 INFO buffer_manager] Allocated weights buffer at (13893210112, 57344) [2026-04-08 08:57:39.108053 INFO buffer_manager] Allocated weights buffer at (13893267456, 132120576) [2026-04-08 08:57:39.108055 INFO buffer_manager] Allocated weights buffer at (14025388032, 57344) [2026-04-08 08:57:39.108056 INFO buffer_manager] Allocated weights buffer at (14025445376, 0) [2026-04-08 08:57:39.108058 INFO fp8_moe_dpdk] fp8_moe_dpdk: 
init_layer_cached(layer_idx=37, cache_slot=37) planned desc only [2026-04-08 08:57:39.144194 INFO buffer_manager] Allocated weights buffer at (14025445376, 0) [2026-04-08 08:57:39.144208 INFO buffer_manager] Allocated weights buffer at (14025445376, 132120576) [2026-04-08 08:57:39.144214 INFO buffer_manager] Allocated weights buffer at (14157565952, 57344) [2026-04-08 08:57:39.144215 INFO buffer_manager] Allocated weights buffer at (14157623296, 132120576) [2026-04-08 08:57:39.144217 INFO buffer_manager] Allocated weights buffer at (14289743872, 57344) [2026-04-08 08:57:39.144218 INFO buffer_manager] Allocated weights buffer at (14289801216, 132120576) [2026-04-08 08:57:39.144220 INFO buffer_manager] Allocated weights buffer at (14421921792, 57344) [2026-04-08 08:57:39.144221 INFO buffer_manager] Allocated weights buffer at (14421979136, 0) [2026-04-08 08:57:39.144223 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=38, cache_slot=38) planned desc only [2026-04-08 08:57:39.180538 INFO buffer_manager] Allocated weights buffer at (14421979136, 0) [2026-04-08 08:57:39.180553 INFO buffer_manager] Allocated weights buffer at (14421979136, 132120576) [2026-04-08 08:57:39.180555 INFO buffer_manager] Allocated weights buffer at (14554099712, 57344) [2026-04-08 08:57:39.180557 INFO buffer_manager] Allocated weights buffer at (14554157056, 132120576) [2026-04-08 08:57:39.180559 INFO buffer_manager] Allocated weights buffer at (14686277632, 57344) [2026-04-08 08:57:39.180560 INFO buffer_manager] Allocated weights buffer at (14686334976, 132120576) [2026-04-08 08:57:39.180562 INFO buffer_manager] Allocated weights buffer at (14818455552, 57344) [2026-04-08 08:57:39.180563 INFO buffer_manager] Allocated weights buffer at (14818512896, 0) [2026-04-08 08:57:39.180565 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=39, cache_slot=39) planned desc only [2026-04-08 08:57:39.216797 INFO buffer_manager] Allocated weights buffer at (14818512896, 0) [2026-04-08 
08:57:39.216812 INFO buffer_manager] Allocated weights buffer at (14818512896, 132120576) [2026-04-08 08:57:39.216814 INFO buffer_manager] Allocated weights buffer at (14950633472, 57344) [2026-04-08 08:57:39.216816 INFO buffer_manager] Allocated weights buffer at (14950690816, 132120576) [2026-04-08 08:57:39.216817 INFO buffer_manager] Allocated weights buffer at (15082811392, 57344) [2026-04-08 08:57:39.216819 INFO buffer_manager] Allocated weights buffer at (15082868736, 132120576) [2026-04-08 08:57:39.216820 INFO buffer_manager] Allocated weights buffer at (15214989312, 57344) [2026-04-08 08:57:39.216822 INFO buffer_manager] Allocated weights buffer at (15215046656, 0) [2026-04-08 08:57:39.216823 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=40, cache_slot=40) planned desc only [2026-04-08 08:57:39.253005 INFO buffer_manager] Allocated weights buffer at (15215046656, 0) [2026-04-08 08:57:39.253019 INFO buffer_manager] Allocated weights buffer at (15215046656, 132120576) [2026-04-08 08:57:39.253021 INFO buffer_manager] Allocated weights buffer at (15347167232, 57344) [2026-04-08 08:57:39.253022 INFO buffer_manager] Allocated weights buffer at (15347224576, 132120576) [2026-04-08 08:57:39.253024 INFO buffer_manager] Allocated weights buffer at (15479345152, 57344) [2026-04-08 08:57:39.253025 INFO buffer_manager] Allocated weights buffer at (15479402496, 132120576) [2026-04-08 08:57:39.253027 INFO buffer_manager] Allocated weights buffer at (15611523072, 57344) [2026-04-08 08:57:39.253028 INFO buffer_manager] Allocated weights buffer at (15611580416, 0) [2026-04-08 08:57:39.253030 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=41, cache_slot=41) planned desc only [2026-04-08 08:57:39.289185 INFO buffer_manager] Allocated weights buffer at (15611580416, 0) [2026-04-08 08:57:39.289201 INFO buffer_manager] Allocated weights buffer at (15611580416, 132120576) [2026-04-08 08:57:39.289203 INFO buffer_manager] Allocated weights buffer at 
(15743700992, 57344) [2026-04-08 08:57:39.289205 INFO buffer_manager] Allocated weights buffer at (15743758336, 132120576) [2026-04-08 08:57:39.289207 INFO buffer_manager] Allocated weights buffer at (15875878912, 57344) [2026-04-08 08:57:39.289208 INFO buffer_manager] Allocated weights buffer at (15875936256, 132120576) [2026-04-08 08:57:39.289210 INFO buffer_manager] Allocated weights buffer at (16008056832, 57344) [2026-04-08 08:57:39.289214 INFO buffer_manager] Allocated weights buffer at (16008114176, 0) [2026-04-08 08:57:39.289216 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=42, cache_slot=42) planned desc only [2026-04-08 08:57:39.325404 INFO buffer_manager] Allocated weights buffer at (16008114176, 0) [2026-04-08 08:57:39.325418 INFO buffer_manager] Allocated weights buffer at (16008114176, 132120576) [2026-04-08 08:57:39.325420 INFO buffer_manager] Allocated weights buffer at (16140234752, 57344) [2026-04-08 08:57:39.325422 INFO buffer_manager] Allocated weights buffer at (16140292096, 132120576) [2026-04-08 08:57:39.325423 INFO buffer_manager] Allocated weights buffer at (16272412672, 57344) [2026-04-08 08:57:39.325425 INFO buffer_manager] Allocated weights buffer at (16272470016, 132120576) [2026-04-08 08:57:39.325426 INFO buffer_manager] Allocated weights buffer at (16404590592, 57344) [2026-04-08 08:57:39.325428 INFO buffer_manager] Allocated weights buffer at (16404647936, 0) [2026-04-08 08:57:39.325429 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=43, cache_slot=43) planned desc only [2026-04-08 08:57:39.361644 INFO buffer_manager] Allocated weights buffer at (16404647936, 0) [2026-04-08 08:57:39.361658 INFO buffer_manager] Allocated weights buffer at (16404647936, 132120576) [2026-04-08 08:57:39.361660 INFO buffer_manager] Allocated weights buffer at (16536768512, 57344) [2026-04-08 08:57:39.361662 INFO buffer_manager] Allocated weights buffer at (16536825856, 132120576) [2026-04-08 08:57:39.361663 INFO buffer_manager] 
Allocated weights buffer at (16668946432, 57344) [2026-04-08 08:57:39.361665 INFO buffer_manager] Allocated weights buffer at (16669003776, 132120576) [2026-04-08 08:57:39.361666 INFO buffer_manager] Allocated weights buffer at (16801124352, 57344) [2026-04-08 08:57:39.361668 INFO buffer_manager] Allocated weights buffer at (16801181696, 0) [2026-04-08 08:57:39.361669 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=44, cache_slot=44) planned desc only [2026-04-08 08:57:39.397747 INFO buffer_manager] Allocated weights buffer at (16801181696, 0) [2026-04-08 08:57:39.397760 INFO buffer_manager] Allocated weights buffer at (16801181696, 132120576) [2026-04-08 08:57:39.397763 INFO buffer_manager] Allocated weights buffer at (16933302272, 57344) [2026-04-08 08:57:39.397765 INFO buffer_manager] Allocated weights buffer at (16933359616, 132120576) [2026-04-08 08:57:39.397766 INFO buffer_manager] Allocated weights buffer at (17065480192, 57344) [2026-04-08 08:57:39.397768 INFO buffer_manager] Allocated weights buffer at (17065537536, 132120576) [2026-04-08 08:57:39.397770 INFO buffer_manager] Allocated weights buffer at (17197658112, 57344) [2026-04-08 08:57:39.397771 INFO buffer_manager] Allocated weights buffer at (17197715456, 0) [2026-04-08 08:57:39.397773 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=45, cache_slot=45) planned desc only [2026-04-08 08:57:39.433828 INFO buffer_manager] Allocated weights buffer at (17197715456, 0) [2026-04-08 08:57:39.433843 INFO buffer_manager] Allocated weights buffer at (17197715456, 132120576) [2026-04-08 08:57:39.433845 INFO buffer_manager] Allocated weights buffer at (17329836032, 57344) [2026-04-08 08:57:39.433846 INFO buffer_manager] Allocated weights buffer at (17329893376, 132120576) [2026-04-08 08:57:39.433848 INFO buffer_manager] Allocated weights buffer at (17462013952, 57344) [2026-04-08 08:57:39.433849 INFO buffer_manager] Allocated weights buffer at (17462071296, 132120576) [2026-04-08 
08:57:39.433851 INFO buffer_manager] Allocated weights buffer at (17594191872, 57344) [2026-04-08 08:57:39.433852 INFO buffer_manager] Allocated weights buffer at (17594249216, 0) [2026-04-08 08:57:39.433854 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=46, cache_slot=46) planned desc only [2026-04-08 08:57:39.469998 INFO buffer_manager] Allocated weights buffer at (17594249216, 0) [2026-04-08 08:57:39.470012 INFO buffer_manager] Allocated weights buffer at (17594249216, 132120576) [2026-04-08 08:57:39.470018 INFO buffer_manager] Allocated weights buffer at (17726369792, 57344) [2026-04-08 08:57:39.470020 INFO buffer_manager] Allocated weights buffer at (17726427136, 132120576) [2026-04-08 08:57:39.470022 INFO buffer_manager] Allocated weights buffer at (17858547712, 57344) [2026-04-08 08:57:39.470023 INFO buffer_manager] Allocated weights buffer at (17858605056, 132120576) [2026-04-08 08:57:39.470025 INFO buffer_manager] Allocated weights buffer at (17990725632, 57344) [2026-04-08 08:57:39.470026 INFO buffer_manager] Allocated weights buffer at (17990782976, 0) [2026-04-08 08:57:39.470028 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=47, cache_slot=47) planned desc only [2026-04-08 08:57:39.506185 INFO buffer_manager] Allocated weights buffer at (17990782976, 0) [2026-04-08 08:57:39.506200 INFO buffer_manager] Allocated weights buffer at (17990782976, 132120576) [2026-04-08 08:57:39.506202 INFO buffer_manager] Allocated weights buffer at (18122903552, 57344) [2026-04-08 08:57:39.506204 INFO buffer_manager] Allocated weights buffer at (18122960896, 132120576) [2026-04-08 08:57:39.506205 INFO buffer_manager] Allocated weights buffer at (18255081472, 57344) [2026-04-08 08:57:39.506207 INFO buffer_manager] Allocated weights buffer at (18255138816, 132120576) [2026-04-08 08:57:39.506208 INFO buffer_manager] Allocated weights buffer at (18387259392, 57344) [2026-04-08 08:57:39.506210 INFO buffer_manager] Allocated weights buffer at 
(18387316736, 0) [2026-04-08 08:57:39.506211 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=48, cache_slot=48) planned desc only [2026-04-08 08:57:39.542313 INFO buffer_manager] Allocated weights buffer at (18387316736, 0) [2026-04-08 08:57:39.542327 INFO buffer_manager] Allocated weights buffer at (18387316736, 132120576) [2026-04-08 08:57:39.542328 INFO buffer_manager] Allocated weights buffer at (18519437312, 57344) [2026-04-08 08:57:39.542330 INFO buffer_manager] Allocated weights buffer at (18519494656, 132120576) [2026-04-08 08:57:39.542332 INFO buffer_manager] Allocated weights buffer at (18651615232, 57344) [2026-04-08 08:57:39.542333 INFO buffer_manager] Allocated weights buffer at (18651672576, 132120576) [2026-04-08 08:57:39.542335 INFO buffer_manager] Allocated weights buffer at (18783793152, 57344) [2026-04-08 08:57:39.542336 INFO buffer_manager] Allocated weights buffer at (18783850496, 0) [2026-04-08 08:57:39.542338 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=49, cache_slot=49) planned desc only [2026-04-08 08:57:39.578394 INFO buffer_manager] Allocated weights buffer at (18783850496, 0) [2026-04-08 08:57:39.578408 INFO buffer_manager] Allocated weights buffer at (18783850496, 132120576) [2026-04-08 08:57:39.578411 INFO buffer_manager] Allocated weights buffer at (18915971072, 57344) [2026-04-08 08:57:39.578412 INFO buffer_manager] Allocated weights buffer at (18916028416, 132120576) [2026-04-08 08:57:39.578414 INFO buffer_manager] Allocated weights buffer at (19048148992, 57344) [2026-04-08 08:57:39.578415 INFO buffer_manager] Allocated weights buffer at (19048206336, 132120576) [2026-04-08 08:57:39.578417 INFO buffer_manager] Allocated weights buffer at (19180326912, 57344) [2026-04-08 08:57:39.578418 INFO buffer_manager] Allocated weights buffer at (19180384256, 0) [2026-04-08 08:57:39.578420 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=50, cache_slot=50) planned desc only [2026-04-08 08:57:39.614568 
INFO buffer_manager] Allocated weights buffer at (19180384256, 0) [2026-04-08 08:57:39.614582 INFO buffer_manager] Allocated weights buffer at (19180384256, 132120576) [2026-04-08 08:57:39.614584 INFO buffer_manager] Allocated weights buffer at (19312504832, 57344) [2026-04-08 08:57:39.614586 INFO buffer_manager] Allocated weights buffer at (19312562176, 132120576) [2026-04-08 08:57:39.614587 INFO buffer_manager] Allocated weights buffer at (19444682752, 57344) [2026-04-08 08:57:39.614589 INFO buffer_manager] Allocated weights buffer at (19444740096, 132120576) [2026-04-08 08:57:39.614594 INFO buffer_manager] Allocated weights buffer at (19576860672, 57344) [2026-04-08 08:57:39.614596 INFO buffer_manager] Allocated weights buffer at (19576918016, 0) [2026-04-08 08:57:39.614597 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=51, cache_slot=51) planned desc only [2026-04-08 08:57:39.650779 INFO buffer_manager] Allocated weights buffer at (19576918016, 0) [2026-04-08 08:57:39.650793 INFO buffer_manager] Allocated weights buffer at (19576918016, 132120576) [2026-04-08 08:57:39.650795 INFO buffer_manager] Allocated weights buffer at (19709038592, 57344) [2026-04-08 08:57:39.650797 INFO buffer_manager] Allocated weights buffer at (19709095936, 132120576) [2026-04-08 08:57:39.650799 INFO buffer_manager] Allocated weights buffer at (19841216512, 57344) [2026-04-08 08:57:39.650800 INFO buffer_manager] Allocated weights buffer at (19841273856, 132120576) [2026-04-08 08:57:39.650802 INFO buffer_manager] Allocated weights buffer at (19973394432, 57344) [2026-04-08 08:57:39.650803 INFO buffer_manager] Allocated weights buffer at (19973451776, 0) [2026-04-08 08:57:39.650805 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=52, cache_slot=52) planned desc only [2026-04-08 08:57:39.686924 INFO buffer_manager] Allocated weights buffer at (19973451776, 0) [2026-04-08 08:57:39.686944 INFO buffer_manager] Allocated weights buffer at (19973451776, 132120576) 
[2026-04-08 08:57:39.686947 INFO buffer_manager] Allocated weights buffer at (20105572352, 57344)
[2026-04-08 08:57:39.686948 INFO buffer_manager] Allocated weights buffer at (20105629696, 132120576)
[2026-04-08 08:57:39.686950 INFO buffer_manager] Allocated weights buffer at (20237750272, 57344)
[2026-04-08 08:57:39.686951 INFO buffer_manager] Allocated weights buffer at (20237807616, 132120576)
[2026-04-08 08:57:39.686953 INFO buffer_manager] Allocated weights buffer at (20369928192, 57344)
[2026-04-08 08:57:39.686954 INFO buffer_manager] Allocated weights buffer at (20369985536, 0)
[2026-04-08 08:57:39.686956 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=53, cache_slot=53) planned desc only
[2026-04-08 08:57:39.723242 INFO buffer_manager] Allocated weights buffer at (20369985536, 0)
[2026-04-08 08:57:39.723255 INFO buffer_manager] Allocated weights buffer at (20369985536, 132120576)
[2026-04-08 08:57:39.723257 INFO buffer_manager] Allocated weights buffer at (20502106112, 57344)
[2026-04-08 08:57:39.723259 INFO buffer_manager] Allocated weights buffer at (20502163456, 132120576)
[2026-04-08 08:57:39.723260 INFO buffer_manager] Allocated weights buffer at (20634284032, 57344)
[2026-04-08 08:57:39.723262 INFO buffer_manager] Allocated weights buffer at (20634341376, 132120576)
[2026-04-08 08:57:39.723263 INFO buffer_manager] Allocated weights buffer at (20766461952, 57344)
[2026-04-08 08:57:39.723265 INFO buffer_manager] Allocated weights buffer at (20766519296, 0)
[2026-04-08 08:57:39.723266 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=54, cache_slot=54) planned desc only
[2026-04-08 08:57:39.759642 INFO buffer_manager] Allocated weights buffer at (20766519296, 0)
[2026-04-08 08:57:39.759656 INFO buffer_manager] Allocated weights buffer at (20766519296, 132120576)
[2026-04-08 08:57:39.759658 INFO buffer_manager] Allocated weights buffer at (20898639872, 57344)
[2026-04-08 08:57:39.759660 INFO buffer_manager] Allocated weights buffer at (20898697216, 132120576)
[2026-04-08 08:57:39.759661 INFO buffer_manager] Allocated weights buffer at (21030817792, 57344)
[2026-04-08 08:57:39.759663 INFO buffer_manager] Allocated weights buffer at (21030875136, 132120576)
[2026-04-08 08:57:39.759664 INFO buffer_manager] Allocated weights buffer at (21162995712, 57344)
[2026-04-08 08:57:39.759666 INFO buffer_manager] Allocated weights buffer at (21163053056, 0)
[2026-04-08 08:57:39.759667 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=55, cache_slot=55) planned desc only
[2026-04-08 08:57:39.795962 INFO buffer_manager] Allocated weights buffer at (21163053056, 0)
[2026-04-08 08:57:39.795979 INFO buffer_manager] Allocated weights buffer at (21163053056, 132120576)
[2026-04-08 08:57:39.795982 INFO buffer_manager] Allocated weights buffer at (21295173632, 57344)
[2026-04-08 08:57:39.795984 INFO buffer_manager] Allocated weights buffer at (21295230976, 132120576)
[2026-04-08 08:57:39.795985 INFO buffer_manager] Allocated weights buffer at (21427351552, 57344)
[2026-04-08 08:57:39.795987 INFO buffer_manager] Allocated weights buffer at (21427408896, 132120576)
[2026-04-08 08:57:39.795988 INFO buffer_manager] Allocated weights buffer at (21559529472, 57344)
[2026-04-08 08:57:39.795990 INFO buffer_manager] Allocated weights buffer at (21559586816, 0)
[2026-04-08 08:57:39.795991 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=56, cache_slot=56) planned desc only
[2026-04-08 08:57:39.832231 INFO buffer_manager] Allocated weights buffer at (21559586816, 0)
[2026-04-08 08:57:39.832245 INFO buffer_manager] Allocated weights buffer at (21559586816, 132120576)
[2026-04-08 08:57:39.832247 INFO buffer_manager] Allocated weights buffer at (21691707392, 57344)
[2026-04-08 08:57:39.832249 INFO buffer_manager] Allocated weights buffer at (21691764736, 132120576)
[2026-04-08 08:57:39.832251 INFO buffer_manager] Allocated weights buffer at (21823885312, 57344)
[2026-04-08 08:57:39.832252 INFO buffer_manager] Allocated weights buffer at (21823942656, 132120576)
[2026-04-08 08:57:39.832254 INFO buffer_manager] Allocated weights buffer at (21956063232, 57344)
[2026-04-08 08:57:39.832255 INFO buffer_manager] Allocated weights buffer at (21956120576, 0)
[2026-04-08 08:57:39.832257 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=57, cache_slot=57) planned desc only
[2026-04-08 08:57:39.868489 INFO buffer_manager] Allocated weights buffer at (21956120576, 0)
[2026-04-08 08:57:39.868503 INFO buffer_manager] Allocated weights buffer at (21956120576, 132120576)
[2026-04-08 08:57:39.868505 INFO buffer_manager] Allocated weights buffer at (22088241152, 57344)
[2026-04-08 08:57:39.868507 INFO buffer_manager] Allocated weights buffer at (22088298496, 132120576)
[2026-04-08 08:57:39.868509 INFO buffer_manager] Allocated weights buffer at (22220419072, 57344)
[2026-04-08 08:57:39.868510 INFO buffer_manager] Allocated weights buffer at (22220476416, 132120576)
[2026-04-08 08:57:39.868512 INFO buffer_manager] Allocated weights buffer at (22352596992, 57344)
[2026-04-08 08:57:39.868513 INFO buffer_manager] Allocated weights buffer at (22352654336, 0)
[2026-04-08 08:57:39.868515 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=58, cache_slot=58) planned desc only
[2026-04-08 08:57:39.904816 INFO buffer_manager] Allocated weights buffer at (22352654336, 0)
[2026-04-08 08:57:39.904831 INFO buffer_manager] Allocated weights buffer at (22352654336, 132120576)
[2026-04-08 08:57:39.904833 INFO buffer_manager] Allocated weights buffer at (22484774912, 57344)
[2026-04-08 08:57:39.904835 INFO buffer_manager] Allocated weights buffer at (22484832256, 132120576)
[2026-04-08 08:57:39.904836 INFO buffer_manager] Allocated weights buffer at (22616952832, 57344)
[2026-04-08 08:57:39.904838 INFO buffer_manager] Allocated weights buffer at (22617010176, 132120576)
[2026-04-08 08:57:39.904840 INFO buffer_manager] Allocated weights buffer at (22749130752, 57344)
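The buffer_manager records above follow a regular pattern: every buffer starts exactly where the previous one ends (for example, 20105572352 + 57344 = 20105629696), and the eight records printed before each init_layer_cached line are a zero-length record, three alternating pairs of 132120576-byte (126 MiB) and 57344-byte (56 KiB) buffers, and a closing zero-length record. The Python sketch below is an illustrative helper, not part of the logged program; the function names parse_allocations, find_gaps and group_span are hypothetical. It parses the "Allocated weights buffer at (offset, size)" records and checks that contiguity.

```python
import re

# Matches buffer_manager records such as
# "Allocated weights buffer at (20369985536, 132120576)" and captures
# the (offset, size) pair as two decimal integers.
ALLOC_RE = re.compile(r"Allocated weights buffer at \((\d+), (\d+)\)")


def parse_allocations(log_text):
    """Return the (offset, size) pairs in the order they appear in log_text."""
    return [(int(off), int(size)) for off, size in ALLOC_RE.findall(log_text)]


def find_gaps(allocs):
    """Return the indices i at which allocation i does not begin exactly
    where allocation i - 1 ends (its offset plus its size)."""
    gaps = []
    for i in range(1, len(allocs)):
        prev_off, prev_size = allocs[i - 1]
        off, _ = allocs[i]
        if off != prev_off + prev_size:
            gaps.append(i)
    return gaps


def group_span(allocs):
    """Bytes covered from the first offset to the end of the last buffer."""
    first_off = allocs[0][0]
    last_off, last_size = allocs[-1]
    return last_off + last_size - first_off


if __name__ == "__main__":
    # The eight records printed just before init_layer_cached(layer_idx=54, ...).
    sample = """
    Allocated weights buffer at (20369985536, 0)
    Allocated weights buffer at (20369985536, 132120576)
    Allocated weights buffer at (20502106112, 57344)
    Allocated weights buffer at (20502163456, 132120576)
    Allocated weights buffer at (20634284032, 57344)
    Allocated weights buffer at (20634341376, 132120576)
    Allocated weights buffer at (20766461952, 57344)
    Allocated weights buffer at (20766519296, 0)
    """
    allocs = parse_allocations(sample)
    print(len(allocs), find_gaps(allocs), group_span(allocs))
```

On that group the script reports eight records, no gaps, and a span of 396533760 bytes, which equals 3 × 132120576 + 3 × 57344. A non-empty gap list would flag a buffer that was not placed immediately after its predecessor.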
[2026-04-08 08:57:39.904841 INFO buffer_manager] Allocated weights buffer at (22749188096, 0)
[2026-04-08 08:57:39.904843 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=59, cache_slot=59) planned desc only
[2026-04-08 08:57:39.941212 INFO buffer_manager] Allocated weights buffer at (22749188096, 0)
[2026-04-08 08:57:39.941226 INFO buffer_manager] Allocated weights buffer at (22749188096, 132120576)
[2026-04-08 08:57:39.941228 INFO buffer_manager] Allocated weights buffer at (22881308672, 57344)
[2026-04-08 08:57:39.941230 INFO buffer_manager] Allocated weights buffer at (22881366016, 132120576)
[2026-04-08 08:57:39.941231 INFO buffer_manager] Allocated weights buffer at (23013486592, 57344)
[2026-04-08 08:57:39.941233 INFO buffer_manager] Allocated weights buffer at (23013543936, 132120576)
[2026-04-08 08:57:39.941237 INFO buffer_manager] Allocated weights buffer at (23145664512, 57344)
[2026-04-08 08:57:39.941239 INFO buffer_manager] Allocated weights buffer at (23145721856, 0)
[2026-04-08 08:57:39.941240 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=60, cache_slot=60) planned desc only
[2026-04-08 08:57:40.304256 INFO buffer_manager] Allocated weights buffer at (23145721856, 0)
[2026-04-08 08:57:40.304278 INFO buffer_manager] Allocated weights buffer at (23145721856, 132120576)
[2026-04-08 08:57:40.304280 INFO buffer_manager] Allocated weights buffer at (23277842432, 57344)
[2026-04-08 08:57:40.304282 INFO buffer_manager] Allocated weights buffer at (23277899776, 132120576)
[2026-04-08 08:57:40.304283 INFO buffer_manager] Allocated weights buffer at (23410020352, 57344)
[2026-04-08 08:57:40.304285 INFO buffer_manager] Allocated weights buffer at (23410077696, 132120576)
[2026-04-08 08:57:40.304287 INFO buffer_manager] Allocated weights buffer at (23542198272, 57344)
[2026-04-08 08:57:40.304288 INFO buffer_manager] Allocated weights buffer at (23542255616, 0)
[2026-04-08 08:57:40.304290 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=61, cache_slot=61) planned desc only
[2026-04-08 08:57:44.279109 INFO fp8_dpdk_common] fp9 fast path forced on by default in the current kernel build
[2026-04-08 08:57:44.544601 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=256, expert_tiles=752, avg_tile_batch=3.49, prepare=3.489812ms, send=20.257244ms, judge_wait=198.030463ms, fetch=25.581299ms, reduce=22ns; duck time-ns stats: p50=192.208331ms, p90=192.509959ms, max=192.861489ms; kernel_model: matmul=7.222591 GFLOP (37.450 GFLOP/s @ duck_max), param_stream=1.034945G (5.366 Gparam/s @ duck_max), weight_stream=1110.857 MiB (6.040 GB/s @ duck_max)
[2026-04-08 08:57:44.807265 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=256, expert_tiles=754, avg_tile_batch=3.48, prepare=599.786µs, send=17.227038ms, judge_wait=200.863663ms, fetch=26.714956ms, reduce=149ns; duck time-ns stats: p50=190.223745ms, p90=190.891081ms, max=191.172646ms; kernel_model: matmul=7.222591 GFLOP (37.780 GFLOP/s @ duck_max), param_stream=1.037697G (5.428 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.109 GB/s @ duck_max)
[2026-04-08 08:57:45.077381 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=256, expert_tiles=752, avg_tile_batch=3.49, prepare=656.79µs, send=17.227405ms, judge_wait=207.788255ms, fetch=26.92832ms, reduce=137ns; duck time-ns stats: p50=193.935664ms, p90=194.168224ms, max=194.520781ms; kernel_model: matmul=7.222591 GFLOP (37.130 GFLOP/s @ duck_max), param_stream=1.034945G (5.320 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.988 GB/s @ duck_max)
[2026-04-08 08:57:45.345979 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=256, expert_tiles=751, avg_tile_batch=3.49, prepare=662.422µs, send=18.434477ms, judge_wait=205.11211ms, fetch=25.998948ms, reduce=139ns; duck time-ns stats: p50=190.734249ms, p90=190.920086ms, max=191.031108ms; kernel_model: matmul=7.222591 GFLOP (37.808 GFLOP/s @ duck_max), param_stream=1.033568G (5.410 Gparam/s @ duck_max), weight_stream=1109.380 MiB (6.089 GB/s @ duck_max)
[2026-04-08 08:57:45.372079 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=136, expert_tiles=137, avg_tile_batch=1.40, prepare=39.047µs, send=1.420822ms, judge_wait=22.151006ms, fetch=1.481144ms, reduce=136ns; duck time-ns stats: p50=21.92785ms, p90=21.950981ms, max=21.995556ms; kernel_model: matmul=0.528482 GFLOP (24.027 GFLOP/s @ duck_max), param_stream=0.188547G (8.572 Gparam/s @ duck_max), weight_stream=202.377 MiB (9.648 GB/s @ duck_max)
[2026-04-08 08:57:45.659970 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=256, expert_tiles=750, avg_tile_batch=3.50, prepare=761.88µs, send=18.47116ms, judge_wait=207.833501ms, fetch=26.967121ms, reduce=20ns; duck time-ns stats: p50=192.642556ms, p90=192.959504ms, max=193.194698ms; kernel_model: matmul=7.222591 GFLOP (37.385 GFLOP/s @ duck_max), param_stream=1.032192G (5.343 Gparam/s @ duck_max), weight_stream=1107.903 MiB (6.013 GB/s @ duck_max)
[2026-04-08 08:57:45.922660 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=256, expert_tiles=755, avg_tile_batch=3.48, prepare=649.406µs, send=18.502693ms, judge_wait=207.503089ms, fetch=20.641693ms, reduce=136ns; duck time-ns stats: p50=192.584981ms, p90=192.774881ms, max=192.932849ms; kernel_model: matmul=7.222591 GFLOP (37.436 GFLOP/s @ duck_max), param_stream=1.039073G (5.386 Gparam/s @ duck_max), weight_stream=1115.289 MiB (6.062 GB/s @ duck_max)
[2026-04-08 08:57:46.187316 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=254, expert_tiles=759, avg_tile_batch=3.46, prepare=651.883µs, send=18.527825ms, judge_wait=209.492048ms, fetch=20.652225ms, reduce=20ns; duck time-ns stats: p50=193.561313ms, p90=194.0877ms, max=194.352297ms; kernel_model: matmul=7.222591 GFLOP (37.162 GFLOP/s @ duck_max), param_stream=1.044578G (5.375 Gparam/s @ duck_max), weight_stream=1121.197 MiB (6.049 GB/s @ duck_max)
[2026-04-08 08:57:46.452845 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=254, expert_tiles=756, avg_tile_batch=3.47, prepare=650.572µs, send=18.56782ms, judge_wait=210.329248ms, fetch=20.662659ms, reduce=20ns; duck time-ns stats: p50=194.117055ms, p90=194.432878ms, max=194.753852ms; kernel_model: matmul=7.222591 GFLOP (37.086 GFLOP/s @ duck_max), param_stream=1.040450G (5.342 Gparam/s @ duck_max), weight_stream=1116.766 MiB (6.013 GB/s @ duck_max)
[2026-04-08 08:57:46.478972 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=136, expert_tiles=136, avg_tile_batch=1.41, prepare=51.678µs, send=1.450623ms, judge_wait=22.115639ms, fetch=1.4878ms, reduce=136ns; duck time-ns stats: p50=21.881081ms, p90=21.911291ms, max=21.966892ms; kernel_model: matmul=0.528482 GFLOP (24.058 GFLOP/s @ duck_max), param_stream=0.187171G (8.521 Gparam/s @ duck_max), weight_stream=200.900 MiB (9.590 GB/s @ duck_max)
[2026-04-08 08:57:46.766939 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=256, expert_tiles=760, avg_tile_batch=3.45, prepare=785.06µs, send=18.438381ms, judge_wait=212.978485ms, fetch=20.694207ms, reduce=135ns; duck time-ns stats: p50=196.933403ms, p90=197.206135ms, max=197.311811ms; kernel_model: matmul=7.222591 GFLOP (36.605 GFLOP/s @ duck_max), param_stream=1.045955G (5.301 Gparam/s @ duck_max), weight_stream=1122.675 MiB (5.966 GB/s @ duck_max)
[2026-04-08 08:57:47.033384 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=255, expert_tiles=754, avg_tile_batch=3.48, prepare=666.497µs, send=17.226454ms, judge_wait=212.536753ms, fetch=20.638228ms, reduce=141ns; duck time-ns stats: p50=195.052554ms, p90=195.324868ms, max=195.345193ms; kernel_model: matmul=7.222591 GFLOP (36.973 GFLOP/s @ duck_max), param_stream=1.037697G (5.312 Gparam/s @ duck_max), weight_stream=1113.811 MiB (5.979 GB/s @ duck_max)
[2026-04-08 08:57:47.299123 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=255, expert_tiles=755, avg_tile_batch=3.48, prepare=657.849µs, send=18.569797ms, judge_wait=210.457295ms, fetch=20.645918ms, reduce=159ns; duck time-ns stats: p50=193.539845ms, p90=193.91339ms, max=194.061656ms; kernel_model: matmul=7.222591 GFLOP (37.218 GFLOP/s @ duck_max), param_stream=1.039073G (5.354 Gparam/s @ duck_max), weight_stream=1115.289 MiB (6.026 GB/s @ duck_max)
[2026-04-08 08:57:47.565211 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=255, expert_tiles=744, avg_tile_batch=3.53, prepare=655.543µs, send=18.507068ms, judge_wait=210.932595ms, fetch=20.661839ms, reduce=22ns; duck time-ns stats: p50=193.520453ms, p90=193.928441ms, max=194.335286ms; kernel_model: matmul=7.222591 GFLOP (37.166 GFLOP/s @ duck_max), param_stream=1.023934G (5.269 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.930 GB/s @ duck_max)
[2026-04-08 08:57:47.591930 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=133, expert_tiles=133, avg_tile_batch=1.44, prepare=53.147µs, send=1.437156ms, judge_wait=22.740536ms, fetch=1.483988ms, reduce=110ns; duck time-ns stats: p50=22.498471ms, p90=22.540155ms, max=22.586215ms; kernel_model: matmul=0.528482 GFLOP (23.398 GFLOP/s @ duck_max), param_stream=0.183042G (8.104 Gparam/s @ duck_max), weight_stream=196.468 MiB (9.121 GB/s @ duck_max)
[2026-04-08 08:57:47.880989 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=253, expert_tiles=756, avg_tile_batch=3.47, prepare=777.04µs, send=18.483401ms, judge_wait=214.006943ms, fetch=21.103606ms, reduce=19ns; duck time-ns stats: p50=196.155888ms, p90=196.523543ms, max=196.700434ms; kernel_model: matmul=7.222591 GFLOP (36.719 GFLOP/s @ duck_max), param_stream=1.040450G (5.290 Gparam/s @ duck_max), weight_stream=1116.766 MiB (5.953 GB/s @ duck_max)
[2026-04-08 08:57:48.151249 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=251, expert_tiles=751, avg_tile_batch=3.49, prepare=662.133µs, send=18.568006ms, judge_wait=212.389768ms, fetch=20.644072ms, reduce=142ns; duck time-ns stats: p50=194.864762ms, p90=195.021981ms, max=195.225954ms; kernel_model: matmul=7.222591 GFLOP (36.996 GFLOP/s @ duck_max), param_stream=1.033568G (5.294 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.959 GB/s @ duck_max)
[2026-04-08 08:57:48.419186 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=253, expert_tiles=764, avg_tile_batch=3.43, prepare=653.133µs, send=18.54643ms, judge_wait=212.676073ms, fetch=20.657988ms, reduce=136ns; duck time-ns stats: p50=194.811616ms, p90=195.151714ms, max=195.336924ms; kernel_model: matmul=7.222591 GFLOP (36.975 GFLOP/s @ duck_max), param_stream=1.051460G (5.383 Gparam/s @ duck_max), weight_stream=1128.583 MiB (6.058 GB/s @ duck_max)
[2026-04-08 08:57:48.686949 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=249, expert_tiles=752, avg_tile_batch=3.49, prepare=655.497µs, send=18.516841ms, judge_wait=212.502795ms, fetch=20.653223ms, reduce=20ns; duck time-ns stats: p50=194.554342ms, p90=194.854286ms, max=195.41558ms; kernel_model: matmul=7.222591 GFLOP (36.960 GFLOP/s @ duck_max), param_stream=1.034945G (5.296 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.961 GB/s @ duck_max)
[2026-04-08 08:57:48.713296 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=133, expert_tiles=134, avg_tile_batch=1.43, prepare=54.669µs, send=1.333458ms, judge_wait=22.504624ms, fetch=1.471226ms, reduce=21ns; duck time-ns stats: p50=22.293525ms, p90=22.335785ms, max=22.362509ms; kernel_model: matmul=0.528482 GFLOP (23.633 GFLOP/s @ duck_max), param_stream=0.184418G (8.247 Gparam/s @ duck_max), weight_stream=197.945 MiB (9.282 GB/s @ duck_max)
[2026-04-08 08:57:48.998061 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=253, expert_tiles=753, avg_tile_batch=3.48, prepare=768.595µs, send=18.396926ms, judge_wait=208.95623ms, fetch=21.631328ms, reduce=137ns; duck time-ns stats: p50=190.588496ms, p90=191.017274ms, max=191.327715ms; kernel_model: matmul=7.222591 GFLOP (37.750 GFLOP/s @ duck_max), param_stream=1.036321G (5.416 Gparam/s @ duck_max), weight_stream=1112.334 MiB (6.096 GB/s @ duck_max)
[2026-04-08 08:57:49.267900 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=255, expert_tiles=756, avg_tile_batch=3.47, prepare=651.829µs, send=18.445706ms, judge_wait=213.746146ms, fetch=21.624839ms, reduce=136ns; duck time-ns stats: p50=194.146958ms, p90=194.305537ms, max=194.373757ms; kernel_model: matmul=7.222591 GFLOP (37.158 GFLOP/s @ duck_max), param_stream=1.040450G (5.353 Gparam/s @ duck_max), weight_stream=1116.766 MiB (6.025 GB/s @ duck_max)
[2026-04-08 08:57:49.537457 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=251, expert_tiles=748, avg_tile_batch=3.51, prepare=667.458µs, send=18.570252ms, judge_wait=214.231674ms, fetch=20.659764ms, reduce=20ns; duck time-ns stats: p50=194.156745ms, p90=194.448048ms, max=194.586703ms; kernel_model: matmul=7.222591 GFLOP (37.118 GFLOP/s @ duck_max), param_stream=1.029439G (5.290 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.954 GB/s @ duck_max)
[2026-04-08 08:57:49.810996 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=254, expert_tiles=754, avg_tile_batch=3.48, prepare=655.743µs, send=18.499288ms, judge_wait=217.338016ms, fetch=21.633373ms, reduce=134ns; duck time-ns stats: p50=197.213415ms, p90=197.463968ms, max=197.599852ms; kernel_model: matmul=7.222591 GFLOP (36.552 GFLOP/s @ duck_max), param_stream=1.037697G (5.252 Gparam/s @ duck_max), weight_stream=1113.811 MiB (5.911 GB/s @ duck_max)
[2026-04-08 08:57:49.836116 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=129, expert_tiles=129, avg_tile_batch=1.49, prepare=55.434µs, send=1.35614ms, judge_wait=21.248186ms, fetch=1.479229ms, reduce=19ns; duck time-ns stats: p50=21.047047ms, p90=21.070047ms, max=21.096252ms; kernel_model: matmul=0.528482 GFLOP (25.051 GFLOP/s @ duck_max), param_stream=0.177537G (8.416 Gparam/s @ duck_max), weight_stream=190.559 MiB (9.472 GB/s @ duck_max)
[2026-04-08 08:57:50.128337 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=252, expert_tiles=748, avg_tile_batch=3.51, prepare=780.14µs, send=19.416909ms, judge_wait=209.91631ms, fetch=27.291595ms, reduce=133ns; duck time-ns stats: p50=190.704498ms, p90=190.881144ms, max=191.630917ms; kernel_model: matmul=7.222591 GFLOP (37.690 GFLOP/s @ duck_max), param_stream=1.029439G (5.372 Gparam/s @ duck_max), weight_stream=1104.948 MiB (6.046 GB/s @ duck_max)
[2026-04-08 08:57:50.396568 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=250, expert_tiles=749, avg_tile_batch=3.50, prepare=680.088µs, send=18.566951ms, judge_wait=213.049511ms, fetch=20.648315ms, reduce=20ns; duck time-ns stats: p50=195.124783ms, p90=195.384061ms, max=195.585111ms; kernel_model: matmul=7.222591 GFLOP (36.928 GFLOP/s @ duck_max), param_stream=1.030816G (5.270 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.932 GB/s @ duck_max)
[2026-04-08 08:57:50.661754 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=249, expert_tiles=750, avg_tile_batch=3.50, prepare=650.979µs, send=17.223886ms, judge_wait=211.334161ms, fetch=20.621587ms, reduce=22ns; duck time-ns stats: p50=191.687034ms, p90=192.104084ms, max=192.446949ms; kernel_model: matmul=7.222591 GFLOP (37.530 GFLOP/s @ duck_max), param_stream=1.032192G (5.364 Gparam/s @ duck_max), weight_stream=1107.903 MiB (6.037 GB/s @ duck_max)
[2026-04-08 08:57:50.929835 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=249, expert_tiles=749, avg_tile_batch=3.50, prepare=649.972µs, send=18.56601ms, judge_wait=212.835162ms, fetch=20.643552ms, reduce=20ns; duck time-ns stats: p50=193.87861ms, p90=194.094801ms, max=194.265546ms; kernel_model: matmul=7.222591 GFLOP (37.179 GFLOP/s @ duck_max), param_stream=1.030816G (5.306 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.972 GB/s @ duck_max)
[2026-04-08 08:57:50.956543 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=116, expert_tiles=117, avg_tile_batch=1.64, prepare=54.185µs, send=1.373334ms, judge_wait=22.800924ms, fetch=1.478232ms, reduce=132ns; duck time-ns stats: p50=22.481775ms, p90=22.576592ms, max=22.64832ms; kernel_model: matmul=0.528482 GFLOP (23.334 GFLOP/s @ duck_max), param_stream=0.161022G (7.110 Gparam/s @ duck_max), weight_stream=172.833 MiB (8.002 GB/s @ duck_max)
[2026-04-08 08:57:51.244195 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=254, expert_tiles=754, avg_tile_batch=3.48, prepare=770.538µs, send=18.382798ms, judge_wait=211.991991ms, fetch=21.625764ms, reduce=136ns; duck time-ns stats: p50=192.859637ms, p90=193.284999ms, max=193.449828ms; kernel_model: matmul=7.222591 GFLOP (37.336 GFLOP/s @ duck_max), param_stream=1.037697G (5.364 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.037 GB/s @ duck_max)
[2026-04-08 08:57:51.513755 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=250, expert_tiles=753, avg_tile_batch=3.48, prepare=660.783µs, send=18.519238ms, judge_wait=213.367164ms, fetch=21.642262ms, reduce=135ns; duck time-ns stats: p50=194.380394ms, p90=194.697361ms, max=195.010843ms; kernel_model: matmul=7.222591 GFLOP (37.037 GFLOP/s @ duck_max), param_stream=1.036321G (5.314 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.981 GB/s @ duck_max)
[2026-04-08 08:57:51.780234 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=254, expert_tiles=742, avg_tile_batch=3.54, prepare=647.453µs, send=18.565805ms, judge_wait=211.204212ms, fetch=20.670944ms, reduce=19ns; duck time-ns stats: p50=192.384158ms, p90=192.695436ms, max=192.831505ms; kernel_model: matmul=7.222591 GFLOP (37.455 GFLOP/s @ duck_max), param_stream=1.021182G (5.296 Gparam/s @ duck_max), weight_stream=1096.085 MiB (5.960 GB/s @ duck_max)
[2026-04-08 08:57:52.045826 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=253, expert_tiles=743, avg_tile_batch=3.53, prepare=652.179µs, send=18.55243ms, judge_wait=210.355722ms, fetch=20.660933ms, reduce=20ns; duck time-ns stats: p50=190.326966ms, p90=190.58185ms, max=190.852267ms; kernel_model: matmul=7.222591 GFLOP (37.844 GFLOP/s @ duck_max), param_stream=1.022558G (5.358 Gparam/s @ duck_max), weight_stream=1097.562 MiB (6.030 GB/s @ duck_max)
[2026-04-08 08:57:52.070576 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=118, expert_tiles=122, avg_tile_batch=1.57, prepare=52.09µs, send=1.395156ms, judge_wait=20.803664ms, fetch=1.484763ms, reduce=138ns; duck time-ns stats: p50=20.56657ms, p90=20.602187ms, max=20.646067ms; kernel_model: matmul=0.528482 GFLOP (25.597 GFLOP/s @ duck_max), param_stream=0.167903G (8.132 Gparam/s @ duck_max), weight_stream=180.219 MiB (9.153 GB/s @ duck_max)
[2026-04-08 08:57:52.363451 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=251, expert_tiles=756, avg_tile_batch=3.47, prepare=798.49µs, send=17.214578ms, judge_wait=219.345783ms, fetch=20.655638ms, reduce=21ns; duck time-ns stats: p50=201.226291ms, p90=201.554371ms, max=201.697955ms; kernel_model: matmul=7.222591 GFLOP (35.809 GFLOP/s @ duck_max), param_stream=1.040450G (5.158 Gparam/s @ duck_max), weight_stream=1116.766 MiB (5.806 GB/s @ duck_max)
[2026-04-08 08:57:52.639087 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=252, expert_tiles=756, avg_tile_batch=3.47, prepare=661.871µs, send=18.552497ms, judge_wait=219.420868ms, fetch=21.620048ms, reduce=133ns; duck time-ns stats: p50=200.123735ms, p90=200.427193ms, max=200.732941ms; kernel_model: matmul=7.222591 GFLOP (35.981 GFLOP/s @ duck_max), param_stream=1.040450G (5.183 Gparam/s @ duck_max), weight_stream=1116.766 MiB (5.834 GB/s @ duck_max)
[2026-04-08 08:57:52.911338 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=252, expert_tiles=751, avg_tile_batch=3.49, prepare=662.83µs, send=18.564615ms, judge_wait=215.974528ms, fetch=21.595132ms, reduce=136ns; duck time-ns stats: p50=198.293732ms, p90=198.556613ms, max=198.616366ms; kernel_model: matmul=7.222591 GFLOP (36.365 GFLOP/s @ duck_max), param_stream=1.033568G (5.204 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.857 GB/s @ duck_max)
[2026-04-08 08:57:53.188634 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=253, expert_tiles=755, avg_tile_batch=3.48, prepare=657.939µs, send=18.514847ms, judge_wait=221.049988ms, fetch=21.623311ms, reduce=136ns; duck time-ns stats: p50=200.728892ms, p90=200.927143ms, max=200.997534ms; kernel_model: matmul=7.222591 GFLOP (35.934 GFLOP/s @ duck_max), param_stream=1.039073G (5.170 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.818 GB/s @ duck_max)
[2026-04-08 08:57:53.215238 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=106, expert_tiles=114, avg_tile_batch=1.68, prepare=55.52µs, send=1.335385ms, judge_wait=22.718527ms, fetch=1.476128ms, reduce=141ns; duck time-ns stats: p50=22.437869ms, p90=22.507455ms, max=22.573305ms; kernel_model: matmul=0.528482 GFLOP (23.412 GFLOP/s @ duck_max), param_stream=0.156893G (6.950 Gparam/s @ duck_max), weight_stream=168.401 MiB (7.823 GB/s @ duck_max)
[2026-04-08 08:57:53.500499 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=252, expert_tiles=744, avg_tile_batch=3.53, prepare=764.86µs, send=18.531193ms, judge_wait=210.588874ms, fetch=20.647744ms, reduce=20ns; duck time-ns stats: p50=191.224483ms, p90=191.481966ms, max=191.744084ms; kernel_model: matmul=7.222591 GFLOP (37.668 GFLOP/s @ duck_max), param_stream=1.023934G (5.340 Gparam/s @ duck_max), weight_stream=1099.039 MiB (6.010 GB/s @ duck_max)
[2026-04-08 08:57:53.769872 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=252, expert_tiles=748, avg_tile_batch=3.51, prepare=654.967µs, send=18.519547ms, judge_wait=214.186772ms, fetch=20.624932ms, reduce=133ns; duck time-ns stats: p50=193.417211ms, p90=193.569433ms, max=193.918991ms; kernel_model: matmul=7.222591 GFLOP (37.245 GFLOP/s @ duck_max), param_stream=1.029439G (5.309 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.975 GB/s @ duck_max)
[2026-04-08 08:57:54.037627 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=252, expert_tiles=752, avg_tile_batch=3.49, prepare=650.035µs, send=18.537166ms, judge_wait=212.556114ms, fetch=20.664775ms, reduce=21ns; duck time-ns stats: p50=192.462304ms, p90=192.637219ms, max=192.67479ms; kernel_model: matmul=7.222591 GFLOP (37.486 GFLOP/s @ duck_max), param_stream=1.034945G (5.371 Gparam/s @ duck_max), weight_stream=1110.857 MiB (6.046 GB/s @ duck_max)
[2026-04-08 08:57:54.308049 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=250, expert_tiles=752, avg_tile_batch=3.49, prepare=652.837µs, send=18.565827ms, judge_wait=215.236174ms, fetch=20.636381ms, reduce=135ns; duck time-ns stats: p50=194.122594ms, p90=194.421204ms, max=194.627753ms; kernel_model: matmul=7.222591 GFLOP (37.110 GFLOP/s @ duck_max), param_stream=1.034945G (5.318 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.985 GB/s @ duck_max)
[2026-04-08 08:57:54.333562 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=124, expert_tiles=126, avg_tile_batch=1.52, prepare=53.578µs, send=1.432595ms, judge_wait=21.524718ms, fetch=1.476145ms, reduce=148ns; duck time-ns stats: p50=21.304435ms, p90=21.328356ms, max=21.348632ms; kernel_model: matmul=0.528482 GFLOP (24.755 GFLOP/s @ duck_max), param_stream=0.173408G (8.123 Gparam/s @ duck_max), weight_stream=186.128 MiB (9.142 GB/s @ duck_max)
[2026-04-08 08:57:54.623809 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=251, expert_tiles=749, avg_tile_batch=3.50, prepare=765.539µs, send=18.555452ms, judge_wait=214.644023ms, fetch=21.63957ms, reduce=132ns; duck time-ns stats: p50=194.057855ms, p90=194.259068ms, max=194.546482ms; kernel_model: matmul=7.222591 GFLOP (37.125 GFLOP/s @ duck_max), param_stream=1.030816G (5.299 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.963 GB/s @ duck_max)
[2026-04-08 08:57:54.891719 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=250, expert_tiles=746, avg_tile_batch=3.52, prepare=652.52µs, send=18.565579ms, judge_wait=212.677484ms, fetch=20.632884ms, reduce=133ns; duck time-ns stats: p50=192.893574ms, p90=193.456987ms, max=193.631523ms; kernel_model: matmul=7.222591 GFLOP (37.301 GFLOP/s @ duck_max), param_stream=1.026687G (5.302 Gparam/s @ duck_max), weight_stream=1101.994 MiB (5.968 GB/s @ duck_max)
[2026-04-08 08:57:55.165679 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=254, expert_tiles=751, avg_tile_batch=3.49, prepare=656.178µs, send=18.563562ms, judge_wait=217.693507ms, fetch=21.679132ms, reduce=21ns; duck time-ns stats: p50=197.13056ms, p90=197.511361ms, max=197.739886ms; kernel_model: matmul=7.222591 GFLOP (36.526 GFLOP/s @ duck_max), param_stream=1.033568G (5.227 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.883 GB/s @ duck_max)
[2026-04-08 08:57:55.435796 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=251, expert_tiles=754, avg_tile_batch=3.48, prepare=650.489µs, send=18.563612ms, judge_wait=213.972887ms, fetch=21.584968ms, reduce=20ns; duck time-ns stats: p50=192.557418ms, p90=193.134426ms, max=193.318571ms; kernel_model: matmul=7.222591 GFLOP (37.361 GFLOP/s @ duck_max), param_stream=1.037697G (5.368 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.041 GB/s @ duck_max)
[2026-04-08 08:57:55.462351 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=105, expert_tiles=114, avg_tile_batch=1.68, prepare=53.36µs, send=1.477986ms, judge_wait=22.529022ms, fetch=1.482977ms, reduce=151ns; duck time-ns stats: p50=22.301937ms, p90=22.322903ms, max=22.341082ms; kernel_model: matmul=0.528482 GFLOP (23.655 GFLOP/s @ duck_max), param_stream=0.156893G (7.023 Gparam/s @ duck_max), weight_stream=168.401 MiB (7.904 GB/s @ duck_max)
[2026-04-08 08:57:55.749638 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=250, expert_tiles=754, avg_tile_batch=3.48, prepare=767.332µs, send=17.229262ms, judge_wait=213.142861ms, fetch=21.63309ms, reduce=139ns; duck time-ns stats: p50=192.470528ms, p90=192.906992ms, max=192.941022ms; kernel_model: matmul=7.222591 GFLOP (37.434 GFLOP/s @ duck_max), param_stream=1.037697G (5.378 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.053 GB/s @ duck_max)
[2026-04-08 08:57:56.022392 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=251, expert_tiles=755, avg_tile_batch=3.48, prepare=653.899µs, send=18.576277ms, judge_wait=216.495538ms, fetch=21.645652ms, reduce=140ns; duck time-ns stats: p50=195.082936ms, p90=195.251538ms, max=195.31514ms; kernel_model: matmul=7.222591 GFLOP (36.979 GFLOP/s @ duck_max), param_stream=1.039073G (5.320 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.988 GB/s @ duck_max)
[2026-04-08 08:57:56.291766 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=252, expert_tiles=752, avg_tile_batch=3.49, prepare=648.613µs, send=18.559646ms, judge_wait=214.235598ms, fetch=20.63348ms, reduce=139ns; duck time-ns stats: p50=193.225069ms, p90=193.408956ms, max=193.792238ms; kernel_model: matmul=7.222591 GFLOP (37.270 GFLOP/s @ duck_max), param_stream=1.034945G (5.340 Gparam/s @ duck_max), weight_stream=1110.857 MiB (6.011 GB/s @ duck_max)
[2026-04-08 08:57:56.561706 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=248, expert_tiles=751, avg_tile_batch=3.49, prepare=647.792µs, send=17.222407ms, judge_wait=215.018985ms, fetch=21.658426ms, reduce=146ns; duck time-ns stats: p50=192.71654ms, p90=192.939062ms, max=193.066521ms; kernel_model: matmul=7.222591 GFLOP (37.410 GFLOP/s @ duck_max), param_stream=1.033568G (5.353 Gparam/s @ duck_max), weight_stream=1109.380 MiB (6.025 GB/s @ duck_max)
[2026-04-08 08:57:56.587004 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=113, expert_tiles=115, avg_tile_batch=1.67, prepare=52.862µs, send=1.368847ms, judge_wait=21.375863ms, fetch=1.48203ms, reduce=151ns; duck time-ns stats: p50=21.174673ms, p90=21.203012ms, max=21.224111ms; kernel_model: matmul=0.528482 GFLOP (24.900 GFLOP/s @ duck_max), param_stream=0.158269G (7.457 Gparam/s @ duck_max), weight_stream=169.878 MiB (8.393 GB/s @ duck_max)
[2026-04-08 08:57:56.872683 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=244, expert_tiles=749, avg_tile_batch=3.50, prepare=759.702µs, send=17.212932ms, judge_wait=211.558524ms, fetch=21.580577ms, reduce=19ns; duck time-ns stats: p50=191.021521ms, p90=191.351543ms, max=191.62575ms; kernel_model: matmul=7.222591 GFLOP (37.691 GFLOP/s @ duck_max), param_stream=1.030816G (5.379 Gparam/s @ duck_max), weight_stream=1106.425 MiB (6.054 GB/s @ duck_max)
[2026-04-08 08:57:57.144081 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=245, expert_tiles=749, avg_tile_batch=3.50, prepare=657.045µs, send=18.550536ms, judge_wait=216.038266ms, fetch=20.636494ms, reduce=138ns; duck time-ns stats: p50=195.441114ms, p90=196.454387ms, max=196.622629ms; kernel_model: matmul=7.222591 GFLOP (36.733 GFLOP/s @ duck_max), param_stream=1.030816G (5.243 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.900 GB/s @ duck_max)
[2026-04-08 08:57:57.414163 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=246, expert_tiles=758, avg_tile_batch=3.46, prepare=651.225µs, send=18.432395ms, judge_wait=213.914861ms, fetch=21.656732ms, reduce=135ns; duck time-ns stats: p50=192.735369ms, p90=193.171275ms, max=193.642486ms; kernel_model: matmul=7.222591 GFLOP (37.299 GFLOP/s @ duck_max), param_stream=1.043202G (5.387 Gparam/s @ duck_max), weight_stream=1119.720 MiB (6.063 GB/s @ duck_max)
[2026-04-08 08:57:57.685363 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=247, expert_tiles=759, avg_tile_batch=3.46, prepare=660.375µs, send=18.48705ms, judge_wait=214.994323ms, fetch=21.663188ms, reduce=138ns; duck time-ns stats: p50=193.776057ms, p90=194.093324ms, max=194.296328ms; kernel_model: matmul=7.222591 GFLOP (37.173 GFLOP/s @ duck_max), param_stream=1.044578G (5.376 Gparam/s @ duck_max), weight_stream=1121.197 MiB (6.051 GB/s @ duck_max)
[2026-04-08 08:57:57.709616 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=101, expert_tiles=107, avg_tile_batch=1.79, prepare=52.973µs, send=1.372427ms, judge_wait=20.332121ms, fetch=1.488882ms, reduce=136ns; duck time-ns stats: p50=20.143905ms, p90=20.17512ms, max=20.190794ms; kernel_model: matmul=0.528482 GFLOP (26.174 GFLOP/s @ duck_max), param_stream=0.147259G (7.293 Gparam/s @ duck_max), weight_stream=158.061 MiB (8.209 GB/s @ duck_max)
[2026-04-08 08:57:57.997747 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=241, expert_tiles=751, avg_tile_batch=3.49, prepare=771.294µs, send=18.561742ms, judge_wait=212.410531ms, fetch=21.636054ms, reduce=136ns; duck time-ns stats: p50=192.058605ms, p90=192.258672ms, max=192.404703ms; kernel_model: matmul=7.222591 GFLOP (37.539 GFLOP/s @ duck_max), param_stream=1.033568G (5.372 Gparam/s @ duck_max), weight_stream=1109.380 MiB (6.046 GB/s @ duck_max)
[2026-04-08 08:57:58.265710 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=244, expert_tiles=753, avg_tile_batch=3.48, prepare=648.064µs, send=18.523125ms, judge_wait=212.849717ms, fetch=20.636645ms, reduce=19ns; duck time-ns stats: p50=191.948092ms, p90=192.145408ms, max=192.211238ms; kernel_model: matmul=7.222591 GFLOP (37.576 GFLOP/s @ duck_max), param_stream=1.036321G (5.392 Gparam/s @ duck_max), weight_stream=1112.334 MiB (6.068 GB/s @ duck_max)
[2026-04-08 08:57:58.532036 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=244, expert_tiles=754, avg_tile_batch=3.48, prepare=648.336µs, send=17.226497ms, judge_wait=212.492457ms, fetch=20.622286ms, reduce=15ns; duck time-ns stats: p50=190.602902ms, p90=190.86642ms, max=190.94704ms; kernel_model: matmul=7.222591 GFLOP (37.825 GFLOP/s @ duck_max), param_stream=1.037697G
(5.434 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.116 GB/s @ duck_max) [2026-04-08 08:57:58.801759 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=238, expert_tiles=752, avg_tile_batch=3.49, prepare=694.923µs, send=18.5577ms, judge_wait=214.521085ms, fetch=20.644546ms, reduce=20ns; duck time-ns stats: p50=193.428319ms, p90=193.620388ms, max=194.249742ms; kernel_model: matmul=7.222591 GFLOP (37.182 GFLOP/s @ duck_max), param_stream=1.034945G (5.328 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.996 GB/s @ duck_max) [2026-04-08 08:57:58.827558 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=93, expert_tiles=104, avg_tile_batch=1.85, prepare=52.499µs, send=1.449505ms, judge_wait=21.804568ms, fetch=1.48322ms, reduce=143ns; duck time-ns stats: p50=21.609144ms, p90=21.644425ms, max=21.656397ms; kernel_model: matmul=0.528482 GFLOP (24.403 GFLOP/s @ duck_max), param_stream=0.143131G (6.609 Gparam/s @ duck_max), weight_stream=153.629 MiB (7.439 GB/s @ duck_max) [2026-04-08 08:57:59.124380 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=237, expert_tiles=751, avg_tile_batch=3.49, prepare=763.291µs, send=18.573313ms, judge_wait=221.066821ms, fetch=21.637595ms, reduce=136ns; duck time-ns stats: p50=200.799735ms, p90=201.00054ms, max=201.166473ms; kernel_model: matmul=7.222591 GFLOP (35.904 GFLOP/s @ duck_max), param_stream=1.033568G (5.138 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.783 GB/s @ duck_max) [2026-04-08 08:57:59.395268 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=239, expert_tiles=745, avg_tile_batch=3.52, prepare=653.439µs, send=18.461638ms, judge_wait=214.760823ms, fetch=21.69335ms, reduce=19ns; duck time-ns stats: p50=192.610278ms, p90=192.813323ms, max=193.38489ms; kernel_model: matmul=7.222591 GFLOP (37.348 GFLOP/s @ duck_max), 
param_stream=1.025311G (5.302 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.967 GB/s @ duck_max) [2026-04-08 08:57:59.669173 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=247, expert_tiles=754, avg_tile_batch=3.48, prepare=652.04µs, send=18.550625ms, judge_wait=217.648441ms, fetch=21.628862ms, reduce=134ns; duck time-ns stats: p50=195.349007ms, p90=195.750431ms, max=195.894131ms; kernel_model: matmul=7.222591 GFLOP (36.870 GFLOP/s @ duck_max), param_stream=1.037697G (5.297 Gparam/s @ duck_max), weight_stream=1113.811 MiB (5.962 GB/s @ duck_max) [2026-04-08 08:57:59.941738 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=747, avg_tile_batch=3.51, prepare=652.729µs, send=18.518897ms, judge_wait=216.225829ms, fetch=21.668215ms, reduce=20ns; duck time-ns stats: p50=195.084581ms, p90=195.429071ms, max=195.623533ms; kernel_model: matmul=7.222591 GFLOP (36.921 GFLOP/s @ duck_max), param_stream=1.028063G (5.255 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.915 GB/s @ duck_max) [2026-04-08 08:57:59.965078 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=99, expert_tiles=106, avg_tile_batch=1.81, prepare=53.813µs, send=1.290605ms, judge_wait=19.542405ms, fetch=1.474344ms, reduce=20ns; duck time-ns stats: p50=19.341003ms, p90=19.380729ms, max=19.401394ms; kernel_model: matmul=0.528482 GFLOP (27.239 GFLOP/s @ duck_max), param_stream=0.145883G (7.519 Gparam/s @ duck_max), weight_stream=156.584 MiB (8.463 GB/s @ duck_max) [2026-04-08 08:58:00.253980 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=249, expert_tiles=762, avg_tile_batch=3.44, prepare=779.506µs, send=18.474023ms, judge_wait=214.196145ms, fetch=20.625186ms, reduce=20ns; duck time-ns stats: p50=192.643531ms, p90=193.029077ms, max=193.067222ms; kernel_model: matmul=7.222591 GFLOP (37.410 
GFLOP/s @ duck_max), param_stream=1.048707G (5.432 Gparam/s @ duck_max), weight_stream=1125.629 MiB (6.113 GB/s @ duck_max) [2026-04-08 08:58:00.526885 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=247, expert_tiles=760, avg_tile_batch=3.45, prepare=648.843µs, send=18.580346ms, judge_wait=217.691362ms, fetch=20.646557ms, reduce=20ns; duck time-ns stats: p50=196.395548ms, p90=196.625885ms, max=197.171245ms; kernel_model: matmul=7.222591 GFLOP (36.631 GFLOP/s @ duck_max), param_stream=1.045955G (5.305 Gparam/s @ duck_max), weight_stream=1122.675 MiB (5.970 GB/s @ duck_max) [2026-04-08 08:58:00.801432 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=240, expert_tiles=751, avg_tile_batch=3.49, prepare=656.799µs, send=18.561192ms, judge_wait=219.321049ms, fetch=20.639189ms, reduce=134ns; duck time-ns stats: p50=196.682043ms, p90=196.858744ms, max=197.069121ms; kernel_model: matmul=7.222591 GFLOP (36.650 GFLOP/s @ duck_max), param_stream=1.033568G (5.245 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.903 GB/s @ duck_max) [2026-04-08 08:58:01.072129 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=243, expert_tiles=745, avg_tile_batch=3.52, prepare=676.173µs, send=18.547125ms, judge_wait=214.534567ms, fetch=21.633001ms, reduce=136ns; duck time-ns stats: p50=194.172213ms, p90=194.440291ms, max=194.621456ms; kernel_model: matmul=7.222591 GFLOP (37.111 GFLOP/s @ duck_max), param_stream=1.025311G (5.268 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.929 GB/s @ duck_max) [2026-04-08 08:58:01.096178 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=109, expert_tiles=114, avg_tile_batch=1.68, prepare=53.076µs, send=1.484974ms, judge_wait=20.050808ms, fetch=1.48125ms, reduce=20ns; duck time-ns stats: p50=19.816118ms, p90=19.854814ms, max=19.872186ms; kernel_model: 
matmul=0.528482 GFLOP (26.594 GFLOP/s @ duck_max), param_stream=0.156893G (7.895 Gparam/s @ duck_max), weight_stream=168.401 MiB (8.886 GB/s @ duck_max) [2026-04-08 08:58:01.385299 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=241, expert_tiles=748, avg_tile_batch=3.51, prepare=779.392µs, send=17.215637ms, judge_wait=214.866686ms, fetch=21.618667ms, reduce=136ns; duck time-ns stats: p50=193.118928ms, p90=193.247446ms, max=193.492036ms; kernel_model: matmul=7.222591 GFLOP (37.328 GFLOP/s @ duck_max), param_stream=1.029439G (5.320 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.988 GB/s @ duck_max) [2026-04-08 08:58:01.659326 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=247, expert_tiles=753, avg_tile_batch=3.48, prepare=652.746µs, send=18.520119ms, judge_wait=217.872506ms, fetch=21.609809ms, reduce=135ns; duck time-ns stats: p50=196.179543ms, p90=196.414787ms, max=196.567393ms; kernel_model: matmul=7.222591 GFLOP (36.744 GFLOP/s @ duck_max), param_stream=1.036321G (5.272 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.934 GB/s @ duck_max) [2026-04-08 08:58:01.930504 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=243, expert_tiles=756, avg_tile_batch=3.47, prepare=652.155µs, send=18.569809ms, judge_wait=215.988995ms, fetch=20.642809ms, reduce=136ns; duck time-ns stats: p50=193.138611ms, p90=193.285035ms, max=193.573578ms; kernel_model: matmul=7.222591 GFLOP (37.312 GFLOP/s @ duck_max), param_stream=1.040450G (5.375 Gparam/s @ duck_max), weight_stream=1116.766 MiB (6.049 GB/s @ duck_max) [2026-04-08 08:58:02.202084 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=240, expert_tiles=751, avg_tile_batch=3.49, prepare=653.641µs, send=18.584702ms, judge_wait=216.223924ms, fetch=20.660815ms, reduce=138ns; duck time-ns stats: p50=193.390183ms, p90=193.675262ms, 
max=194.015922ms; kernel_model: matmul=7.222591 GFLOP (37.227 GFLOP/s @ duck_max), param_stream=1.033568G (5.327 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.996 GB/s @ duck_max) [2026-04-08 08:58:02.225962 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=97, expert_tiles=105, avg_tile_batch=1.83, prepare=53.467µs, send=1.303118ms, judge_wait=20.025906ms, fetch=1.485079ms, reduce=154ns; duck time-ns stats: p50=19.772411ms, p90=19.810247ms, max=19.859591ms; kernel_model: matmul=0.528482 GFLOP (26.611 GFLOP/s @ duck_max), param_stream=0.144507G (7.276 Gparam/s @ duck_max), weight_stream=155.106 MiB (8.190 GB/s @ duck_max) [2026-04-08 08:58:02.516832 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=754, avg_tile_batch=3.48, prepare=771.819µs, send=18.542895ms, judge_wait=215.209628ms, fetch=21.63043ms, reduce=138ns; duck time-ns stats: p50=191.801519ms, p90=192.22029ms, max=192.391396ms; kernel_model: matmul=7.222591 GFLOP (37.541 GFLOP/s @ duck_max), param_stream=1.037697G (5.394 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.071 GB/s @ duck_max) [2026-04-08 08:58:02.788193 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=242, expert_tiles=754, avg_tile_batch=3.48, prepare=653.179µs, send=18.558707ms, judge_wait=215.170535ms, fetch=21.616711ms, reduce=21ns; duck time-ns stats: p50=193.560214ms, p90=193.796057ms, max=193.984065ms; kernel_model: matmul=7.222591 GFLOP (37.233 GFLOP/s @ duck_max), param_stream=1.037697G (5.349 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.021 GB/s @ duck_max) [2026-04-08 08:58:03.062318 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=240, expert_tiles=762, avg_tile_batch=3.44, prepare=653.223µs, send=18.579764ms, judge_wait=218.931907ms, fetch=20.618021ms, reduce=20ns; duck time-ns stats: p50=197.324329ms, 
p90=197.722752ms, max=197.870818ms; kernel_model: matmul=7.222591 GFLOP (36.502 GFLOP/s @ duck_max), param_stream=1.048707G (5.300 Gparam/s @ duck_max), weight_stream=1125.629 MiB (5.965 GB/s @ duck_max) [2026-04-08 08:58:03.332622 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=239, expert_tiles=743, avg_tile_batch=3.53, prepare=652.647µs, send=17.214537ms, judge_wait=216.397869ms, fetch=20.651262ms, reduce=20ns; duck time-ns stats: p50=194.384326ms, p90=194.734982ms, max=194.921992ms; kernel_model: matmul=7.222591 GFLOP (37.054 GFLOP/s @ duck_max), param_stream=1.022558G (5.246 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.904 GB/s @ duck_max) [2026-04-08 08:58:03.357377 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=109, expert_tiles=114, avg_tile_batch=1.68, prepare=53.873µs, send=1.371707ms, judge_wait=20.833247ms, fetch=1.477192ms, reduce=20ns; duck time-ns stats: p50=20.608959ms, p90=20.664172ms, max=20.69561ms; kernel_model: matmul=0.528482 GFLOP (25.536 GFLOP/s @ duck_max), param_stream=0.156893G (7.581 Gparam/s @ duck_max), weight_stream=168.401 MiB (8.532 GB/s @ duck_max) [2026-04-08 08:58:03.646421 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=239, expert_tiles=745, avg_tile_batch=3.52, prepare=779.082µs, send=18.596051ms, judge_wait=214.41202ms, fetch=20.640016ms, reduce=135ns; duck time-ns stats: p50=191.509049ms, p90=191.759156ms, max=191.863174ms; kernel_model: matmul=7.222591 GFLOP (37.644 GFLOP/s @ duck_max), param_stream=1.025311G (5.344 Gparam/s @ duck_max), weight_stream=1100.517 MiB (6.015 GB/s @ duck_max) [2026-04-08 08:58:03.918539 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=247, expert_tiles=750, avg_tile_batch=3.50, prepare=651.361µs, send=18.566807ms, judge_wait=216.814608ms, fetch=20.672395ms, reduce=20ns; duck time-ns stats: 
p50=193.96362ms, p90=194.122941ms, max=194.283753ms; kernel_model: matmul=7.222591 GFLOP (37.175 GFLOP/s @ duck_max), param_stream=1.032192G (5.313 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.980 GB/s @ duck_max) [2026-04-08 08:58:04.191900 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=755, avg_tile_batch=3.48, prepare=663.593µs, send=18.524982ms, judge_wait=217.155782ms, fetch=21.639002ms, reduce=137ns; duck time-ns stats: p50=193.933327ms, p90=194.57429ms, max=194.830135ms; kernel_model: matmul=7.222591 GFLOP (37.071 GFLOP/s @ duck_max), param_stream=1.039073G (5.333 Gparam/s @ duck_max), weight_stream=1115.289 MiB (6.002 GB/s @ duck_max) [2026-04-08 08:58:04.468279 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=239, expert_tiles=754, avg_tile_batch=3.48, prepare=649.661µs, send=18.559029ms, judge_wait=220.107607ms, fetch=21.6598ms, reduce=136ns; duck time-ns stats: p50=198.640802ms, p90=198.879938ms, max=198.996834ms; kernel_model: matmul=7.222591 GFLOP (36.295 GFLOP/s @ duck_max), param_stream=1.037697G (5.215 Gparam/s @ duck_max), weight_stream=1113.811 MiB (5.869 GB/s @ duck_max) [2026-04-08 08:58:04.492427 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=94, expert_tiles=103, avg_tile_batch=1.86, prepare=53.825µs, send=1.378727ms, judge_wait=20.27122ms, fetch=1.471687ms, reduce=22ns; duck time-ns stats: p50=20.079304ms, p90=20.113557ms, max=20.131323ms; kernel_model: matmul=0.528482 GFLOP (26.252 GFLOP/s @ duck_max), param_stream=0.141754G (7.041 Gparam/s @ duck_max), weight_stream=152.152 MiB (7.925 GB/s @ duck_max) [2026-04-08 08:58:04.784805 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=244, expert_tiles=748, avg_tile_batch=3.51, prepare=793.022µs, send=17.236351ms, judge_wait=218.090584ms, fetch=21.65892ms, reduce=138ns; duck 
time-ns stats: p50=195.643384ms, p90=195.896273ms, max=196.199347ms; kernel_model: matmul=7.222591 GFLOP (36.813 GFLOP/s @ duck_max), param_stream=1.029439G (5.247 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.905 GB/s @ duck_max) [2026-04-08 08:58:05.060246 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=239, expert_tiles=744, avg_tile_batch=3.53, prepare=648.058µs, send=18.564684ms, judge_wait=219.125175ms, fetch=21.644985ms, reduce=135ns; duck time-ns stats: p50=196.557675ms, p90=196.897421ms, max=197.162377ms; kernel_model: matmul=7.222591 GFLOP (36.633 GFLOP/s @ duck_max), param_stream=1.023934G (5.193 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.845 GB/s @ duck_max) [2026-04-08 08:58:05.341041 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=235, expert_tiles=750, avg_tile_batch=3.50, prepare=654.233µs, send=18.477867ms, judge_wait=224.632389ms, fetch=21.639824ms, reduce=15ns; duck time-ns stats: p50=204.208862ms, p90=204.632365ms, max=204.728707ms; kernel_model: matmul=7.222591 GFLOP (35.279 GFLOP/s @ duck_max), param_stream=1.032192G (5.042 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.674 GB/s @ duck_max) [2026-04-08 08:58:05.615600 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=751, avg_tile_batch=3.49, prepare=659.476µs, send=18.555254ms, judge_wait=219.351049ms, fetch=20.648918ms, reduce=104ns; duck time-ns stats: p50=195.329433ms, p90=195.596688ms, max=195.907531ms; kernel_model: matmul=7.222591 GFLOP (36.867 GFLOP/s @ duck_max), param_stream=1.033568G (5.276 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.938 GB/s @ duck_max) [2026-04-08 08:58:05.638608 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=81, expert_tiles=94, avg_tile_batch=2.04, prepare=55.149µs, send=1.417813ms, judge_wait=19.054201ms, 
fetch=1.482653ms, reduce=109ns; duck time-ns stats: p50=18.848296ms, p90=18.888538ms, max=18.92049ms; kernel_model: matmul=0.528482 GFLOP (27.932 GFLOP/s @ duck_max), param_stream=0.129368G (6.837 Gparam/s @ duck_max), weight_stream=138.857 MiB (7.695 GB/s @ duck_max) [2026-04-08 08:58:05.927042 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=749, avg_tile_batch=3.50, prepare=758.49µs, send=17.215973ms, judge_wait=215.261538ms, fetch=20.635999ms, reduce=146ns; duck time-ns stats: p50=192.635613ms, p90=193.014784ms, max=193.29234ms; kernel_model: matmul=7.222591 GFLOP (37.366 GFLOP/s @ duck_max), param_stream=1.030816G (5.333 Gparam/s @ duck_max), weight_stream=1106.425 MiB (6.002 GB/s @ duck_max) [2026-04-08 08:58:06.223344 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=756, avg_tile_batch=3.47, prepare=659.997µs, send=18.582135ms, judge_wait=240.973242ms, fetch=20.649285ms, reduce=135ns; duck time-ns stats: p50=219.196777ms, p90=219.705716ms, max=219.939638ms; kernel_model: matmul=7.222591 GFLOP (32.839 GFLOP/s @ duck_max), param_stream=1.040450G (4.731 Gparam/s @ duck_max), weight_stream=1116.766 MiB (5.324 GB/s @ duck_max) [2026-04-08 08:58:06.497703 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=750, avg_tile_batch=3.50, prepare=651.189µs, send=18.494694ms, judge_wait=219.226333ms, fetch=20.630899ms, reduce=157ns; duck time-ns stats: p50=198.552629ms, p90=198.682526ms, max=198.870592ms; kernel_model: matmul=7.222591 GFLOP (36.318 GFLOP/s @ duck_max), param_stream=1.032192G (5.190 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.842 GB/s @ duck_max) [2026-04-08 08:58:06.767025 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=748, avg_tile_batch=3.51, prepare=669.255µs, send=18.564684ms, 
judge_wait=213.779344ms, fetch=20.654145ms, reduce=138ns; duck time-ns stats: p50=192.252023ms, p90=193.092678ms, max=193.309218ms; kernel_model: matmul=7.222591 GFLOP (37.363 GFLOP/s @ duck_max), param_stream=1.029439G (5.325 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.994 GB/s @ duck_max) [2026-04-08 08:58:06.790224 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=85, expert_tiles=97, avg_tile_batch=1.98, prepare=52.284µs, send=1.291202ms, judge_wait=19.367138ms, fetch=1.47712ms, reduce=100ns; duck time-ns stats: p50=19.090315ms, p90=19.13978ms, max=19.234724ms; kernel_model: matmul=0.528482 GFLOP (27.475 GFLOP/s @ duck_max), param_stream=0.133497G (6.940 Gparam/s @ duck_max), weight_stream=143.289 MiB (7.811 GB/s @ duck_max) [2026-04-08 08:58:07.117702 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=750, avg_tile_batch=3.50, prepare=768.933µs, send=18.543636ms, judge_wait=251.71333ms, fetch=21.670264ms, reduce=138ns; duck time-ns stats: p50=231.555561ms, p90=231.967909ms, max=232.585927ms; kernel_model: matmul=7.222591 GFLOP (31.053 GFLOP/s @ duck_max), param_stream=1.032192G (4.438 Gparam/s @ duck_max), weight_stream=1107.903 MiB (4.995 GB/s @ duck_max) [2026-04-08 08:58:07.392497 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=751, avg_tile_batch=3.49, prepare=649.148µs, send=18.481576ms, judge_wait=219.668228ms, fetch=20.633328ms, reduce=134ns; duck time-ns stats: p50=198.037855ms, p90=198.190338ms, max=198.241545ms; kernel_model: matmul=7.222591 GFLOP (36.433 GFLOP/s @ duck_max), param_stream=1.033568G (5.214 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.868 GB/s @ duck_max) [2026-04-08 08:58:07.677837 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=221, expert_tiles=750, avg_tile_batch=3.50, prepare=656.697µs, 
send=18.576041ms, judge_wait=229.055876ms, fetch=21.629368ms, reduce=106ns; duck time-ns stats: p50=207.78941ms, p90=207.933999ms, max=208.057407ms; kernel_model: matmul=7.222591 GFLOP (34.714 GFLOP/s @ duck_max), param_stream=1.032192G (4.961 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.584 GB/s @ duck_max) [2026-04-08 08:58:07.957692 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=748, avg_tile_batch=3.51, prepare=647.969µs, send=18.525251ms, judge_wait=223.594835ms, fetch=21.622894ms, reduce=102ns; duck time-ns stats: p50=202.183284ms, p90=202.652006ms, max=202.877613ms; kernel_model: matmul=7.222591 GFLOP (35.601 GFLOP/s @ duck_max), param_stream=1.029439G (5.074 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.711 GB/s @ duck_max) [2026-04-08 08:58:07.981387 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=92, expert_tiles=104, avg_tile_batch=1.85, prepare=53.757µs, send=1.338461ms, judge_wait=19.795392ms, fetch=1.485141ms, reduce=104ns; duck time-ns stats: p50=19.60957ms, p90=19.631472ms, max=19.648783ms; kernel_model: matmul=0.528482 GFLOP (26.896 GFLOP/s @ duck_max), param_stream=0.143131G (7.284 Gparam/s @ duck_max), weight_stream=153.629 MiB (8.199 GB/s @ duck_max) [2026-04-08 08:58:08.269317 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=224, expert_tiles=742, avg_tile_batch=3.54, prepare=774.707µs, send=18.561614ms, judge_wait=213.321273ms, fetch=20.639932ms, reduce=136ns; duck time-ns stats: p50=191.480353ms, p90=191.869838ms, max=192.323517ms; kernel_model: matmul=7.222591 GFLOP (37.554 GFLOP/s @ duck_max), param_stream=1.021182G (5.310 Gparam/s @ duck_max), weight_stream=1096.085 MiB (5.976 GB/s @ duck_max) [2026-04-08 08:58:08.542465 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=221, expert_tiles=753, avg_tile_batch=3.48, 
prepare=649.742µs, send=18.569019ms, judge_wait=216.957445ms, fetch=21.659108ms, reduce=20ns; duck time-ns stats: p50=194.651796ms, p90=194.870809ms, max=195.015288ms; kernel_model: matmul=7.222591 GFLOP (37.036 GFLOP/s @ duck_max), param_stream=1.036321G (5.314 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.981 GB/s @ duck_max) [2026-04-08 08:58:08.816455 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=220, expert_tiles=748, avg_tile_batch=3.51, prepare=655.383µs, send=18.547731ms, judge_wait=218.788364ms, fetch=20.651387ms, reduce=135ns; duck time-ns stats: p50=196.773593ms, p90=197.014488ms, max=197.481047ms; kernel_model: matmul=7.222591 GFLOP (36.574 GFLOP/s @ duck_max), param_stream=1.029439G (5.213 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.867 GB/s @ duck_max) [2026-04-08 08:58:09.099250 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=218, expert_tiles=744, avg_tile_batch=3.53, prepare=652.178µs, send=18.582ms, judge_wait=226.347383ms, fetch=21.618031ms, reduce=134ns; duck time-ns stats: p50=207.322797ms, p90=207.733293ms, max=207.980355ms; kernel_model: matmul=7.222591 GFLOP (34.727 GFLOP/s @ duck_max), param_stream=1.023934G (4.923 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.541 GB/s @ duck_max) [2026-04-08 08:58:09.122591 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=80, expert_tiles=96, avg_tile_batch=2.00, prepare=52.114µs, send=1.303506ms, judge_wait=19.488512ms, fetch=1.478151ms, reduce=135ns; duck time-ns stats: p50=19.220709ms, p90=19.252326ms, max=19.306661ms; kernel_model: matmul=0.528482 GFLOP (27.373 GFLOP/s @ duck_max), param_stream=0.132121G (6.843 Gparam/s @ duck_max), weight_stream=141.812 MiB (7.702 GB/s @ duck_max) [2026-04-08 08:58:09.413081 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=216, expert_tiles=750, 
avg_tile_batch=3.50, prepare=759.17µs, send=17.240808ms, judge_wait=216.28184ms, fetch=21.618751ms, reduce=19ns; duck time-ns stats: p50=194.638068ms, p90=194.905158ms, max=195.075913ms; kernel_model: matmul=7.222591 GFLOP (37.025 GFLOP/s @ duck_max), param_stream=1.032192G (5.291 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.955 GB/s @ duck_max) [2026-04-08 08:58:09.683676 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=215, expert_tiles=745, avg_tile_batch=3.52, prepare=666.069µs, send=18.568245ms, judge_wait=214.317634ms, fetch=21.58143ms, reduce=20ns; duck time-ns stats: p50=191.876795ms, p90=192.148882ms, max=192.275906ms; kernel_model: matmul=7.222591 GFLOP (37.564 GFLOP/s @ duck_max), param_stream=1.025311G (5.332 Gparam/s @ duck_max), weight_stream=1100.517 MiB (6.002 GB/s @ duck_max) [2026-04-08 08:58:09.952421 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=212, expert_tiles=740, avg_tile_batch=3.55, prepare=665.975µs, send=18.53599ms, judge_wait=212.323816ms, fetch=21.701197ms, reduce=135ns; duck time-ns stats: p50=191.070298ms, p90=191.274046ms, max=191.355546ms; kernel_model: matmul=7.222591 GFLOP (37.744 GFLOP/s @ duck_max), param_stream=1.018429G (5.322 Gparam/s @ duck_max), weight_stream=1093.130 MiB (5.990 GB/s @ duck_max) [2026-04-08 08:58:10.268000 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=218, expert_tiles=750, avg_tile_batch=3.50, prepare=660.451µs, send=18.403789ms, judge_wait=260.466952ms, fetch=20.604682ms, reduce=133ns; duck time-ns stats: p50=242.464294ms, p90=242.736181ms, max=242.869801ms; kernel_model: matmul=7.222591 GFLOP (29.739 GFLOP/s @ duck_max), param_stream=1.032192G (4.250 Gparam/s @ duck_max), weight_stream=1107.903 MiB (4.783 GB/s @ duck_max) [2026-04-08 08:58:10.291022 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, 
unique_experts=75, expert_tiles=91, avg_tile_batch=2.11, prepare=53.452µs, send=1.316167ms, judge_wait=19.15299ms, fetch=1.483906ms, reduce=135ns; duck time-ns stats: p50=18.952383ms, p90=18.99386ms, max=19.018377ms; kernel_model: matmul=0.528482 GFLOP (27.788 GFLOP/s @ duck_max), param_stream=0.125239G (6.585 Gparam/s @ duck_max), weight_stream=134.426 MiB (7.412 GB/s @ duck_max) [2026-04-08 08:58:10.590211 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=750, avg_tile_batch=3.50, prepare=778.765µs, send=18.576307ms, judge_wait=223.56699ms, fetch=21.620701ms, reduce=133ns; duck time-ns stats: p50=201.93429ms, p90=202.220497ms, max=202.389945ms; kernel_model: matmul=7.222591 GFLOP (35.687 GFLOP/s @ duck_max), param_stream=1.032192G (5.100 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.740 GB/s @ duck_max) [2026-04-08 08:58:10.859386 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=224, expert_tiles=746, avg_tile_batch=3.52, prepare=652.315µs, send=18.569869ms, judge_wait=212.903995ms, fetch=21.65222ms, reduce=19ns; duck time-ns stats: p50=190.394621ms, p90=190.579218ms, max=190.63349ms; kernel_model: matmul=7.222591 GFLOP (37.887 GFLOP/s @ duck_max), param_stream=1.026687G (5.386 Gparam/s @ duck_max), weight_stream=1101.994 MiB (6.061 GB/s @ duck_max) [2026-04-08 08:58:11.135503 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=751, avg_tile_batch=3.49, prepare=656.641µs, send=18.534719ms, judge_wait=219.946106ms, fetch=21.599064ms, reduce=135ns; duck time-ns stats: p50=199.06809ms, p90=199.301998ms, max=199.423116ms; kernel_model: matmul=7.222591 GFLOP (36.217 GFLOP/s @ duck_max), param_stream=1.033568G (5.183 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.833 GB/s @ duck_max) [2026-04-08 08:58:11.418961 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, 
top_k=8, tasks=2624, unique_experts=220, expert_tiles=744, avg_tile_batch=3.53, prepare=666.088µs, send=18.549304ms, judge_wait=228.16888ms, fetch=20.630258ms, reduce=20ns; duck time-ns stats: p50=206.961932ms, p90=207.303144ms, max=207.783358ms; kernel_model: matmul=7.222591 GFLOP (34.760 GFLOP/s @ duck_max), param_stream=1.023934G (4.928 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.546 GB/s @ duck_max) [2026-04-08 08:58:11.443367 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=92, expert_tiles=104, avg_tile_batch=1.85, prepare=52.706µs, send=1.315589ms, judge_wait=20.563536ms, fetch=1.477773ms, reduce=20ns; duck time-ns stats: p50=20.365048ms, p90=20.391499ms, max=20.425574ms; kernel_model: matmul=0.528482 GFLOP (25.874 GFLOP/s @ duck_max), param_stream=0.143131G (7.007 Gparam/s @ duck_max), weight_stream=153.629 MiB (7.887 GB/s @ duck_max) [2026-04-08 08:58:11.735786 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=214, expert_tiles=732, avg_tile_batch=3.58, prepare=782.688µs, send=20.91545ms, judge_wait=214.32918ms, fetch=21.612785ms, reduce=135ns; duck time-ns stats: p50=191.829061ms, p90=192.159385ms, max=192.587522ms; kernel_model: matmul=7.222591 GFLOP (37.503 GFLOP/s @ duck_max), param_stream=1.007419G (5.231 Gparam/s @ duck_max), weight_stream=1081.313 MiB (5.887 GB/s @ duck_max) [2026-04-08 08:58:12.003915 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=224, expert_tiles=753, avg_tile_batch=3.48, prepare=657.806µs, send=18.51985ms, judge_wait=211.984997ms, fetch=21.634822ms, reduce=133ns; duck time-ns stats: p50=190.254055ms, p90=190.6424ms, max=190.827057ms; kernel_model: matmul=7.222591 GFLOP (37.849 GFLOP/s @ duck_max), param_stream=1.036321G (5.431 Gparam/s @ duck_max), weight_stream=1112.334 MiB (6.112 GB/s @ duck_max) [2026-04-08 08:58:12.273938 INFO fp8_moe_dpdk] MoE prefill forward (Rust): 
batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=748, avg_tile_batch=3.51, prepare=664.587µs, send=18.549275ms, judge_wait=214.71591ms, fetch=20.646077ms, reduce=18ns; duck time-ns stats: p50=192.880224ms, p90=193.096026ms, max=193.280057ms; kernel_model: matmul=7.222591 GFLOP (37.369 GFLOP/s @ duck_max), param_stream=1.029439G (5.326 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.995 GB/s @ duck_max) [2026-04-08 08:58:12.544702 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=751, avg_tile_batch=3.49, prepare=649.611µs, send=18.479578ms, judge_wait=215.431225ms, fetch=20.677382ms, reduce=141ns; duck time-ns stats: p50=194.710719ms, p90=195.048463ms, max=195.655422ms; kernel_model: matmul=7.222591 GFLOP (36.915 GFLOP/s @ duck_max), param_stream=1.033568G (5.283 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.945 GB/s @ duck_max) [2026-04-08 08:58:12.567994 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=95, expert_tiles=106, avg_tile_batch=1.81, prepare=53.968µs, send=1.290752ms, judge_wait=19.48076ms, fetch=1.477523ms, reduce=21ns; duck time-ns stats: p50=19.260899ms, p90=19.295873ms, max=19.346576ms; kernel_model: matmul=0.528482 GFLOP (27.317 GFLOP/s @ duck_max), param_stream=0.145883G (7.541 Gparam/s @ duck_max), weight_stream=156.584 MiB (8.487 GB/s @ duck_max) [2026-04-08 08:58:12.860057 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=746, avg_tile_batch=3.52, prepare=773.493µs, send=17.235145ms, judge_wait=217.804771ms, fetch=21.60832ms, reduce=20ns; duck time-ns stats: p50=196.563196ms, p90=197.321202ms, max=197.504466ms; kernel_model: matmul=7.222591 GFLOP (36.569 GFLOP/s @ duck_max), param_stream=1.026687G (5.198 Gparam/s @ duck_max), weight_stream=1101.994 MiB (5.851 GB/s @ duck_max) [2026-04-08 08:58:13.130143 INFO fp8_moe_dpdk] MoE prefill 
forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=221, expert_tiles=743, avg_tile_batch=3.53, prepare=665.79µs, send=18.53222ms, judge_wait=213.842132ms, fetch=21.670691ms, reduce=134ns; duck time-ns stats: p50=192.961258ms, p90=193.47327ms, max=193.753953ms; kernel_model: matmul=7.222591 GFLOP (37.277 GFLOP/s @ duck_max), param_stream=1.022558G (5.278 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.940 GB/s @ duck_max) [2026-04-08 08:58:13.396770 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=746, avg_tile_batch=3.52, prepare=658.497µs, send=18.537903ms, judge_wait=211.417878ms, fetch=20.649026ms, reduce=138ns; duck time-ns stats: p50=189.526122ms, p90=189.827385ms, max=190.380692ms; kernel_model: matmul=7.222591 GFLOP (37.938 GFLOP/s @ duck_max), param_stream=1.026687G (5.393 Gparam/s @ duck_max), weight_stream=1101.994 MiB (6.070 GB/s @ duck_max) [2026-04-08 08:58:13.667075 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=222, expert_tiles=744, avg_tile_batch=3.53, prepare=650.874µs, send=18.561427ms, judge_wait=214.003014ms, fetch=21.646969ms, reduce=139ns; duck time-ns stats: p50=192.632604ms, p90=192.994111ms, max=193.201589ms; kernel_model: matmul=7.222591 GFLOP (37.384 GFLOP/s @ duck_max), param_stream=1.023934G (5.300 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.965 GB/s @ duck_max) [2026-04-08 08:58:13.689625 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=83, expert_tiles=94, avg_tile_batch=2.04, prepare=53.422µs, send=1.338853ms, judge_wait=18.668108ms, fetch=1.479484ms, reduce=133ns; duck time-ns stats: p50=18.49142ms, p90=18.52771ms, max=18.538075ms; kernel_model: matmul=0.528482 GFLOP (28.508 GFLOP/s @ duck_max), param_stream=0.129368G (6.979 Gparam/s @ duck_max), weight_stream=138.857 MiB (7.854 GB/s @ duck_max) [2026-04-08 08:58:13.982455 INFO 
fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=222, expert_tiles=751, avg_tile_batch=3.49, prepare=759.588µs, send=17.217063ms, judge_wait=219.630801ms, fetch=20.632933ms, reduce=20ns; duck time-ns stats: p50=196.901456ms, p90=197.156037ms, max=197.447338ms; kernel_model: matmul=7.222591 GFLOP (36.580 GFLOP/s @ duck_max), param_stream=1.033568G (5.235 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.892 GB/s @ duck_max) [2026-04-08 08:58:14.259179 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=751, avg_tile_batch=3.49, prepare=653.928µs, send=18.560173ms, judge_wait=221.509935ms, fetch=20.634092ms, reduce=143ns; duck time-ns stats: p50=199.681743ms, p90=199.940339ms, max=199.997325ms; kernel_model: matmul=7.222591 GFLOP (36.113 GFLOP/s @ duck_max), param_stream=1.033568G (5.168 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.816 GB/s @ duck_max) [2026-04-08 08:58:14.532841 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=753, avg_tile_batch=3.48, prepare=651.47µs, send=18.566076ms, judge_wait=217.446426ms, fetch=21.567453ms, reduce=135ns; duck time-ns stats: p50=193.848278ms, p90=194.183629ms, max=194.42609ms; kernel_model: matmul=7.222591 GFLOP (37.148 GFLOP/s @ duck_max), param_stream=1.036321G (5.330 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.999 GB/s @ duck_max) [2026-04-08 08:58:14.807646 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=747, avg_tile_batch=3.51, prepare=647.179µs, send=18.565809ms, judge_wait=218.47819ms, fetch=21.66469ms, reduce=20ns; duck time-ns stats: p50=197.218089ms, p90=197.501058ms, max=197.61219ms; kernel_model: matmul=7.222591 GFLOP (36.549 GFLOP/s @ duck_max), param_stream=1.028063G (5.202 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.855 GB/s @ duck_max) 
[2026-04-08 08:58:14.830893 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=83, expert_tiles=97, avg_tile_batch=1.98, prepare=56.96µs, send=1.311991ms, judge_wait=19.426225ms, fetch=1.470432ms, reduce=20ns; duck time-ns stats: p50=19.229256ms, p90=19.267356ms, max=19.292685ms; kernel_model: matmul=0.528482 GFLOP (27.393 GFLOP/s @ duck_max), param_stream=0.133497G (6.920 Gparam/s @ duck_max), weight_stream=143.289 MiB (7.788 GB/s @ duck_max) [2026-04-08 08:58:15.121663 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=748, avg_tile_batch=3.51, prepare=774.433µs, send=17.23018ms, judge_wait=217.546303ms, fetch=20.636797ms, reduce=136ns; duck time-ns stats: p50=196.461708ms, p90=196.711822ms, max=196.856918ms; kernel_model: matmul=7.222591 GFLOP (36.690 GFLOP/s @ duck_max), param_stream=1.029439G (5.229 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.886 GB/s @ duck_max) [2026-04-08 08:58:15.399463 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=753, avg_tile_batch=3.48, prepare=661.217µs, send=18.549935ms, judge_wait=221.448888ms, fetch=21.656682ms, reduce=135ns; duck time-ns stats: p50=198.464954ms, p90=198.84178ms, max=199.067117ms; kernel_model: matmul=7.222591 GFLOP (36.282 GFLOP/s @ duck_max), param_stream=1.036321G (5.206 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.859 GB/s @ duck_max) [2026-04-08 08:58:15.673231 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=745, avg_tile_batch=3.52, prepare=657.35µs, send=18.448017ms, judge_wait=217.712372ms, fetch=21.578125ms, reduce=104ns; duck time-ns stats: p50=196.561112ms, p90=196.782523ms, max=196.97412ms; kernel_model: matmul=7.222591 GFLOP (36.668 GFLOP/s @ duck_max), param_stream=1.025311G (5.205 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.859 
GB/s @ duck_max) [2026-04-08 08:58:15.944676 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=751, avg_tile_batch=3.49, prepare=658.085µs, send=18.554243ms, judge_wait=215.149485ms, fetch=21.597513ms, reduce=97ns; duck time-ns stats: p50=192.731757ms, p90=193.090456ms, max=193.284876ms; kernel_model: matmul=7.222591 GFLOP (37.368 GFLOP/s @ duck_max), param_stream=1.033568G (5.347 Gparam/s @ duck_max), weight_stream=1109.380 MiB (6.018 GB/s @ duck_max) [2026-04-08 08:58:15.968856 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=85, expert_tiles=99, avg_tile_batch=1.94, prepare=55.179µs, send=1.341982ms, judge_wait=20.279455ms, fetch=1.480612ms, reduce=105ns; duck time-ns stats: p50=20.07117ms, p90=20.103779ms, max=20.136238ms; kernel_model: matmul=0.528482 GFLOP (26.245 GFLOP/s @ duck_max), param_stream=0.136249G (6.766 Gparam/s @ duck_max), weight_stream=146.243 MiB (7.615 GB/s @ duck_max) [2026-04-08 08:58:16.260332 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=743, avg_tile_batch=3.53, prepare=777.985µs, send=18.478025ms, judge_wait=215.824492ms, fetch=21.626642ms, reduce=136ns; duck time-ns stats: p50=193.979381ms, p90=194.366213ms, max=194.47271ms; kernel_model: matmul=7.222591 GFLOP (37.139 GFLOP/s @ duck_max), param_stream=1.022558G (5.258 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.918 GB/s @ duck_max) [2026-04-08 08:58:16.540596 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=222, expert_tiles=753, avg_tile_batch=3.48, prepare=652.064µs, send=18.543848ms, judge_wait=223.908191ms, fetch=21.635738ms, reduce=20ns; duck time-ns stats: p50=204.615701ms, p90=204.870162ms, max=205.055021ms; kernel_model: matmul=7.222591 GFLOP (35.223 GFLOP/s @ duck_max), param_stream=1.036321G (5.054 Gparam/s @ duck_max), 
weight_stream=1112.334 MiB (5.688 GB/s @ duck_max) [2026-04-08 08:58:16.811110 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=221, expert_tiles=743, avg_tile_batch=3.53, prepare=654.09µs, send=18.425116ms, judge_wait=214.393109ms, fetch=21.66255ms, reduce=136ns; duck time-ns stats: p50=192.296476ms, p90=192.763376ms, max=193.12095ms; kernel_model: matmul=7.222591 GFLOP (37.399 GFLOP/s @ duck_max), param_stream=1.022558G (5.295 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.959 GB/s @ duck_max) [2026-04-08 08:58:17.085033 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=216, expert_tiles=742, avg_tile_batch=3.54, prepare=651.536µs, send=18.553123ms, judge_wait=217.638486ms, fetch=21.623266ms, reduce=137ns; duck time-ns stats: p50=195.727124ms, p90=196.241477ms, max=196.534089ms; kernel_model: matmul=7.222591 GFLOP (36.750 GFLOP/s @ duck_max), param_stream=1.021182G (5.196 Gparam/s @ duck_max), weight_stream=1096.085 MiB (5.848 GB/s @ duck_max) [2026-04-08 08:58:17.109856 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=88, expert_tiles=99, avg_tile_batch=1.94, prepare=53.597µs, send=1.346243ms, judge_wait=20.954209ms, fetch=1.474138ms, reduce=20ns; duck time-ns stats: p50=20.710329ms, p90=20.741942ms, max=20.782733ms; kernel_model: matmul=0.528482 GFLOP (25.429 GFLOP/s @ duck_max), param_stream=0.136249G (6.556 Gparam/s @ duck_max), weight_stream=146.243 MiB (7.379 GB/s @ duck_max) [2026-04-08 08:58:17.399853 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=760, avg_tile_batch=3.45, prepare=764.393µs, send=17.215478ms, judge_wait=216.761581ms, fetch=20.634101ms, reduce=20ns; duck time-ns stats: p50=194.697458ms, p90=194.996037ms, max=195.062207ms; kernel_model: matmul=7.222591 GFLOP (37.027 GFLOP/s @ duck_max), param_stream=1.045955G (5.362 
Gparam/s @ duck_max), weight_stream=1122.675 MiB (6.035 GB/s @ duck_max) [2026-04-08 08:58:17.669959 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=743, avg_tile_batch=3.53, prepare=653.895µs, send=18.51961ms, judge_wait=213.963846ms, fetch=21.652895ms, reduce=20ns; duck time-ns stats: p50=191.958136ms, p90=192.341038ms, max=192.529989ms; kernel_model: matmul=7.222591 GFLOP (37.514 GFLOP/s @ duck_max), param_stream=1.022558G (5.311 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.978 GB/s @ duck_max) [2026-04-08 08:58:17.942495 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=760, avg_tile_batch=3.45, prepare=654.421µs, send=18.575102ms, judge_wait=216.305381ms, fetch=21.608296ms, reduce=20ns; duck time-ns stats: p50=194.621156ms, p90=194.926344ms, max=195.008957ms; kernel_model: matmul=7.222591 GFLOP (37.037 GFLOP/s @ duck_max), param_stream=1.045955G (5.364 Gparam/s @ duck_max), weight_stream=1122.675 MiB (6.037 GB/s @ duck_max) [2026-04-08 08:58:18.218182 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=228, expert_tiles=760, avg_tile_batch=3.45, prepare=653.732µs, send=18.582204ms, judge_wait=219.452298ms, fetch=21.63702ms, reduce=136ns; duck time-ns stats: p50=196.543806ms, p90=196.72662ms, max=196.909679ms; kernel_model: matmul=7.222591 GFLOP (36.680 GFLOP/s @ duck_max), param_stream=1.045955G (5.312 Gparam/s @ duck_max), weight_stream=1122.675 MiB (5.978 GB/s @ duck_max) [2026-04-08 08:58:18.241759 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=99, expert_tiles=105, avg_tile_batch=1.83, prepare=53.152µs, send=1.446758ms, judge_wait=19.57456ms, fetch=1.485776ms, reduce=152ns; duck time-ns stats: p50=19.365383ms, p90=19.406435ms, max=19.435812ms; kernel_model: matmul=0.528482 GFLOP (27.191 GFLOP/s @ duck_max), 
param_stream=0.144507G (7.435 Gparam/s @ duck_max), weight_stream=155.106 MiB (8.368 GB/s @ duck_max) [2026-04-08 08:58:18.533226 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=751, avg_tile_batch=3.49, prepare=770.043µs, send=18.402819ms, judge_wait=215.802962ms, fetch=21.586241ms, reduce=135ns; duck time-ns stats: p50=192.390053ms, p90=192.660219ms, max=192.736107ms; kernel_model: matmul=7.222591 GFLOP (37.474 GFLOP/s @ duck_max), param_stream=1.033568G (5.363 Gparam/s @ duck_max), weight_stream=1109.380 MiB (6.036 GB/s @ duck_max) [2026-04-08 08:58:18.806420 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=748, avg_tile_batch=3.51, prepare=652.176µs, send=18.544385ms, judge_wait=218.063203ms, fetch=20.622721ms, reduce=21ns; duck time-ns stats: p50=194.734432ms, p90=195.133064ms, max=195.272935ms; kernel_model: matmul=7.222591 GFLOP (36.987 GFLOP/s @ duck_max), param_stream=1.029439G (5.272 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.933 GB/s @ duck_max) [2026-04-08 08:58:19.077277 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=756, avg_tile_batch=3.47, prepare=656.141µs, send=18.519139ms, judge_wait=214.651362ms, fetch=21.640254ms, reduce=149ns; duck time-ns stats: p50=191.369366ms, p90=191.695553ms, max=191.893067ms; kernel_model: matmul=7.222591 GFLOP (37.639 GFLOP/s @ duck_max), param_stream=1.040450G (5.422 Gparam/s @ duck_max), weight_stream=1116.766 MiB (6.102 GB/s @ duck_max) [2026-04-08 08:58:19.349731 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=228, expert_tiles=747, avg_tile_batch=3.51, prepare=656.995µs, send=18.550769ms, judge_wait=216.251483ms, fetch=21.626755ms, reduce=149ns; duck time-ns stats: p50=194.126693ms, p90=194.599968ms, max=194.926322ms; kernel_model: matmul=7.222591 
GFLOP (37.053 GFLOP/s @ duck_max), param_stream=1.028063G (5.274 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.936 GB/s @ duck_max) [2026-04-08 08:58:19.373713 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=102, expert_tiles=110, avg_tile_batch=1.75, prepare=54.522µs, send=1.42742ms, judge_wait=20.003338ms, fetch=1.481876ms, reduce=151ns; duck time-ns stats: p50=19.801601ms, p90=19.854775ms, max=19.86932ms; kernel_model: matmul=0.528482 GFLOP (26.598 GFLOP/s @ duck_max), param_stream=0.151388G (7.619 Gparam/s @ duck_max), weight_stream=162.492 MiB (8.575 GB/s @ duck_max) [2026-04-08 08:58:19.663809 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=747, avg_tile_batch=3.51, prepare=771.932µs, send=17.212075ms, judge_wait=215.812382ms, fetch=21.604531ms, reduce=20ns; duck time-ns stats: p50=192.861749ms, p90=193.318861ms, max=193.465829ms; kernel_model: matmul=7.222591 GFLOP (37.333 GFLOP/s @ duck_max), param_stream=1.028063G (5.314 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.981 GB/s @ duck_max) [2026-04-08 08:58:19.934748 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=754, avg_tile_batch=3.48, prepare=697.281µs, send=18.428918ms, judge_wait=214.697694ms, fetch=21.632204ms, reduce=20ns; duck time-ns stats: p50=192.529077ms, p90=192.962796ms, max=193.109074ms; kernel_model: matmul=7.222591 GFLOP (37.402 GFLOP/s @ duck_max), param_stream=1.037697G (5.374 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.048 GB/s @ duck_max) [2026-04-08 08:58:20.205551 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=748, avg_tile_batch=3.51, prepare=646.325µs, send=18.467469ms, judge_wait=215.563922ms, fetch=20.66093ms, reduce=133ns; duck time-ns stats: p50=192.44974ms, p90=192.647604ms, max=192.882506ms; kernel_model: 
matmul=7.222591 GFLOP (37.446 GFLOP/s @ duck_max), param_stream=1.029439G (5.337 Gparam/s @ duck_max), weight_stream=1104.948 MiB (6.007 GB/s @ duck_max) [2026-04-08 08:58:20.475294 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=214, expert_tiles=740, avg_tile_batch=3.55, prepare=652.113µs, send=18.461994ms, judge_wait=213.541378ms, fetch=21.638568ms, reduce=132ns; duck time-ns stats: p50=190.227967ms, p90=190.654472ms, max=190.701968ms; kernel_model: matmul=7.222591 GFLOP (37.874 GFLOP/s @ duck_max), param_stream=1.018429G (5.340 Gparam/s @ duck_max), weight_stream=1093.130 MiB (6.011 GB/s @ duck_max) [2026-04-08 08:58:20.498522 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=88, expert_tiles=99, avg_tile_batch=1.94, prepare=51.386µs, send=1.324171ms, judge_wait=19.261995ms, fetch=1.478893ms, reduce=32ns; duck time-ns stats: p50=19.054051ms, p90=19.088658ms, max=19.127698ms; kernel_model: matmul=0.528482 GFLOP (27.629 GFLOP/s @ duck_max), param_stream=0.136249G (7.123 Gparam/s @ duck_max), weight_stream=146.243 MiB (8.017 GB/s @ duck_max) [2026-04-08 08:58:20.788522 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=752, avg_tile_batch=3.49, prepare=765.214µs, send=18.405089ms, judge_wait=215.414435ms, fetch=20.631334ms, reduce=134ns; duck time-ns stats: p50=193.722135ms, p90=193.983602ms, max=194.431812ms; kernel_model: matmul=7.222591 GFLOP (37.147 GFLOP/s @ duck_max), param_stream=1.034945G (5.323 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.991 GB/s @ duck_max) [2026-04-08 08:58:21.066462 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=235, expert_tiles=760, avg_tile_batch=3.45, prepare=654.613µs, send=18.512283ms, judge_wait=221.65897ms, fetch=21.611603ms, reduce=19ns; duck time-ns stats: p50=201.268411ms, p90=201.645647ms, 
max=201.697494ms; kernel_model: matmul=7.222591 GFLOP (35.809 GFLOP/s @ duck_max), param_stream=1.045955G (5.186 Gparam/s @ duck_max), weight_stream=1122.675 MiB (5.837 GB/s @ duck_max) [2026-04-08 08:58:21.336702 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=749, avg_tile_batch=3.50, prepare=653.711µs, send=18.440508ms, judge_wait=214.089178ms, fetch=21.625303ms, reduce=136ns; duck time-ns stats: p50=191.640287ms, p90=191.773221ms, max=192.014041ms; kernel_model: matmul=7.222591 GFLOP (37.615 GFLOP/s @ duck_max), param_stream=1.030816G (5.368 Gparam/s @ duck_max), weight_stream=1106.425 MiB (6.042 GB/s @ duck_max) [2026-04-08 08:58:21.608968 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=217, expert_tiles=740, avg_tile_batch=3.55, prepare=656.601µs, send=18.471439ms, judge_wait=216.011207ms, fetch=21.604902ms, reduce=141ns; duck time-ns stats: p50=192.23253ms, p90=192.660724ms, max=193.171252ms; kernel_model: matmul=7.222591 GFLOP (37.390 GFLOP/s @ duck_max), param_stream=1.018429G (5.272 Gparam/s @ duck_max), weight_stream=1093.130 MiB (5.934 GB/s @ duck_max) [2026-04-08 08:58:21.637658 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=83, expert_tiles=97, avg_tile_batch=1.98, prepare=54.031µs, send=1.290158ms, judge_wait=24.853864ms, fetch=1.48629ms, reduce=139ns; duck time-ns stats: p50=24.646451ms, p90=24.673738ms, max=24.689707ms; kernel_model: matmul=0.528482 GFLOP (21.405 GFLOP/s @ duck_max), param_stream=0.133497G (5.407 Gparam/s @ duck_max), weight_stream=143.289 MiB (6.085 GB/s @ duck_max) [2026-04-08 08:58:21.931086 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=750, avg_tile_batch=3.50, prepare=779.484µs, send=18.554328ms, judge_wait=217.402473ms, fetch=21.611508ms, reduce=19ns; duck time-ns stats: p50=194.647323ms, 
p90=195.009597ms, max=195.190826ms; kernel_model: matmul=7.222591 GFLOP (37.003 GFLOP/s @ duck_max), param_stream=1.032192G (5.288 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.952 GB/s @ duck_max) [2026-04-08 08:58:22.201236 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=239, expert_tiles=754, avg_tile_batch=3.48, prepare=661.355µs, send=17.213969ms, judge_wait=216.213556ms, fetch=20.585265ms, reduce=20ns; duck time-ns stats: p50=192.53711ms, p90=192.70253ms, max=192.779834ms; kernel_model: matmul=7.222591 GFLOP (37.465 GFLOP/s @ duck_max), param_stream=1.037697G (5.383 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.058 GB/s @ duck_max) [2026-04-08 08:58:22.470370 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=748, avg_tile_batch=3.51, prepare=652.883µs, send=18.453532ms, judge_wait=213.941793ms, fetch=20.62918ms, reduce=136ns; duck time-ns stats: p50=190.103228ms, p90=190.264533ms, max=191.001182ms; kernel_model: matmul=7.222591 GFLOP (37.814 GFLOP/s @ duck_max), param_stream=1.029439G (5.390 Gparam/s @ duck_max), weight_stream=1104.948 MiB (6.066 GB/s @ duck_max) [2026-04-08 08:58:22.743506 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=749, avg_tile_batch=3.50, prepare=652.102µs, send=18.472195ms, judge_wait=217.992588ms, fetch=20.633744ms, reduce=133ns; duck time-ns stats: p50=196.084198ms, p90=196.348097ms, max=196.416367ms; kernel_model: matmul=7.222591 GFLOP (36.772 GFLOP/s @ duck_max), param_stream=1.030816G (5.248 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.907 GB/s @ duck_max) [2026-04-08 08:58:22.767146 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=90, expert_tiles=100, avg_tile_batch=1.92, prepare=53.33µs, send=1.383238ms, judge_wait=19.750181ms, fetch=1.479928ms, reduce=19ns; duck time-ns 
stats: p50=19.562358ms, p90=19.587408ms, max=19.60037ms; kernel_model: matmul=0.528482 GFLOP (26.963 GFLOP/s @ duck_max), param_stream=0.137626G (7.022 Gparam/s @ duck_max), weight_stream=147.720 MiB (7.903 GB/s @ duck_max) [2026-04-08 08:58:23.056576 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=745, avg_tile_batch=3.52, prepare=769.166µs, send=18.596674ms, judge_wait=213.133931ms, fetch=22.2539ms, reduce=20ns; duck time-ns stats: p50=191.60406ms, p90=191.889965ms, max=192.011571ms; kernel_model: matmul=7.222591 GFLOP (37.615 GFLOP/s @ duck_max), param_stream=1.025311G (5.340 Gparam/s @ duck_max), weight_stream=1100.517 MiB (6.010 GB/s @ duck_max) [2026-04-08 08:58:23.326194 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=755, avg_tile_batch=3.48, prepare=656.439µs, send=17.211514ms, judge_wait=214.654563ms, fetch=21.65484ms, reduce=20ns; duck time-ns stats: p50=192.446325ms, p90=192.774592ms, max=193.09386ms; kernel_model: matmul=7.222591 GFLOP (37.405 GFLOP/s @ duck_max), param_stream=1.039073G (5.381 Gparam/s @ duck_max), weight_stream=1115.289 MiB (6.056 GB/s @ duck_max) [2026-04-08 08:58:23.598100 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=753, avg_tile_batch=3.48, prepare=651.927µs, send=18.483558ms, judge_wait=216.834726ms, fetch=20.625905ms, reduce=20ns; duck time-ns stats: p50=193.286246ms, p90=193.679672ms, max=194.044117ms; kernel_model: matmul=7.222591 GFLOP (37.221 GFLOP/s @ duck_max), param_stream=1.036321G (5.341 Gparam/s @ duck_max), weight_stream=1112.334 MiB (6.011 GB/s @ duck_max) [2026-04-08 08:58:23.877020 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=228, expert_tiles=747, avg_tile_batch=3.51, prepare=657.568µs, send=18.582349ms, judge_wait=223.754516ms, fetch=20.619977ms, 
reduce=21ns; duck time-ns stats: p50=201.078512ms, p90=201.24604ms, max=201.755824ms; kernel_model: matmul=7.222591 GFLOP (35.799 GFLOP/s @ duck_max), param_stream=1.028063G (5.096 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.735 GB/s @ duck_max) [2026-04-08 08:58:23.900133 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=98, expert_tiles=106, avg_tile_batch=1.81, prepare=53.586µs, send=1.458852ms, judge_wait=19.175082ms, fetch=1.473999ms, reduce=16ns; duck time-ns stats: p50=19.01068ms, p90=19.030927ms, max=19.035436ms; kernel_model: matmul=0.528482 GFLOP (27.763 GFLOP/s @ duck_max), param_stream=0.145883G (7.664 Gparam/s @ duck_max), weight_stream=156.584 MiB (8.625 GB/s @ duck_max) [2026-04-08 08:58:24.202907 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=243, expert_tiles=755, avg_tile_batch=3.48, prepare=774.184µs, send=17.237779ms, judge_wait=228.496209ms, fetch=21.693189ms, reduce=137ns; duck time-ns stats: p50=207.166483ms, p90=207.660488ms, max=207.805142ms; kernel_model: matmul=7.222591 GFLOP (34.757 GFLOP/s @ duck_max), param_stream=1.039073G (5.000 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.628 GB/s @ duck_max) [2026-04-08 08:58:24.471126 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=749, avg_tile_batch=3.50, prepare=653.971µs, send=18.564275ms, judge_wait=213.092144ms, fetch=20.644252ms, reduce=21ns; duck time-ns stats: p50=191.238336ms, p90=191.755714ms, max=191.987278ms; kernel_model: matmul=7.222591 GFLOP (37.620 GFLOP/s @ duck_max), param_stream=1.030816G (5.369 Gparam/s @ duck_max), weight_stream=1106.425 MiB (6.043 GB/s @ duck_max) [2026-04-08 08:58:24.743092 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=748, avg_tile_batch=3.51, prepare=654.567µs, send=17.214227ms, judge_wait=217.988276ms, 
fetch=20.630075ms, reduce=16ns; duck time-ns stats: p50=195.310321ms, p90=195.858439ms, max=196.631238ms; kernel_model: matmul=7.222591 GFLOP (36.732 GFLOP/s @ duck_max), param_stream=1.029439G (5.235 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.892 GB/s @ duck_max) [2026-04-08 08:58:25.016496 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=742, avg_tile_batch=3.54, prepare=649.252µs, send=18.473973ms, judge_wait=217.279658ms, fetch=21.632027ms, reduce=107ns; duck time-ns stats: p50=195.761654ms, p90=196.13273ms, max=196.223215ms; kernel_model: matmul=7.222591 GFLOP (36.808 GFLOP/s @ duck_max), param_stream=1.021182G (5.204 Gparam/s @ duck_max), weight_stream=1096.085 MiB (5.857 GB/s @ duck_max) [2026-04-08 08:58:25.040133 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=100, expert_tiles=107, avg_tile_batch=1.79, prepare=53.621µs, send=1.387951ms, judge_wait=19.686984ms, fetch=1.492871ms, reduce=138ns; duck time-ns stats: p50=19.470439ms, p90=19.505474ms, max=19.538561ms; kernel_model: matmul=0.528482 GFLOP (27.048 GFLOP/s @ duck_max), param_stream=0.147259G (7.537 Gparam/s @ duck_max), weight_stream=158.061 MiB (8.483 GB/s @ duck_max) [2026-04-08 08:58:25.342201 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=240, expert_tiles=746, avg_tile_batch=3.52, prepare=765.213µs, send=19.469378ms, judge_wait=217.766584ms, fetch=29.250184ms, reduce=146ns; duck time-ns stats: p50=196.266913ms, p90=196.529933ms, max=196.889628ms; kernel_model: matmul=7.222591 GFLOP (36.683 GFLOP/s @ duck_max), param_stream=1.026687G (5.215 Gparam/s @ duck_max), weight_stream=1101.994 MiB (5.869 GB/s @ duck_max) [2026-04-08 08:58:25.624378 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=755, avg_tile_batch=3.48, prepare=657.489µs, send=18.508167ms, 
judge_wait=225.92981ms, fetch=21.647078ms, reduce=140ns; duck time-ns stats: p50=203.988206ms, p90=204.227626ms, max=204.259096ms; kernel_model: matmul=7.222591 GFLOP (35.360 GFLOP/s @ duck_max), param_stream=1.039073G (5.087 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.725 GB/s @ duck_max) [2026-04-08 08:58:25.896222 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=754, avg_tile_batch=3.48, prepare=657.906µs, send=18.50347ms, judge_wait=215.55343ms, fetch=21.64351ms, reduce=140ns; duck time-ns stats: p50=193.303025ms, p90=193.501509ms, max=193.59648ms; kernel_model: matmul=7.222591 GFLOP (37.307 GFLOP/s @ duck_max), param_stream=1.037697G (5.360 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.033 GB/s @ duck_max) [2026-04-08 08:58:26.168375 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=756, avg_tile_batch=3.47, prepare=658.066µs, send=18.438812ms, judge_wait=215.965407ms, fetch=21.648815ms, reduce=134ns; duck time-ns stats: p50=193.542201ms, p90=193.730581ms, max=193.844631ms; kernel_model: matmul=7.222591 GFLOP (37.260 GFLOP/s @ duck_max), param_stream=1.040450G (5.367 Gparam/s @ duck_max), weight_stream=1116.766 MiB (6.041 GB/s @ duck_max) [2026-04-08 08:58:26.190973 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=85, expert_tiles=97, avg_tile_batch=1.98, prepare=53.671µs, send=1.302191ms, judge_wait=18.736334ms, fetch=1.479114ms, reduce=136ns; duck time-ns stats: p50=18.495487ms, p90=18.542894ms, max=18.592858ms; kernel_model: matmul=0.528482 GFLOP (28.424 GFLOP/s @ duck_max), param_stream=0.133497G (7.180 Gparam/s @ duck_max), weight_stream=143.289 MiB (8.081 GB/s @ duck_max) [2026-04-08 08:58:26.483863 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=752, avg_tile_batch=3.49, prepare=773.634µs, 
send=17.237468ms, judge_wait=219.320299ms, fetch=20.647088ms, reduce=138ns; duck time-ns stats: p50=196.364535ms, p90=197.129061ms, max=197.334449ms; kernel_model: matmul=7.222591 GFLOP (36.601 GFLOP/s @ duck_max), param_stream=1.034945G (5.245 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.903 GB/s @ duck_max) [2026-04-08 08:58:26.754440 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=747, avg_tile_batch=3.51, prepare=648.188µs, send=18.545749ms, judge_wait=214.143778ms, fetch=21.649531ms, reduce=135ns; duck time-ns stats: p50=191.39074ms, p90=191.81633ms, max=191.944288ms; kernel_model: matmul=7.222591 GFLOP (37.629 GFLOP/s @ duck_max), param_stream=1.028063G (5.356 Gparam/s @ duck_max), weight_stream=1103.471 MiB (6.028 GB/s @ duck_max) [2026-04-08 08:58:27.031826 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=237, expert_tiles=750, avg_tile_batch=3.50, prepare=654.203µs, send=18.339945ms, judge_wait=221.122327ms, fetch=21.633302ms, reduce=140ns; duck time-ns stats: p50=197.977395ms, p90=198.425644ms, max=198.47194ms; kernel_model: matmul=7.222591 GFLOP (36.391 GFLOP/s @ duck_max), param_stream=1.032192G (5.201 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.853 GB/s @ duck_max) [2026-04-08 08:58:27.305857 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=750, avg_tile_batch=3.50, prepare=648.003µs, send=18.321203ms, judge_wait=218.890689ms, fetch=20.658551ms, reduce=133ns; duck time-ns stats: p50=194.97345ms, p90=195.332827ms, max=195.609181ms; kernel_model: matmul=7.222591 GFLOP (36.924 GFLOP/s @ duck_max), param_stream=1.032192G (5.277 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.939 GB/s @ duck_max) [2026-04-08 08:58:27.328832 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=87, expert_tiles=95, 
avg_tile_batch=2.02, prepare=53.084µs, send=1.292357ms, judge_wait=19.122274ms, fetch=1.487036ms, reduce=139ns; duck time-ns stats: p50=18.814865ms, p90=18.882692ms, max=18.98219ms; kernel_model: matmul=0.528482 GFLOP (27.841 GFLOP/s @ duck_max), param_stream=0.130744G (6.888 Gparam/s @ duck_max), weight_stream=140.334 MiB (7.752 GB/s @ duck_max) [2026-04-08 08:58:27.620489 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=235, expert_tiles=759, avg_tile_batch=3.46, prepare=763.966µs, send=18.558978ms, judge_wait=216.921907ms, fetch=20.641655ms, reduce=21ns; duck time-ns stats: p50=194.937185ms, p90=195.355141ms, max=195.87963ms; kernel_model: matmul=7.222591 GFLOP (36.873 GFLOP/s @ duck_max), param_stream=1.044578G (5.333 Gparam/s @ duck_max), weight_stream=1121.197 MiB (6.002 GB/s @ duck_max) [2026-04-08 08:58:27.898586 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=751, avg_tile_batch=3.49, prepare=655.802µs, send=18.477849ms, judge_wait=222.649926ms, fetch=20.650625ms, reduce=135ns; duck time-ns stats: p50=199.594532ms, p90=199.9707ms, max=200.160885ms; kernel_model: matmul=7.222591 GFLOP (36.084 GFLOP/s @ duck_max), param_stream=1.033568G (5.164 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.812 GB/s @ duck_max) [2026-04-08 08:58:28.167900 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=238, expert_tiles=760, avg_tile_batch=3.45, prepare=656.259µs, send=17.211255ms, judge_wait=215.380491ms, fetch=20.616935ms, reduce=151ns; duck time-ns stats: p50=193.828964ms, p90=194.10147ms, max=194.136842ms; kernel_model: matmul=7.222591 GFLOP (37.204 GFLOP/s @ duck_max), param_stream=1.045955G (5.388 Gparam/s @ duck_max), weight_stream=1122.675 MiB (6.064 GB/s @ duck_max) [2026-04-08 08:58:28.439684 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, 
expert_tiles=742, avg_tile_batch=3.54, prepare=650.733µs, send=18.497627ms, judge_wait=215.39981ms, fetch=21.646749ms, reduce=149ns; duck time-ns stats: p50=193.289471ms, p90=193.596305ms, max=193.825273ms; kernel_model: matmul=7.222591 GFLOP (37.263 GFLOP/s @ duck_max), param_stream=1.021182G (5.269 Gparam/s @ duck_max), weight_stream=1096.085 MiB (5.930 GB/s @ duck_max) [2026-04-08 08:58:28.462953 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=92, expert_tiles=102, avg_tile_batch=1.88, prepare=53.309µs, send=1.292481ms, judge_wait=19.458612ms, fetch=1.481832ms, reduce=20ns; duck time-ns stats: p50=19.258999ms, p90=19.289448ms, max=19.323763ms; kernel_model: matmul=0.528482 GFLOP (27.349 GFLOP/s @ duck_max), param_stream=0.140378G (7.265 Gparam/s @ duck_max), weight_stream=150.675 MiB (8.176 GB/s @ duck_max) [2026-04-08 08:58:28.752859 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=240, expert_tiles=757, avg_tile_batch=3.47, prepare=766.632µs, send=17.2153ms, judge_wait=215.103698ms, fetch=21.641937ms, reduce=20ns; duck time-ns stats: p50=192.691779ms, p90=193.056798ms, max=193.214237ms; kernel_model: matmul=7.222591 GFLOP (37.381 GFLOP/s @ duck_max), param_stream=1.041826G (5.392 Gparam/s @ duck_max), weight_stream=1118.243 MiB (6.069 GB/s @ duck_max) [2026-04-08 08:58:29.026999 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=753, avg_tile_batch=3.48, prepare=653.166µs, send=18.545703ms, judge_wait=217.902561ms, fetch=21.633295ms, reduce=137ns; duck time-ns stats: p50=196.296032ms, p90=196.686062ms, max=197.032316ms; kernel_model: matmul=7.222591 GFLOP (36.657 GFLOP/s @ duck_max), param_stream=1.036321G (5.260 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.920 GB/s @ duck_max) [2026-04-08 08:58:29.301116 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, 
unique_experts=229, expert_tiles=745, avg_tile_batch=3.52, prepare=647.999µs, send=18.565614ms, judge_wait=218.855434ms, fetch=20.653257ms, reduce=135ns; duck time-ns stats: p50=195.089022ms, p90=195.499509ms, max=195.815711ms; kernel_model: matmul=7.222591 GFLOP (36.885 GFLOP/s @ duck_max), param_stream=1.025311G (5.236 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.893 GB/s @ duck_max) [2026-04-08 08:58:29.573278 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=228, expert_tiles=753, avg_tile_batch=3.48, prepare=650.362µs, send=18.54021ms, judge_wait=215.751559ms, fetch=21.681369ms, reduce=130ns; duck time-ns stats: p50=191.81949ms, p90=192.14587ms, max=192.293877ms; kernel_model: matmul=7.222591 GFLOP (37.560 GFLOP/s @ duck_max), param_stream=1.036321G (5.389 Gparam/s @ duck_max), weight_stream=1112.334 MiB (6.066 GB/s @ duck_max) [2026-04-08 08:58:29.596418 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=88, expert_tiles=94, avg_tile_batch=2.04, prepare=52.542µs, send=1.304336ms, judge_wait=19.335502ms, fetch=1.46777ms, reduce=20ns; duck time-ns stats: p50=19.072435ms, p90=19.114717ms, max=19.161026ms; kernel_model: matmul=0.528482 GFLOP (27.581 GFLOP/s @ duck_max), param_stream=0.129368G (6.752 Gparam/s @ duck_max), weight_stream=138.857 MiB (7.599 GB/s @ duck_max) [2026-04-08 08:58:29.888635 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=746, avg_tile_batch=3.52, prepare=769.708µs, send=18.587947ms, judge_wait=216.326746ms, fetch=21.627783ms, reduce=133ns; duck time-ns stats: p50=194.583614ms, p90=194.941645ms, max=195.044681ms; kernel_model: matmul=7.222591 GFLOP (37.030 GFLOP/s @ duck_max), param_stream=1.026687G (5.264 Gparam/s @ duck_max), weight_stream=1101.994 MiB (5.924 GB/s @ duck_max) [2026-04-08 08:58:30.160762 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, 
top_k=8, tasks=2624, unique_experts=235, expert_tiles=758, avg_tile_batch=3.46, prepare=651.746µs, send=18.38147ms, judge_wait=215.843703ms, fetch=21.633844ms, reduce=136ns; duck time-ns stats: p50=193.46578ms, p90=193.806114ms, max=194.10043ms; kernel_model: matmul=7.222591 GFLOP (37.211 GFLOP/s @ duck_max), param_stream=1.043202G (5.375 Gparam/s @ duck_max), weight_stream=1119.720 MiB (6.049 GB/s @ duck_max) [2026-04-08 08:58:30.435404 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=240, expert_tiles=750, avg_tile_batch=3.50, prepare=645.74µs, send=18.353224ms, judge_wait=218.392788ms, fetch=21.628711ms, reduce=133ns; duck time-ns stats: p50=194.975338ms, p90=195.177401ms, max=195.339096ms; kernel_model: matmul=7.222591 GFLOP (36.975 GFLOP/s @ duck_max), param_stream=1.032192G (5.284 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.947 GB/s @ duck_max) [2026-04-08 08:58:30.706915 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=751, avg_tile_batch=3.49, prepare=654.916µs, send=17.210961ms, judge_wait=217.544735ms, fetch=20.64752ms, reduce=136ns; duck time-ns stats: p50=194.858259ms, p90=195.126407ms, max=195.409233ms; kernel_model: matmul=7.222591 GFLOP (36.961 GFLOP/s @ duck_max), param_stream=1.033568G (5.289 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.953 GB/s @ duck_max) [2026-04-08 08:58:30.729916 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=88, expert_tiles=102, avg_tile_batch=1.88, prepare=52.795µs, send=1.298543ms, judge_wait=19.170429ms, fetch=1.478643ms, reduce=135ns; duck time-ns stats: p50=18.990486ms, p90=19.027848ms, max=19.036173ms; kernel_model: matmul=0.528482 GFLOP (27.762 GFLOP/s @ duck_max), param_stream=0.140378G (7.374 Gparam/s @ duck_max), weight_stream=150.675 MiB (8.300 GB/s @ duck_max) [2026-04-08 08:58:31.034547 INFO fp8_moe_dpdk] MoE prefill forward (Rust): 
batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=745, avg_tile_batch=3.52, prepare=764.669µs, send=18.414971ms, judge_wait=228.935081ms, fetch=21.614839ms, reduce=133ns; duck time-ns stats: p50=205.47926ms, p90=205.854783ms, max=206.196116ms; kernel_model: matmul=7.222591 GFLOP (35.028 GFLOP/s @ duck_max), param_stream=1.025311G (4.973 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.596 GB/s @ duck_max) [2026-04-08 08:58:31.329089 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=757, avg_tile_batch=3.47, prepare=659.082µs, send=18.522256ms, judge_wait=238.321482ms, fetch=21.616946ms, reduce=139ns; duck time-ns stats: p50=214.980444ms, p90=215.282438ms, max=215.609587ms; kernel_model: matmul=7.222591 GFLOP (33.498 GFLOP/s @ duck_max), param_stream=1.041826G (4.832 Gparam/s @ duck_max), weight_stream=1118.243 MiB (5.438 GB/s @ duck_max) [2026-04-08 08:58:31.624095 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=757, avg_tile_batch=3.47, prepare=653.361µs, send=18.565451ms, judge_wait=238.69345ms, fetch=21.627212ms, reduce=137ns; duck time-ns stats: p50=217.424978ms, p90=217.985827ms, max=218.621726ms; kernel_model: matmul=7.222591 GFLOP (33.037 GFLOP/s @ duck_max), param_stream=1.041826G (4.765 Gparam/s @ duck_max), weight_stream=1118.243 MiB (5.363 GB/s @ duck_max) [2026-04-08 08:58:31.918460 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=749, avg_tile_batch=3.50, prepare=652.361µs, send=18.487883ms, judge_wait=238.159151ms, fetch=21.630213ms, reduce=21ns; duck time-ns stats: p50=215.693138ms, p90=215.892027ms, max=216.135957ms; kernel_model: matmul=7.222591 GFLOP (33.417 GFLOP/s @ duck_max), param_stream=1.030816G (4.769 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.368 GB/s @ duck_max) [2026-04-08 08:58:31.941945 INFO 
fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=86, expert_tiles=99, avg_tile_batch=1.94, prepare=53.231µs, send=1.34432ms, judge_wait=19.6502ms, fetch=1.473495ms, reduce=19ns; duck time-ns stats: p50=19.488761ms, p90=19.510372ms, max=19.516523ms; kernel_model: matmul=0.528482 GFLOP (27.079 GFLOP/s @ duck_max), param_stream=0.136249G (6.981 Gparam/s @ duck_max), weight_stream=146.243 MiB (7.857 GB/s @ duck_max) [2026-04-08 08:58:32.238672 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=755, avg_tile_batch=3.48, prepare=779.075µs, send=17.21218ms, judge_wait=222.34465ms, fetch=21.629625ms, reduce=138ns; duck time-ns stats: p50=199.771521ms, p90=200.001345ms, max=200.03162ms; kernel_model: matmul=7.222591 GFLOP (36.107 GFLOP/s @ duck_max), param_stream=1.039073G (5.195 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.846 GB/s @ duck_max) [2026-04-08 08:58:32.510763 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=755, avg_tile_batch=3.48, prepare=653.03µs, send=18.418633ms, judge_wait=216.895353ms, fetch=20.627455ms, reduce=20ns; duck time-ns stats: p50=194.167596ms, p90=194.518707ms, max=194.93797ms; kernel_model: matmul=7.222591 GFLOP (37.051 GFLOP/s @ duck_max), param_stream=1.039073G (5.330 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.999 GB/s @ duck_max) [2026-04-08 08:58:32.786272 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=240, expert_tiles=761, avg_tile_batch=3.45, prepare=649.711µs, send=18.447058ms, judge_wait=220.247006ms, fetch=20.652353ms, reduce=134ns; duck time-ns stats: p50=196.149292ms, p90=196.51583ms, max=196.709402ms; kernel_model: matmul=7.222591 GFLOP (36.717 GFLOP/s @ duck_max), param_stream=1.047331G (5.324 Gparam/s @ duck_max), weight_stream=1124.152 MiB (5.992 GB/s @ duck_max) [2026-04-08 
08:58:33.059132 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=749, avg_tile_batch=3.50, prepare=652.971µs, send=18.434914ms, judge_wait=216.682496ms, fetch=21.630473ms, reduce=136ns; duck time-ns stats: p50=193.067938ms, p90=193.289811ms, max=193.579573ms; kernel_model: matmul=7.222591 GFLOP (37.311 GFLOP/s @ duck_max), param_stream=1.030816G (5.325 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.993 GB/s @ duck_max) [2026-04-08 08:58:33.081799 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=85, expert_tiles=98, avg_tile_batch=1.96, prepare=53.521µs, send=1.302517ms, judge_wait=18.66144ms, fetch=1.481633ms, reduce=142ns; duck time-ns stats: p50=18.458679ms, p90=18.494626ms, max=18.517465ms; kernel_model: matmul=0.528482 GFLOP (28.540 GFLOP/s @ duck_max), param_stream=0.134873G (7.284 Gparam/s @ duck_max), weight_stream=144.766 MiB (8.198 GB/s @ duck_max) [2026-04-08 08:58:33.373610 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=239, expert_tiles=753, avg_tile_batch=3.48, prepare=765.95µs, send=17.212496ms, judge_wait=218.277537ms, fetch=20.635099ms, reduce=136ns; duck time-ns stats: p50=195.65134ms, p90=195.826464ms, max=196.11072ms; kernel_model: matmul=7.222591 GFLOP (36.829 GFLOP/s @ duck_max), param_stream=1.036321G (5.284 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.947 GB/s @ duck_max) [2026-04-08 08:58:33.647391 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=228, expert_tiles=744, avg_tile_batch=3.53, prepare=651.239µs, send=18.439904ms, judge_wait=217.690091ms, fetch=21.665225ms, reduce=20ns; duck time-ns stats: p50=193.335917ms, p90=193.551306ms, max=193.601252ms; kernel_model: matmul=7.222591 GFLOP (37.307 GFLOP/s @ duck_max), param_stream=1.023934G (5.289 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.953 GB/s @ 
duck_max) [2026-04-08 08:58:33.923786 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=757, avg_tile_batch=3.47, prepare=690.147µs, send=18.542624ms, judge_wait=220.056123ms, fetch=21.636868ms, reduce=134ns; duck time-ns stats: p50=197.491684ms, p90=197.778286ms, max=197.899768ms; kernel_model: matmul=7.222591 GFLOP (36.496 GFLOP/s @ duck_max), param_stream=1.041826G (5.264 Gparam/s @ duck_max), weight_stream=1118.243 MiB (5.925 GB/s @ duck_max) [2026-04-08 08:58:34.194972 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=745, avg_tile_batch=3.52, prepare=657.437µs, send=18.500045ms, judge_wait=215.012386ms, fetch=21.573289ms, reduce=143ns; duck time-ns stats: p50=194.337824ms, p90=194.625717ms, max=195.008431ms; kernel_model: matmul=7.222591 GFLOP (37.037 GFLOP/s @ duck_max), param_stream=1.025311G (5.258 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.918 GB/s @ duck_max) [2026-04-08 08:58:34.217599 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=89, expert_tiles=99, avg_tile_batch=1.94, prepare=51.278µs, send=1.389578ms, judge_wait=18.708814ms, fetch=1.486652ms, reduce=135ns; duck time-ns stats: p50=18.50166ms, p90=18.556746ms, max=18.575064ms; kernel_model: matmul=0.528482 GFLOP (28.451 GFLOP/s @ duck_max), param_stream=0.136249G (7.335 Gparam/s @ duck_max), weight_stream=146.243 MiB (8.256 GB/s @ duck_max) [2026-04-08 08:58:34.513890 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=242, expert_tiles=759, avg_tile_batch=3.46, prepare=763.058µs, send=17.236028ms, judge_wait=222.072738ms, fetch=21.633454ms, reduce=20ns; duck time-ns stats: p50=197.880439ms, p90=198.05677ms, max=198.226829ms; kernel_model: matmul=7.222591 GFLOP (36.436 GFLOP/s @ duck_max), param_stream=1.044578G (5.270 Gparam/s @ duck_max), weight_stream=1121.197 
MiB (5.931 GB/s @ duck_max) [2026-04-08 08:58:34.786209 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=752, avg_tile_batch=3.49, prepare=666.062µs, send=21.14897ms, judge_wait=213.318997ms, fetch=21.655856ms, reduce=138ns; duck time-ns stats: p50=191.23898ms, p90=191.450823ms, max=191.74547ms; kernel_model: matmul=7.222591 GFLOP (37.668 GFLOP/s @ duck_max), param_stream=1.034945G (5.397 Gparam/s @ duck_max), weight_stream=1110.857 MiB (6.075 GB/s @ duck_max) [2026-04-08 08:58:35.057000 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=752, avg_tile_batch=3.49, prepare=649.778µs, send=18.399693ms, judge_wait=214.663468ms, fetch=21.641823ms, reduce=105ns; duck time-ns stats: p50=191.264843ms, p90=191.55085ms, max=191.796124ms; kernel_model: matmul=7.222591 GFLOP (37.658 GFLOP/s @ duck_max), param_stream=1.034945G (5.396 Gparam/s @ duck_max), weight_stream=1110.857 MiB (6.073 GB/s @ duck_max) [2026-04-08 08:58:35.326496 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=751, avg_tile_batch=3.49, prepare=652.19µs, send=18.503385ms, judge_wait=213.23092ms, fetch=21.647747ms, reduce=20ns; duck time-ns stats: p50=190.899508ms, p90=191.377701ms, max=191.870981ms; kernel_model: matmul=7.222591 GFLOP (37.643 GFLOP/s @ duck_max), param_stream=1.033568G (5.387 Gparam/s @ duck_max), weight_stream=1109.380 MiB (6.063 GB/s @ duck_max) [2026-04-08 08:58:35.348913 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=76, expert_tiles=91, avg_tile_batch=2.11, prepare=52.76µs, send=1.29334ms, judge_wait=18.536596ms, fetch=1.473479ms, reduce=29ns; duck time-ns stats: p50=18.378943ms, p90=18.398053ms, max=18.407117ms; kernel_model: matmul=0.528482 GFLOP (28.711 GFLOP/s @ duck_max), param_stream=0.125239G (6.804 Gparam/s @ duck_max), 
weight_stream=134.426 MiB (7.658 GB/s @ duck_max) [2026-04-08 08:58:35.647954 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=220, expert_tiles=753, avg_tile_batch=3.48, prepare=768.753µs, send=18.361496ms, judge_wait=224.435389ms, fetch=20.636351ms, reduce=20ns; duck time-ns stats: p50=201.39601ms, p90=201.78007ms, max=202.13446ms; kernel_model: matmul=7.222591 GFLOP (35.732 GFLOP/s @ duck_max), param_stream=1.036321G (5.127 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.770 GB/s @ duck_max) [2026-04-08 08:58:35.920081 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=219, expert_tiles=754, avg_tile_batch=3.48, prepare=695.142µs, send=18.534228ms, judge_wait=215.833432ms, fetch=21.616067ms, reduce=136ns; duck time-ns stats: p50=193.429695ms, p90=193.896507ms, max=194.054791ms; kernel_model: matmul=7.222591 GFLOP (37.219 GFLOP/s @ duck_max), param_stream=1.037697G (5.347 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.018 GB/s @ duck_max) [2026-04-08 08:58:36.193405 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=218, expert_tiles=745, avg_tile_batch=3.52, prepare=649.568µs, send=18.539641ms, judge_wait=218.051415ms, fetch=20.639576ms, reduce=15ns; duck time-ns stats: p50=196.089258ms, p90=196.39212ms, max=196.529054ms; kernel_model: matmul=7.222591 GFLOP (36.751 GFLOP/s @ duck_max), param_stream=1.025311G (5.217 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.872 GB/s @ duck_max) [2026-04-08 08:58:36.470869 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=216, expert_tiles=748, avg_tile_batch=3.51, prepare=652.744µs, send=18.486514ms, judge_wait=221.265295ms, fetch=21.650009ms, reduce=15ns; duck time-ns stats: p50=200.922963ms, p90=201.124366ms, max=201.317247ms; kernel_model: matmul=7.222591 GFLOP (35.877 GFLOP/s @ duck_max), param_stream=1.029439G 
(5.114 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.755 GB/s @ duck_max) [2026-04-08 08:58:36.492961 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=87, expert_tiles=95, avg_tile_batch=2.02, prepare=53.156µs, send=1.37387ms, judge_wait=18.220148ms, fetch=1.472364ms, reduce=15ns; duck time-ns stats: p50=18.035554ms, p90=18.058674ms, max=18.089122ms; kernel_model: matmul=0.528482 GFLOP (29.215 GFLOP/s @ duck_max), param_stream=0.130744G (7.228 Gparam/s @ duck_max), weight_stream=140.334 MiB (8.135 GB/s @ duck_max) [2026-04-08 08:58:36.794188 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=758, avg_tile_batch=3.46, prepare=764.029µs, send=17.215947ms, judge_wait=226.801868ms, fetch=21.612364ms, reduce=135ns; duck time-ns stats: p50=204.773827ms, p90=205.003025ms, max=205.095029ms; kernel_model: matmul=7.222591 GFLOP (35.216 GFLOP/s @ duck_max), param_stream=1.043202G (5.086 Gparam/s @ duck_max), weight_stream=1119.720 MiB (5.725 GB/s @ duck_max) [2026-04-08 08:58:37.064972 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=758, avg_tile_batch=3.46, prepare=652.503µs, send=18.34743ms, judge_wait=215.579019ms, fetch=20.62964ms, reduce=132ns; duck time-ns stats: p50=193.381606ms, p90=193.700196ms, max=193.86939ms; kernel_model: matmul=7.222591 GFLOP (37.255 GFLOP/s @ duck_max), param_stream=1.043202G (5.381 Gparam/s @ duck_max), weight_stream=1119.720 MiB (6.056 GB/s @ duck_max) [2026-04-08 08:58:37.337791 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=752, avg_tile_batch=3.49, prepare=651.371µs, send=18.349496ms, judge_wait=216.648721ms, fetch=21.644751ms, reduce=102ns; duck time-ns stats: p50=194.57894ms, p90=194.863788ms, max=195.049017ms; kernel_model: matmul=7.222591 GFLOP (37.030 GFLOP/s @ duck_max), 
param_stream=1.034945G (5.306 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.972 GB/s @ duck_max) [2026-04-08 08:58:37.612136 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=751, avg_tile_batch=3.49, prepare=653.535µs, send=18.413293ms, judge_wait=218.261769ms, fetch=21.59449ms, reduce=17ns; duck time-ns stats: p50=195.143368ms, p90=195.599746ms, max=195.828414ms; kernel_model: matmul=7.222591 GFLOP (36.882 GFLOP/s @ duck_max), param_stream=1.033568G (5.278 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.940 GB/s @ duck_max) [2026-04-08 08:58:37.635799 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=89, expert_tiles=98, avg_tile_batch=1.96, prepare=53.18µs, send=1.41712ms, judge_wait=19.712057ms, fetch=1.474074ms, reduce=106ns; duck time-ns stats: p50=19.498192ms, p90=19.52859ms, max=19.551076ms; kernel_model: matmul=0.528482 GFLOP (27.031 GFLOP/s @ duck_max), param_stream=0.134873G (6.898 Gparam/s @ duck_max), weight_stream=144.766 MiB (7.764 GB/s @ duck_max) [2026-04-08 08:58:37.925369 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=241, expert_tiles=750, avg_tile_batch=3.50, prepare=801.118µs, send=18.475549ms, judge_wait=213.810333ms, fetch=21.641727ms, reduce=138ns; duck time-ns stats: p50=191.689966ms, p90=191.963103ms, max=192.162401ms; kernel_model: matmul=7.222591 GFLOP (37.586 GFLOP/s @ duck_max), param_stream=1.032192G (5.371 Gparam/s @ duck_max), weight_stream=1107.903 MiB (6.046 GB/s @ duck_max) [2026-04-08 08:58:38.199519 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=755, avg_tile_batch=3.48, prepare=651.24µs, send=18.458859ms, judge_wait=218.982796ms, fetch=20.63832ms, reduce=21ns; duck time-ns stats: p50=195.737175ms, p90=196.081582ms, max=196.2922ms; kernel_model: matmul=7.222591 GFLOP (36.795 GFLOP/s @ 
duck_max), param_stream=1.039073G (5.294 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.958 GB/s @ duck_max) [2026-04-08 08:58:38.473530 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=755, avg_tile_batch=3.48, prepare=651.246µs, send=18.531572ms, judge_wait=218.779545ms, fetch=20.629467ms, reduce=138ns; duck time-ns stats: p50=196.777348ms, p90=197.061581ms, max=197.112556ms; kernel_model: matmul=7.222591 GFLOP (36.642 GFLOP/s @ duck_max), param_stream=1.039073G (5.271 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.933 GB/s @ duck_max) [2026-04-08 08:58:38.747240 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=754, avg_tile_batch=3.48, prepare=662.348µs, send=18.497719ms, judge_wait=218.60028ms, fetch=20.636429ms, reduce=20ns; duck time-ns stats: p50=196.619471ms, p90=196.939397ms, max=197.187396ms; kernel_model: matmul=7.222591 GFLOP (36.628 GFLOP/s @ duck_max), param_stream=1.037697G (5.262 Gparam/s @ duck_max), weight_stream=1113.811 MiB (5.923 GB/s @ duck_max) [2026-04-08 08:58:38.771140 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=91, expert_tiles=99, avg_tile_batch=1.94, prepare=52.265µs, send=1.446613ms, judge_wait=19.916099ms, fetch=1.484014ms, reduce=138ns; duck time-ns stats: p50=19.684784ms, p90=19.729643ms, max=19.773913ms; kernel_model: matmul=0.528482 GFLOP (26.726 GFLOP/s @ duck_max), param_stream=0.136249G (6.890 Gparam/s @ duck_max), weight_stream=146.243 MiB (7.755 GB/s @ duck_max) [2026-04-08 08:58:39.067202 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=750, avg_tile_batch=3.50, prepare=785.632µs, send=17.212758ms, judge_wait=221.570789ms, fetch=21.654834ms, reduce=137ns; duck time-ns stats: p50=199.895361ms, p90=200.310243ms, max=200.407457ms; kernel_model: matmul=7.222591 GFLOP 
(36.040 GFLOP/s @ duck_max), param_stream=1.032192G (5.150 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.797 GB/s @ duck_max) [2026-04-08 08:58:39.346298 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=752, avg_tile_batch=3.49, prepare=656.562µs, send=17.21496ms, judge_wait=224.169065ms, fetch=21.644416ms, reduce=135ns; duck time-ns stats: p50=202.305372ms, p90=202.714872ms, max=202.896172ms; kernel_model: matmul=7.222591 GFLOP (35.597 GFLOP/s @ duck_max), param_stream=1.034945G (5.101 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.741 GB/s @ duck_max) [2026-04-08 08:58:39.628388 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=224, expert_tiles=748, avg_tile_batch=3.51, prepare=653.171µs, send=18.520877ms, judge_wait=225.883664ms, fetch=21.598022ms, reduce=20ns; duck time-ns stats: p50=204.240745ms, p90=204.507598ms, max=204.761247ms; kernel_model: matmul=7.222591 GFLOP (35.273 GFLOP/s @ duck_max), param_stream=1.029439G (5.028 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.658 GB/s @ duck_max) [2026-04-08 08:58:39.902021 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=751, avg_tile_batch=3.49, prepare=653.413µs, send=18.549779ms, judge_wait=217.323549ms, fetch=21.623915ms, reduce=134ns; duck time-ns stats: p50=197.304459ms, p90=197.475231ms, max=197.709139ms; kernel_model: matmul=7.222591 GFLOP (36.531 GFLOP/s @ duck_max), param_stream=1.033568G (5.228 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.884 GB/s @ duck_max) [2026-04-08 08:58:39.923891 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=74, expert_tiles=87, avg_tile_batch=2.21, prepare=52.363µs, send=1.292123ms, judge_wait=18.030831ms, fetch=1.486814ms, reduce=142ns; duck time-ns stats: p50=17.844735ms, p90=17.861595ms, max=17.901913ms; kernel_model: 
matmul=0.528482 GFLOP (29.521 GFLOP/s @ duck_max), param_stream=0.119734G (6.688 Gparam/s @ duck_max), weight_stream=128.517 MiB (7.528 GB/s @ duck_max) [2026-04-08 08:58:40.213356 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=753, avg_tile_batch=3.48, prepare=770.25µs, send=18.562767ms, judge_wait=213.82332ms, fetch=21.642127ms, reduce=28ns; duck time-ns stats: p50=192.248181ms, p90=192.500366ms, max=192.599452ms; kernel_model: matmul=7.222591 GFLOP (37.501 GFLOP/s @ duck_max), param_stream=1.036321G (5.381 Gparam/s @ duck_max), weight_stream=1112.334 MiB (6.056 GB/s @ duck_max) [2026-04-08 08:58:40.483197 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=746, avg_tile_batch=3.52, prepare=653.884µs, send=18.542197ms, judge_wait=213.475422ms, fetch=21.654257ms, reduce=135ns; duck time-ns stats: p50=191.60561ms, p90=191.889014ms, max=191.994561ms; kernel_model: matmul=7.222591 GFLOP (37.619 GFLOP/s @ duck_max), param_stream=1.026687G (5.347 Gparam/s @ duck_max), weight_stream=1101.994 MiB (6.019 GB/s @ duck_max) [2026-04-08 08:58:40.762062 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=752, avg_tile_batch=3.49, prepare=657.358µs, send=18.417675ms, judge_wait=222.650602ms, fetch=21.652805ms, reduce=104ns; duck time-ns stats: p50=201.404637ms, p90=201.595155ms, max=201.670404ms; kernel_model: matmul=7.222591 GFLOP (35.814 GFLOP/s @ duck_max), param_stream=1.034945G (5.132 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.776 GB/s @ duck_max) [2026-04-08 08:58:41.036670 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=214, expert_tiles=748, avg_tile_batch=3.51, prepare=652.564µs, send=18.45135ms, judge_wait=218.381306ms, fetch=21.6604ms, reduce=15ns; duck time-ns stats: p50=196.261743ms, p90=196.585601ms, 
max=196.854597ms; kernel_model: matmul=7.222591 GFLOP (36.690 GFLOP/s @ duck_max), param_stream=1.029439G (5.229 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.886 GB/s @ duck_max) [2026-04-08 08:58:41.060904 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=85, expert_tiles=98, avg_tile_batch=1.96, prepare=55.643µs, send=1.290931ms, judge_wait=20.396383ms, fetch=1.485775ms, reduce=138ns; duck time-ns stats: p50=20.212553ms, p90=20.25578ms, max=20.26224ms; kernel_model: matmul=0.528482 GFLOP (26.082 GFLOP/s @ duck_max), param_stream=0.134873G (6.656 Gparam/s @ duck_max), weight_stream=144.766 MiB (7.492 GB/s @ duck_max) [2026-04-08 08:58:41.354742 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=758, avg_tile_batch=3.46, prepare=762.154µs, send=17.214417ms, judge_wait=219.575174ms, fetch=21.610605ms, reduce=138ns; duck time-ns stats: p50=198.584236ms, p90=198.857072ms, max=199.202938ms; kernel_model: matmul=7.222591 GFLOP (36.257 GFLOP/s @ duck_max), param_stream=1.043202G (5.237 Gparam/s @ duck_max), weight_stream=1119.720 MiB (5.894 GB/s @ duck_max) [2026-04-08 08:58:41.627689 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=748, avg_tile_batch=3.51, prepare=649.695µs, send=18.555273ms, judge_wait=216.694824ms, fetch=21.590166ms, reduce=138ns; duck time-ns stats: p50=193.56183ms, p90=193.719225ms, max=193.88298ms; kernel_model: matmul=7.222591 GFLOP (37.252 GFLOP/s @ duck_max), param_stream=1.029439G (5.310 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.976 GB/s @ duck_max) [2026-04-08 08:58:41.900064 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=747, avg_tile_batch=3.51, prepare=651.221µs, send=18.534533ms, judge_wait=217.069352ms, fetch=20.648054ms, reduce=14ns; duck time-ns stats: p50=194.374022ms, 
p90=194.579198ms, max=194.800139ms; kernel_model: matmul=7.222591 GFLOP (37.077 GFLOP/s @ duck_max), param_stream=1.028063G (5.278 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.940 GB/s @ duck_max) [2026-04-08 08:58:42.172049 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=745, avg_tile_batch=3.52, prepare=672.36µs, send=18.449717ms, judge_wait=214.266058ms, fetch=21.660587ms, reduce=101ns; duck time-ns stats: p50=191.879894ms, p90=192.155882ms, max=192.330825ms; kernel_model: matmul=7.222591 GFLOP (37.553 GFLOP/s @ duck_max), param_stream=1.025311G (5.331 Gparam/s @ duck_max), weight_stream=1100.517 MiB (6.000 GB/s @ duck_max) [2026-04-08 08:58:42.196422 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=83, expert_tiles=94, avg_tile_batch=2.04, prepare=53.568µs, send=1.293186ms, judge_wait=20.521384ms, fetch=1.48108ms, reduce=139ns; duck time-ns stats: p50=20.308222ms, p90=20.35475ms, max=20.377479ms; kernel_model: matmul=0.528482 GFLOP (25.935 GFLOP/s @ duck_max), param_stream=0.129368G (6.349 Gparam/s @ duck_max), weight_stream=138.857 MiB (7.145 GB/s @ duck_max) [2026-04-08 08:58:42.488902 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=750, avg_tile_batch=3.50, prepare=747.118µs, send=18.492222ms, judge_wait=216.939876ms, fetch=21.635612ms, reduce=136ns; duck time-ns stats: p50=194.550074ms, p90=194.786981ms, max=195.015472ms; kernel_model: matmul=7.222591 GFLOP (37.036 GFLOP/s @ duck_max), param_stream=1.032192G (5.293 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.957 GB/s @ duck_max) [2026-04-08 08:58:42.769356 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=756, avg_tile_batch=3.47, prepare=650.237µs, send=17.211312ms, judge_wait=225.509555ms, fetch=21.687841ms, reduce=135ns; duck time-ns stats: 
p50=204.511731ms, p90=205.027979ms, max=205.504668ms; kernel_model: matmul=7.222591 GFLOP (35.146 GFLOP/s @ duck_max), param_stream=1.040450G (5.063 Gparam/s @ duck_max), weight_stream=1116.766 MiB (5.698 GB/s @ duck_max)
[2026-04-08 08:58:43.041094 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=743, avg_tile_batch=3.53, prepare=653.584µs, send=18.54878ms, judge_wait=215.477845ms, fetch=21.647355ms, reduce=136ns; duck time-ns stats: p50=192.302851ms, p90=192.678832ms, max=193.052584ms; kernel_model: matmul=7.222591 GFLOP (37.413 GFLOP/s @ duck_max), param_stream=1.022558G (5.297 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.961 GB/s @ duck_max)
[2026-04-08 08:58:43.316607 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=216, expert_tiles=755, avg_tile_batch=3.48, prepare=652.315µs, send=18.513494ms, judge_wait=219.418499ms, fetch=21.641618ms, reduce=137ns; duck time-ns stats: p50=196.830003ms, p90=197.097755ms, max=197.378411ms; kernel_model: matmul=7.222591 GFLOP (36.593 GFLOP/s @ duck_max), param_stream=1.039073G (5.264 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.925 GB/s @ duck_max)
[2026-04-08 08:58:43.339950 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=86, expert_tiles=96, avg_tile_batch=2.00, prepare=52.913µs, send=1.478561ms, judge_wait=19.313812ms, fetch=1.488761ms, reduce=109ns; duck time-ns stats: p50=19.132761ms, p90=19.162399ms, max=19.172364ms; kernel_model: matmul=0.528482 GFLOP (27.565 GFLOP/s @ duck_max), param_stream=0.132121G (6.891 Gparam/s @ duck_max), weight_stream=141.812 MiB (7.756 GB/s @ duck_max)
[2026-04-08 08:58:43.632650 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=757, avg_tile_batch=3.47, prepare=759.817µs, send=17.232875ms, judge_wait=218.636254ms, fetch=21.625776ms, reduce=135ns; duck time-ns stats: p50=196.062948ms, p90=196.201214ms, max=196.236628ms; kernel_model: matmul=7.222591 GFLOP (36.806 GFLOP/s @ duck_max), param_stream=1.041826G (5.309 Gparam/s @ duck_max), weight_stream=1118.243 MiB (5.975 GB/s @ duck_max)
[2026-04-08 08:58:43.910930 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=753, avg_tile_batch=3.48, prepare=655.549µs, send=18.564603ms, judge_wait=221.994801ms, fetch=21.68362ms, reduce=138ns; duck time-ns stats: p50=201.836048ms, p90=202.149461ms, max=202.411017ms; kernel_model: matmul=7.222591 GFLOP (35.683 GFLOP/s @ duck_max), param_stream=1.036321G (5.120 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.762 GB/s @ duck_max)
[2026-04-08 08:58:44.198372 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=755, avg_tile_batch=3.48, prepare=656.266µs, send=18.545666ms, judge_wait=231.216ms, fetch=21.671987ms, reduce=137ns; duck time-ns stats: p50=214.514221ms, p90=214.874294ms, max=215.061139ms; kernel_model: matmul=7.222591 GFLOP (33.584 GFLOP/s @ duck_max), param_stream=1.039073G (4.832 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.438 GB/s @ duck_max)
[2026-04-08 08:58:44.479860 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=218, expert_tiles=750, avg_tile_batch=3.50, prepare=655.916µs, send=18.566904ms, judge_wait=226.359342ms, fetch=20.624908ms, reduce=135ns; duck time-ns stats: p50=208.643368ms, p90=208.910717ms, max=209.033824ms; kernel_model: matmul=7.222591 GFLOP (34.552 GFLOP/s @ duck_max), param_stream=1.032192G (4.938 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.558 GB/s @ duck_max)
[2026-04-08 08:58:44.504637 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=84, expert_tiles=93, avg_tile_batch=2.06, prepare=53.503µs, send=1.480085ms, judge_wait=20.753798ms, fetch=1.484174ms, reduce=153ns; duck time-ns stats: p50=20.560442ms, p90=20.599131ms, max=20.611491ms; kernel_model: matmul=0.528482 GFLOP (25.640 GFLOP/s @ duck_max), param_stream=0.127992G (6.210 Gparam/s @ duck_max), weight_stream=137.380 MiB (6.989 GB/s @ duck_max)
[2026-04-08 08:58:44.813705 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=749, avg_tile_batch=3.50, prepare=764.833µs, send=17.216972ms, judge_wait=234.942972ms, fetch=21.634561ms, reduce=147ns; duck time-ns stats: p50=212.71322ms, p90=213.021717ms, max=213.11829ms; kernel_model: matmul=7.222591 GFLOP (33.890 GFLOP/s @ duck_max), param_stream=1.030816G (4.837 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.444 GB/s @ duck_max)
[2026-04-08 08:58:45.084873 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=750, avg_tile_batch=3.50, prepare=651.912µs, send=18.573266ms, judge_wait=214.913584ms, fetch=21.634333ms, reduce=138ns; duck time-ns stats: p50=192.008599ms, p90=192.27197ms, max=192.539863ms; kernel_model: matmul=7.222591 GFLOP (37.512 GFLOP/s @ duck_max), param_stream=1.032192G (5.361 Gparam/s @ duck_max), weight_stream=1107.903 MiB (6.034 GB/s @ duck_max)
[2026-04-08 08:58:45.366769 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=750, avg_tile_batch=3.50, prepare=649.291µs, send=18.539272ms, judge_wait=225.697236ms, fetch=21.651026ms, reduce=137ns; duck time-ns stats: p50=201.674513ms, p90=202.189662ms, max=202.395843ms; kernel_model: matmul=7.222591 GFLOP (35.685 GFLOP/s @ duck_max), param_stream=1.032192G (5.100 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.740 GB/s @ duck_max)
[2026-04-08 08:58:45.645616 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=221, expert_tiles=743, avg_tile_batch=3.53, prepare=648.246µs, send=18.575973ms, judge_wait=223.683309ms, fetch=20.643137ms, reduce=22ns; duck time-ns stats: p50=201.371539ms, p90=201.798934ms, max=202.110436ms; kernel_model: matmul=7.222591 GFLOP (35.736 GFLOP/s @ duck_max), param_stream=1.022558G (5.059 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.694 GB/s @ duck_max)
[2026-04-08 08:58:45.667161 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=74, expert_tiles=88, avg_tile_batch=2.18, prepare=53.735µs, send=1.453976ms, judge_wait=17.533206ms, fetch=1.493ms, reduce=106ns; duck time-ns stats: p50=17.343203ms, p90=17.38235ms, max=17.393792ms; kernel_model: matmul=0.528482 GFLOP (30.383 GFLOP/s @ duck_max), param_stream=0.121111G (6.963 Gparam/s @ duck_max), weight_stream=129.994 MiB (7.837 GB/s @ duck_max)
[2026-04-08 08:58:45.972217 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=758, avg_tile_batch=3.46, prepare=765.617µs, send=17.213383ms, judge_wait=230.957121ms, fetch=21.648373ms, reduce=133ns; duck time-ns stats: p50=214.250678ms, p90=214.544846ms, max=214.619057ms; kernel_model: matmul=7.222591 GFLOP (33.653 GFLOP/s @ duck_max), param_stream=1.043202G (4.861 Gparam/s @ duck_max), weight_stream=1119.720 MiB (5.471 GB/s @ duck_max)
[2026-04-08 08:58:46.260741 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=219, expert_tiles=743, avg_tile_batch=3.53, prepare=653.209µs, send=18.481121ms, judge_wait=233.379293ms, fetch=20.676418ms, reduce=22ns; duck time-ns stats: p50=218.795208ms, p90=219.15388ms, max=219.274888ms; kernel_model: matmul=7.222591 GFLOP (32.939 GFLOP/s @ duck_max), param_stream=1.022558G (4.663 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.249 GB/s @ duck_max)
[2026-04-08 08:58:46.536427 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=220, expert_tiles=745, avg_tile_batch=3.52, prepare=650.927µs, send=18.524757ms, judge_wait=219.569946ms, fetch=21.614302ms, reduce=135ns; duck time-ns stats: p50=198.014601ms, p90=198.372546ms, max=198.706143ms; kernel_model: matmul=7.222591 GFLOP (36.348 GFLOP/s @ duck_max), param_stream=1.025311G (5.160 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.807 GB/s @ duck_max)
[2026-04-08 08:58:46.816235 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=209, expert_tiles=744, avg_tile_batch=3.53, prepare=651.881µs, send=17.214562ms, judge_wait=225.865309ms, fetch=20.679125ms, reduce=137ns; duck time-ns stats: p50=206.010669ms, p90=206.688017ms, max=207.276443ms; kernel_model: matmul=7.222591 GFLOP (34.845 GFLOP/s @ duck_max), param_stream=1.023934G (4.940 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.560 GB/s @ duck_max)
[2026-04-08 08:58:46.839528 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=88, expert_tiles=103, avg_tile_batch=1.86, prepare=51.794µs, send=1.386937ms, judge_wait=19.376835ms, fetch=1.479084ms, reduce=133ns; duck time-ns stats: p50=19.187318ms, p90=19.211501ms, max=19.234569ms; kernel_model: matmul=0.528482 GFLOP (27.476 GFLOP/s @ duck_max), param_stream=0.141754G (7.370 Gparam/s @ duck_max), weight_stream=152.152 MiB (8.295 GB/s @ duck_max)
[2026-04-08 08:58:47.131952 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=237, expert_tiles=752, avg_tile_batch=3.49, prepare=759.45µs, send=17.228825ms, judge_wait=218.241969ms, fetch=21.638815ms, reduce=135ns; duck time-ns stats: p50=195.417187ms, p90=195.609163ms, max=195.913211ms; kernel_model: matmul=7.222591 GFLOP (36.866 GFLOP/s @ duck_max), param_stream=1.034945G (5.283 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.946 GB/s @ duck_max)
[2026-04-08 08:58:47.406957 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=747, avg_tile_batch=3.51, prepare=653.876µs, send=18.520509ms, judge_wait=218.837843ms, fetch=21.652291ms, reduce=134ns; duck time-ns stats: p50=198.115809ms, p90=198.370216ms, max=198.628137ms; kernel_model: matmul=7.222591 GFLOP (36.362 GFLOP/s @ duck_max), param_stream=1.028063G (5.176 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.825 GB/s @ duck_max)
[2026-04-08 08:58:47.678092 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=757, avg_tile_batch=3.47, prepare=650.495µs, send=18.544146ms, judge_wait=214.9433ms, fetch=21.631682ms, reduce=145ns; duck time-ns stats: p50=193.366294ms, p90=193.711836ms, max=193.767849ms; kernel_model: matmul=7.222591 GFLOP (37.274 GFLOP/s @ duck_max), param_stream=1.041826G (5.377 Gparam/s @ duck_max), weight_stream=1118.243 MiB (6.051 GB/s @ duck_max)
[2026-04-08 08:58:47.947952 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=222, expert_tiles=747, avg_tile_batch=3.51, prepare=659.441µs, send=18.569469ms, judge_wait=213.55751ms, fetch=21.683342ms, reduce=133ns; duck time-ns stats: p50=190.474195ms, p90=190.807463ms, max=191.033994ms; kernel_model: matmul=7.222591 GFLOP (37.808 GFLOP/s @ duck_max), param_stream=1.028063G (5.382 Gparam/s @ duck_max), weight_stream=1103.471 MiB (6.057 GB/s @ duck_max)
[2026-04-08 08:58:47.970744 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=95, expert_tiles=104, avg_tile_batch=1.85, prepare=52.971µs, send=1.369798ms, judge_wait=18.888055ms, fetch=1.483539ms, reduce=143ns; duck time-ns stats: p50=18.70664ms, p90=18.737986ms, max=18.753343ms; kernel_model: matmul=0.528482 GFLOP (28.181 GFLOP/s @ duck_max), param_stream=0.143131G (7.632 Gparam/s @ duck_max), weight_stream=153.629 MiB (8.590 GB/s @ duck_max)
[2026-04-08 08:58:48.262572 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=757, avg_tile_batch=3.47, prepare=779.568µs, send=17.218117ms, judge_wait=218.636041ms, fetch=20.63619ms, reduce=139ns; duck time-ns stats: p50=194.294055ms, p90=194.669438ms, max=195.093067ms; kernel_model: matmul=7.222591 GFLOP (37.021 GFLOP/s @ duck_max), param_stream=1.041826G (5.340 Gparam/s @ duck_max), weight_stream=1118.243 MiB (6.010 GB/s @ duck_max)
[2026-04-08 08:58:48.538931 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=750, avg_tile_batch=3.50, prepare=654.929µs, send=18.569842ms, judge_wait=220.09997ms, fetch=21.694064ms, reduce=139ns; duck time-ns stats: p50=198.685378ms, p90=198.924798ms, max=199.090585ms; kernel_model: matmul=7.222591 GFLOP (36.278 GFLOP/s @ duck_max), param_stream=1.032192G (5.185 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.835 GB/s @ duck_max)
[2026-04-08 08:58:48.806158 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=222, expert_tiles=748, avg_tile_batch=3.51, prepare=656.834µs, send=18.550993ms, judge_wait=211.070232ms, fetch=21.621359ms, reduce=135ns; duck time-ns stats: p50=189.377291ms, p90=189.633889ms, max=189.796685ms; kernel_model: matmul=7.222591 GFLOP (38.054 GFLOP/s @ duck_max), param_stream=1.029439G (5.424 Gparam/s @ duck_max), weight_stream=1104.948 MiB (6.105 GB/s @ duck_max)
[2026-04-08 08:58:49.100391 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=218, expert_tiles=754, avg_tile_batch=3.48, prepare=650.636µs, send=17.210302ms, judge_wait=239.345724ms, fetch=21.670382ms, reduce=138ns; duck time-ns stats: p50=217.218924ms, p90=217.528171ms, max=217.954911ms; kernel_model: matmul=7.222591 GFLOP (33.138 GFLOP/s @ duck_max), param_stream=1.037697G (4.761 Gparam/s @ duck_max), weight_stream=1113.811 MiB (5.359 GB/s @ duck_max)
[2026-04-08 08:58:49.124161 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=82, expert_tiles=98, avg_tile_batch=1.96, prepare=54.75µs, send=1.395443ms, judge_wait=19.865576ms, fetch=1.471014ms, reduce=20ns; duck time-ns stats: p50=19.663081ms, p90=19.712325ms, max=19.716018ms; kernel_model: matmul=0.528482 GFLOP (26.805 GFLOP/s @ duck_max), param_stream=0.134873G (6.841 Gparam/s @ duck_max), weight_stream=144.766 MiB (7.699 GB/s @ duck_max)
[2026-04-08 08:58:49.418571 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=209, expert_tiles=743, avg_tile_batch=3.53, prepare=766.251µs, send=17.211343ms, judge_wait=220.2216ms, fetch=21.659712ms, reduce=18ns; duck time-ns stats: p50=196.901362ms, p90=197.071824ms, max=197.255857ms; kernel_model: matmul=7.222591 GFLOP (36.615 GFLOP/s @ duck_max), param_stream=1.022558G (5.184 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.834 GB/s @ duck_max)
[2026-04-08 08:58:49.689121 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=216, expert_tiles=743, avg_tile_batch=3.53, prepare=665.621µs, send=18.524946ms, judge_wait=215.388911ms, fetch=20.657997ms, reduce=137ns; duck time-ns stats: p50=194.034175ms, p90=194.555656ms, max=194.843876ms; kernel_model: matmul=7.222591 GFLOP (37.069 GFLOP/s @ duck_max), param_stream=1.022558G (5.248 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.907 GB/s @ duck_max)
[2026-04-08 08:58:49.959559 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=213, expert_tiles=741, avg_tile_batch=3.54, prepare=652.844µs, send=18.57034ms, judge_wait=214.1539ms, fetch=21.66512ms, reduce=137ns; duck time-ns stats: p50=191.816275ms, p90=192.289569ms, max=192.645695ms; kernel_model: matmul=7.222591 GFLOP (37.492 GFLOP/s @ duck_max), param_stream=1.019806G (5.294 Gparam/s @ duck_max), weight_stream=1094.608 MiB (5.958 GB/s @ duck_max)
[2026-04-08 08:58:50.229248 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=222, expert_tiles=745, avg_tile_batch=3.52, prepare=653.192µs, send=18.552111ms, judge_wait=213.509492ms, fetch=21.641695ms, reduce=137ns; duck time-ns stats: p50=190.92255ms, p90=191.110856ms, max=191.971563ms; kernel_model: matmul=7.222591 GFLOP (37.623 GFLOP/s @ duck_max), param_stream=1.025311G (5.341 Gparam/s @ duck_max), weight_stream=1100.517 MiB (6.011 GB/s @ duck_max)
[2026-04-08 08:58:50.253762 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=24, top_k=8, tasks=192, unique_experts=96, expert_tiles=104, avg_tile_batch=1.85, prepare=52.753µs, send=1.466559ms, judge_wait=20.455272ms, fetch=1.483337ms, reduce=137ns; duck time-ns stats: p50=20.250593ms, p90=20.275513ms, max=20.316717ms; kernel_model: matmul=0.528482 GFLOP (26.012 GFLOP/s @ duck_max), param_stream=0.143131G (7.045 Gparam/s @ duck_max), weight_stream=153.629 MiB (7.929 GB/s @ duck_max)
[2026-04-08 08:58:50.580687 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=748, avg_tile_batch=3.51, prepare=3.483632ms, send=19.844156ms, judge_wait=213.083127ms, fetch=21.600602ms, reduce=135ns; duck time-ns stats: p50=192.847749ms, p90=193.288183ms, max=194.045576ms; kernel_model: matmul=7.222591 GFLOP (37.221 GFLOP/s @ duck_max), param_stream=1.029439G (5.305 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.971 GB/s @ duck_max)
[2026-04-08 08:58:50.851089 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=750, avg_tile_batch=3.50, prepare=2.346766ms, send=17.463695ms, judge_wait=211.596692ms, fetch=21.620427ms, reduce=134ns; duck time-ns stats: p50=190.9645ms, p90=191.199395ms, max=191.40392ms; kernel_model: matmul=7.222591 GFLOP (37.735 GFLOP/s @ duck_max), param_stream=1.032192G (5.393 Gparam/s @ duck_max), weight_stream=1107.903 MiB (6.069 GB/s @ duck_max)
[2026-04-08 08:58:51.121862 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=756, avg_tile_batch=3.47, prepare=3.006175ms, send=17.223983ms, judge_wait=214.940998ms, fetch=20.674757ms, reduce=20ns; duck time-ns stats: p50=192.758637ms, p90=193.049462ms, max=193.345463ms; kernel_model: matmul=7.222591 GFLOP (37.356 GFLOP/s @ duck_max), param_stream=1.040450G (5.381 Gparam/s @ duck_max), weight_stream=1116.766 MiB (6.057 GB/s @ duck_max)
[2026-04-08 08:58:51.392736 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=745, avg_tile_batch=3.52, prepare=3.0123ms, send=18.580615ms, judge_wait=213.714808ms, fetch=20.639491ms, reduce=135ns; duck time-ns stats: p50=192.597251ms, p90=192.843383ms, max=193.045173ms; kernel_model: matmul=7.222591 GFLOP (37.414 GFLOP/s @ duck_max), param_stream=1.025311G (5.311 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.978 GB/s @ duck_max)
[2026-04-08 08:58:51.417835 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=23, top_k=8, tasks=184, unique_experts=103, expert_tiles=109, avg_tile_batch=1.69, prepare=44.551µs, send=2.592209ms, judge_wait=19.757266ms, fetch=1.417118ms, reduce=135ns; duck time-ns stats: p50=19.546842ms, p90=19.585189ms, max=19.59957ms; kernel_model: matmul=0.506462 GFLOP (25.840 GFLOP/s @ duck_max), param_stream=0.150012G (7.654 Gparam/s @ duck_max), weight_stream=161.015 MiB (8.614 GB/s @ duck_max)
[2026-04-08 08:58:51.435038 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.093038ms; phases: prepare=5.047µs, send=127.269µs, judge_wait=829.835µs, fetch=92.979µs, reduce=20ns, writeback=715ns; duck time-ns stats: p50=745.088µs, p90=748.975µs, max=752.266µs; effective_read: activated_experts=8, params=0.011010G (14.636 Gparam/s @ duck_max), memory=11.818 MiB (16.472 GB/s @ duck_max), judge_gap=77.569µs, judge_ratio=1.103x
[2026-04-08 08:58:52.149000 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.196247ms; phases: prepare=7.122µs, send=225.193µs, judge_wait=828.29µs, fetch=96.712µs, reduce=20ns, writeback=536ns; duck time-ns stats: p50=701.427µs, p90=703.885µs, max=705.733µs; effective_read: activated_experts=8, params=0.011010G (15.601 Gparam/s @ duck_max), memory=11.818 MiB (17.559 GB/s @ duck_max), judge_gap=122.557µs, judge_ratio=1.174x
Token # 1: 1864.661ms; value: next_token_ids=tensor([378], device='cuda:0') mtp accept=1 prop=378 top1=378 accp=1.000 next=draft=24268 prop=24268 olap pair=683.6ms serial=1262.5ms gain=578.9ms ratio=0.46 s0=599.4ms s1=663.1ms wait=0.2/45.5ms pred gate=device
[2026-04-08 08:58:52.153008 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 972.408µs; phases: prepare=3.279µs, send=61.141µs, judge_wait=778.369µs, fetch=92.064µs, reduce=20ns, writeback=553ns; duck time-ns stats: p50=692.614µs, p90=698.835µs, max=702.211µs; effective_read: activated_experts=8, params=0.011010G (15.679 Gparam/s @ duck_max), memory=11.818 MiB (17.647 GB/s @ duck_max), judge_gap=76.158µs, judge_ratio=1.108x
Token # 2: 3.832ms; value: next_token_ids=tensor([24268], device='cuda:0') mtp accept=1 prop=24268 top1=24268 accp=0.999 next=pair draft=1131 prop=1131 pred gate=device
[2026-04-08 08:58:52.266011 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 976.741µs; phases: prepare=3.736µs, send=61.748µs, judge_wait=782.834µs, fetch=90.947µs, reduce=20ns, writeback=479ns; duck time-ns stats: p50=699.515µs, p90=703.531µs, max=706.904µs; effective_read: activated_experts=8, params=0.011010G (15.575 Gparam/s @ duck_max), memory=11.818 MiB (17.530 GB/s @ duck_max), judge_gap=75.93µs, judge_ratio=1.107x
Token # 3: 113.092ms; value: next_token_ids=tensor([1131], device='cuda:0') mtp accept=1 prop=1131 top1=1131 accp=1.000 next=draft=642 prop=642 olap pair=107.8ms serial=190.1ms gain=82.3ms ratio=0.43 s0=4.5ms s1=185.6ms wait=0.1/51.9ms pred gate=device
[2026-04-08 08:58:52.269971 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.02006ms; phases: prepare=3.119µs, send=61.428µs, judge_wait=778.641µs, fetch=99.651µs, reduce=138ns, writeback=452ns; duck time-ns stats: p50=693.465µs, p90=696.718µs, max=702.964µs; effective_read: activated_experts=8, params=0.011010G (15.662 Gparam/s @ duck_max), memory=11.818 MiB (17.628 GB/s @ duck_max), judge_gap=75.677µs, judge_ratio=1.108x
Token # 4: 3.838ms; value: next_token_ids=tensor([642], device='cuda:0') mtp accept=1 prop=642 top1=642 accp=0.963 next=pair draft=768 prop=768 pred gate=device
[2026-04-08 08:58:52.382696 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 977.25µs; phases: prepare=3.626µs, send=62.903µs, judge_wait=781.481µs, fetch=90.915µs, reduce=20ns, writeback=560ns; duck time-ns stats: p50=699.225µs, p90=703.142µs, max=704.325µs; effective_read: activated_experts=8, params=0.011010G (15.632 Gparam/s @ duck_max), memory=11.818 MiB (17.594 GB/s @ duck_max), judge_gap=77.156µs, judge_ratio=1.110x
Token # 5: 112.820ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=1255 prop=1255 olap pair=107.6ms serial=190.5ms gain=82.9ms ratio=0.44 s0=4.3ms s1=186.2ms wait=0.1/52.0ms pred gate=device
[2026-04-08 08:58:52.386566 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 972.327µs; phases: prepare=3.276µs, send=61.739µs, judge_wait=779.839µs, fetch=90.839µs, reduce=20ns, writeback=490ns; duck time-ns stats: p50=695.606µs, p90=703.964µs, max=705.14µs; effective_read: activated_experts=8, params=0.011010G (15.614 Gparam/s @ duck_max), memory=11.818 MiB (17.573 GB/s @ duck_max), judge_gap=74.699µs, judge_ratio=1.106x
Token # 6: 3.783ms; value: next_token_ids=tensor([1275], device='cuda:0') mtp accept=0 prop=1255 top1=1275 accp=0.320 next=pair draft=112016 prop=112016 pred gate=device
[2026-04-08 08:58:52.500025 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 983.119µs; phases: prepare=3.749µs, send=61.414µs, judge_wait=789.792µs, fetch=91.596µs, reduce=20ns, writeback=478ns; duck time-ns stats: p50=705.584µs, p90=710.096µs, max=713.796µs; effective_read: activated_experts=8, params=0.011010G (15.425 Gparam/s @ duck_max), memory=11.818 MiB (17.360 GB/s @ duck_max), judge_gap=75.996µs, judge_ratio=1.106x
Token # 7: 113.570ms; value: next_token_ids=tensor([112016], device='cuda:0') mtp accept=1 prop=112016 top1=112016 accp=1.000 next=draft=24268 prop=24268 olap pair=108.3ms serial=191.6ms gain=83.4ms ratio=0.43 s0=4.9ms s1=186.8ms wait=0.1/51.2ms pred gate=device
[2026-04-08 08:58:52.503900 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 965.636µs; phases: prepare=3.076µs, send=61.677µs, judge_wait=768.585µs, fetch=97.087µs, reduce=20ns, writeback=541ns; duck time-ns stats: p50=679.285µs, p90=688.425µs, max=693.195µs; effective_read: activated_experts=8, params=0.011010G (15.883 Gparam/s @ duck_max), memory=11.818 MiB (17.876 GB/s @ duck_max), judge_gap=75.39µs, judge_ratio=1.109x
Token # 8: 3.763ms; value: next_token_ids=tensor([24268], device='cuda:0') mtp accept=1 prop=24268 top1=24268 accp=0.867 next=pair draft=16291 prop=14205 pred gate=device
[2026-04-08 08:58:52.617451 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 974.942µs; phases: prepare=3.586µs, send=62.341µs, judge_wait=780.713µs, fetch=91.008µs, reduce=20ns, writeback=546ns; duck time-ns stats: p50=698.133µs, p90=703.487µs, max=705.959µs; effective_read: activated_experts=8, params=0.011010G (15.596 Gparam/s @ duck_max), memory=11.818 MiB (17.553 GB/s @ duck_max), judge_gap=74.754µs, judge_ratio=1.106x
Token # 9: 113.678ms; value: next_token_ids=tensor([14205], device='cuda:0') mtp accept=1 prop=14205 top1=2053 accp=0.319 next=draft=545 prop=545 olap pair=108.4ms serial=192.0ms gain=83.6ms ratio=0.44 s0=4.6ms s1=187.4ms wait=0.1/51.8ms pred gate=device
Token # 10: 3.778ms; value: next_token_ids=tensor([545], device='cuda:0') mtp accept=1 prop=545 top1=545 accp=1.000 next=pair draft=30869 prop=30869 pred gate=device
Token # 11: 113.489ms; value: next_token_ids=tensor([30869], device='cuda:0') mtp accept=1 prop=30869 top1=984 accp=0.562 next=draft=22651 prop=22651 olap pair=108.2ms serial=191.8ms gain=83.6ms ratio=0.44 s0=4.5ms s1=187.3ms wait=0.1/51.7ms pred gate=device
Token # 12: 3.878ms; value: next_token_ids=tensor([22651], device='cuda:0') mtp accept=1 prop=22651 top1=22651 accp=0.798 next=pair draft=4374 prop=4374 pred gate=device
Token # 13: 114.243ms; value: next_token_ids=tensor([4374], device='cuda:0') mtp accept=1 prop=4374 top1=4374 accp=1.000 next=draft=1465 prop=1465 olap pair=108.9ms serial=193.0ms gain=84.2ms ratio=0.44 s0=4.0ms s1=189.0ms wait=0.1/52.9ms pred gate=device
Token # 14: 3.766ms; value: next_token_ids=tensor([1465], device='cuda:0') mtp accept=1 prop=1465 top1=1465 accp=1.000 next=pair draft=13582 prop=13582 pred gate=device
Token # 15: 114.792ms; value: next_token_ids=tensor([13582], device='cuda:0') mtp accept=1 prop=13582 top1=13582 accp=1.000 next=draft=21 prop=21 olap pair=109.5ms serial=194.4ms gain=84.9ms ratio=0.44 s0=3.9ms s1=190.5ms wait=0.1/53.2ms pred gate=device
Token # 16: 3.768ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=pair draft=16 prop=16 pred gate=device
Token # 17: 114.785ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=20 prop=20 olap pair=109.5ms serial=194.2ms gain=84.8ms ratio=0.44 s0=3.7ms s1=190.5ms wait=0.1/53.3ms pred gate=device
Token # 18: 3.785ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=223 prop=223 pred gate=device
Token # 19: 115.009ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=36101 prop=36101 olap pair=109.6ms serial=194.0ms gain=84.4ms ratio=0.44 s0=4.7ms s1=189.4ms wait=0.1/52.2ms pred gate=device
Token # 20: 3.783ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=0.999 next=pair draft=2971 prop=2971 pred gate=device
Token # 21: 114.950ms; value: next_token_ids=tensor([2971], device='cuda:0') mtp accept=1 prop=2971 top1=2971 accp=1.000 next=draft=5367 prop=5866 olap pair=109.7ms serial=193.8ms gain=84.1ms ratio=0.43 s0=6.1ms s1=187.7ms wait=0.2/50.4ms pred gate=device
Token # 22: 3.813ms; value: next_token_ids=tensor([5367], device='cuda:0') mtp accept=0 prop=5866 top1=5367 accp=0.822 next=pair draft=31826 prop=31826 pred gate=device
Token # 23: 115.042ms; value: next_token_ids=tensor([31826], device='cuda:0') mtp accept=1 prop=31826 top1=556 accp=0.199 next=draft=1057 prop=1057 olap pair=109.8ms serial=192.8ms gain=83.0ms ratio=0.43 s0=4.6ms s1=188.2ms wait=0.1/52.1ms pred gate=device
Token # 24: 3.748ms; value: next_token_ids=tensor([1057], device='cuda:0') mtp accept=1 prop=1057 top1=1057 accp=0.968 next=pair draft=3968 prop=3968 pred gate=device
Token # 25: 114.548ms; value: next_token_ids=tensor([545], device='cuda:0') mtp accept=0 prop=3968 top1=545 accp=0.286 next=draft=18452 prop=18452 olap pair=109.2ms serial=193.6ms gain=84.4ms ratio=0.44 s0=4.3ms s1=189.3ms wait=0.1/52.0ms pred gate=device
Token # 26: 114.926ms; value: next_token_ids=tensor([18452], device='cuda:0') mtp accept=1 prop=18452 top1=18452 accp=1.000 next=draft=1316 prop=1316 olap pair=109.6ms serial=194.5ms gain=84.9ms ratio=0.44 s0=3.8ms s1=190.7ms wait=0.1/53.3ms pred gate=device
Token # 27: 3.774ms; value: next_token_ids=tensor([1316], device='cuda:0') mtp accept=1 prop=1316 top1=1316 accp=1.000 next=pair draft=3613 prop=3613 pred gate=device
Token # 28: 114.602ms; value: next_token_ids=tensor([3613], device='cuda:0') mtp accept=1 prop=3613 top1=3613 accp=0.999 next=draft=970 prop=970 olap pair=109.3ms serial=194.1ms gain=84.7ms ratio=0.44 s0=3.9ms s1=190.2ms wait=0.1/53.0ms pred gate=device
Token # 29: 3.840ms; value: next_token_ids=tensor([970], device='cuda:0') mtp accept=1 prop=970 top1=970 accp=0.770 next=pair draft=867 prop=867 pred gate=device
Token # 30: 114.872ms; value: next_token_ids=tensor([867], device='cuda:0') mtp accept=1 prop=867 top1=867 accp=0.606 next=draft=13332 prop=13332 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=3.9ms s1=190.7ms wait=0.1/53.1ms pred gate=device
Token # 31: 3.770ms; value: next_token_ids=tensor([13332], device='cuda:0') mtp accept=1 prop=13332 top1=13332 accp=0.871 next=pair draft=76148 prop=76148 pred gate=device
Token # 32: 115.268ms; value: next_token_ids=tensor([76148], device='cuda:0') mtp accept=1 prop=76148 top1=76148 accp=1.000 next=draft=303 prop=303 olap pair=109.9ms serial=195.2ms gain=85.2ms ratio=0.44 s0=3.8ms s1=191.4ms wait=0.1/53.2ms pred gate=device
Token # 33: 3.763ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.996 next=pair draft=7601 prop=7601 pred gate=device
Token # 34: 114.394ms; value: next_token_ids=tensor([7601], device='cuda:0') mtp accept=1 prop=7601 top1=867 accp=0.538 next=draft=34221 prop=34221 olap pair=109.1ms serial=193.5ms gain=84.4ms ratio=0.44 s0=3.9ms s1=189.7ms wait=0.1/53.1ms pred gate=device
Token # 35: 3.743ms; value: next_token_ids=tensor([34221], device='cuda:0') mtp accept=1 prop=34221 top1=34221 accp=0.964 next=pair draft=112016 prop=112016 pred gate=device
Token # 36: 115.077ms; value: next_token_ids=tensor([428], device='cuda:0') mtp accept=0 prop=112016 top1=428 accp=0.431 next=draft=112016 prop=112016 olap pair=109.8ms serial=194.3ms gain=84.5ms ratio=0.44 s0=6.3ms s1=188.0ms wait=0.2/50.3ms pred gate=device
Token # 37: 115.151ms; value: next_token_ids=tensor([112016], device='cuda:0') mtp accept=1 prop=112016 top1=112016 accp=1.000 next=draft=24268 prop=24268 olap pair=109.7ms serial=194.9ms gain=85.1ms ratio=0.44 s0=3.8ms s1=191.1ms wait=0.1/53.2ms pred gate=device
Token # 38: 3.738ms; value: next_token_ids=tensor([24268], device='cuda:0') mtp accept=1 prop=24268 top1=24268 accp=1.000 next=pair draft=26663 prop=26663 pred gate=device
Token # 39: 114.764ms; value: next_token_ids=tensor([26663], device='cuda:0') mtp accept=1 prop=26663 top1=26663 accp=0.981 next=draft=114131 prop=114131 olap pair=109.5ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.0ms wait=0.1/51.9ms pred gate=device
Token # 40: 3.804ms; value: next_token_ids=tensor([32652], device='cuda:0') mtp accept=0 prop=114131 top1=32652 accp=0.162 next=pair draft=6221 prop=6221 pred gate=device
Token # 41: 115.911ms; value: next_token_ids=tensor([6221], device='cuda:0') mtp accept=1 prop=6221 top1=6221 accp=1.000 next=draft=2073 prop=2073 olap pair=110.6ms serial=195.5ms gain=84.9ms ratio=0.43 s0=4.1ms s1=191.4ms wait=0.1/52.9ms pred gate=device
Token # 42: 3.785ms; value: next_token_ids=tensor([2073], device='cuda:0') mtp accept=1 prop=2073 top1=2073 accp=0.999 next=pair draft=3515 prop=3515 pred gate=device
Token # 43: 116.423ms; value: next_token_ids=tensor([3515], device='cuda:0') mtp accept=1 prop=3515 top1=3515 accp=0.724 next=draft=4427 prop=4427 olap pair=110.3ms serial=194.2ms gain=83.9ms ratio=0.43 s0=6.1ms s1=188.1ms wait=0.2/50.8ms pred gate=device
Token # 44: 4.442ms; value: next_token_ids=tensor([4427], device='cuda:0') mtp accept=1 prop=4427 top1=4427 accp=0.923 next=pair draft=88618 prop=88618 pred gate=device
Token # 45: 115.786ms; value: next_token_ids=tensor([2823], device='cuda:0') mtp accept=0 prop=88618 top1=2823 accp=0.123 next=draft=12543 prop=12543 olap pair=109.6ms serial=194.0ms gain=84.4ms ratio=0.44 s0=6.0ms s1=188.0ms wait=0.2/50.9ms pred gate=device
Token # 46: 115.768ms; value: next_token_ids=tensor([88618], device='cuda:0') mtp accept=0 prop=12543 top1=12543 accp=0.937 next=draft=303 prop=303 olap pair=110.2ms serial=193.8ms gain=83.6ms ratio=0.43 s0=6.9ms s1=186.9ms wait=0.2/49.8ms pred gate=device
Token # 47: 115.006ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.997 next=draft=8563 prop=8563 olap pair=109.7ms serial=194.6ms gain=84.9ms ratio=0.44 s0=3.9ms s1=190.8ms wait=0.1/53.2ms pred gate=device
Token # 48: 3.777ms; value: next_token_ids=tensor([8563], device='cuda:0') mtp accept=1 prop=8563 top1=8563 accp=0.998 next=pair draft=14587 prop=4377 pred gate=device
Token # 49: 115.258ms; value: next_token_ids=tensor([4377], device='cuda:0') mtp accept=1 prop=4377 top1=65008 accp=0.426 next=draft=14587 prop=14587 olap pair=109.8ms serial=194.7ms gain=84.9ms ratio=0.44 s0=4.0ms s1=190.7ms wait=0.1/52.9ms pred gate=device
Token # 50: 3.786ms; value: next_token_ids=tensor([51315], device='cuda:0') mtp accept=0 prop=14587 top1=14587 accp=0.634 next=pair draft=64407 prop=64407 pred gate=device
Token # 51: 114.686ms; value: next_token_ids=tensor([6147], device='cuda:0') mtp accept=0 prop=64407 top1=6147 accp=0.027 next=draft=1057 prop=1057 olap pair=109.3ms serial=193.3ms gain=84.0ms ratio=0.43 s0=6.2ms s1=187.2ms wait=0.2/50.5ms pred gate=device
Token # 52: 114.446ms; value: next_token_ids=tensor([1057], device='cuda:0') mtp accept=1 prop=1057 top1=1057 accp=1.000 next=draft=116863 prop=116863 olap pair=109.0ms serial=193.3ms gain=84.3ms ratio=0.44 s0=3.8ms s1=189.5ms wait=0.1/53.2ms pred gate=device
Token # 53: 3.800ms; value: next_token_ids=tensor([116863], device='cuda:0') mtp accept=1 prop=116863 top1=116863 accp=1.000 next=pair draft=36101 prop=36101 pred gate=device
Token # 54: 115.940ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=0.613 next=draft=3500 prop=3500 olap pair=110.6ms serial=195.8ms gain=85.2ms ratio=0.44 s0=3.9ms s1=191.8ms wait=0.1/53.2ms pred gate=device
Token # 55: 3.795ms; value: next_token_ids=tensor([3500], device='cuda:0') mtp accept=1 prop=3500 top1=3500 accp=0.995 next=pair draft=320 prop=320 pred gate=device
Token # 56: 115.451ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=1877 prop=950 olap pair=110.2ms serial=193.3ms gain=83.1ms ratio=0.43 s0=4.3ms s1=189.0ms wait=0.1/53.0ms pred gate=device
Token # 57: 3.833ms; value: next_token_ids=tensor([950], device='cuda:0') mtp accept=1 prop=950 top1=950 accp=0.773 next=pair draft=11049 prop=11049 pred gate=device
Token # 58: 115.336ms; value: next_token_ids=tensor([11049], device='cuda:0') mtp accept=1 prop=11049 top1=11049 accp=0.879 next=draft=389 prop=389 olap pair=110.0ms serial=195.0ms gain=85.0ms ratio=0.44 s0=4.1ms s1=190.9ms wait=0.1/52.9ms pred gate=device
Token # 59: 3.783ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 accp=0.997 next=pair draft=6032 prop=6032 pred gate=device
Token # 60: 114.830ms; value: next_token_ids=tensor([6032], device='cuda:0') mtp accept=1 prop=6032 top1=6032 accp=0.533 next=draft=1275 prop=1275 olap pair=109.5ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.4ms s1=189.8ms wait=0.1/52.1ms pred gate=device
Token # 61: 3.808ms; value: next_token_ids=tensor([580], device='cuda:0') mtp accept=0 prop=1275 top1=580 accp=0.077 next=pair draft=653 prop=984 pred gate=device
Token # 62: 115.270ms; value: next_token_ids=tensor([21978], device='cuda:0') mtp accept=0 prop=984 top1=21978 accp=0.098 next=draft=6742 prop=6742 olap pair=110.0ms serial=195.1ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.8ms wait=0.1/52.1ms pred gate=device
Token # 63: 114.937ms; value: next_token_ids=tensor([6742], device='cuda:0') mtp accept=1 prop=6742 top1=6742 accp=0.998 next=draft=8738 prop=8738 olap pair=109.6ms serial=194.5ms gain=84.9ms ratio=0.44 s0=3.9ms s1=190.6ms wait=0.1/53.1ms pred gate=device
Token # 64: 3.781ms; value: next_token_ids=tensor([8738], device='cuda:0') mtp accept=1 prop=8738 top1=8738 accp=0.999 next=pair draft=429 prop=429 pred gate=device
Token # 65: 114.659ms; value: next_token_ids=tensor([429], device='cuda:0') mtp accept=1 prop=429 top1=429 accp=1.000 next=draft=30869 prop=30869 olap pair=109.3ms serial=194.0ms gain=84.6ms ratio=0.44 s0=3.9ms s1=190.1ms wait=0.1/53.2ms pred gate=device
Token # 66: 3.790ms; value: next_token_ids=tensor([30869], device='cuda:0') mtp accept=1 prop=30869 top1=30869 accp=0.984 next=pair draft=22651 prop=22651 pred gate=device
Token # 67: 115.283ms; value: next_token_ids=tensor([22651], device='cuda:0') mtp accept=1 prop=22651 top1=22651 accp=0.993 next=draft=4374 prop=4374 olap pair=109.9ms serial=194.7ms gain=84.9ms ratio=0.44 s0=4.2ms s1=190.5ms wait=0.1/52.3ms pred gate=device
Token # 68: 3.875ms; value: next_token_ids=tensor([4374], device='cuda:0') mtp accept=1 prop=4374 top1=4374 accp=1.000 next=pair draft=1465 prop=1465 pred gate=device
Token # 69: 115.285ms; value: next_token_ids=tensor([1465], device='cuda:0') mtp accept=1 prop=1465 top1=1465 accp=1.000 next=draft=13582 prop=13582 olap pair=110.0ms serial=193.8ms gain=83.8ms ratio=0.43 s0=4.4ms s1=189.4ms wait=0.1/52.6ms pred gate=device
Token # 70: 3.827ms; value: next_token_ids=tensor([13582], device='cuda:0') mtp accept=1 prop=13582 top1=13582 accp=1.000 next=pair draft=21 prop=21 pred gate=device
Token # 71: 114.665ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=draft=16 prop=16 olap pair=109.5ms serial=194.0ms gain=84.5ms ratio=0.44 s0=4.1ms s1=189.9ms wait=0.1/52.9ms pred gate=device
Token # 72: 3.791ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=pair draft=20 prop=20 pred gate=device
Token # 73: 115.342ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=1237 prop=1237 olap pair=110.1ms serial=195.0ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.7ms wait=0.1/52.1ms pred gate=device
Token # 74: 3.815ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=1237 top1=223 accp=0.214 next=pair draft=8842 prop=8842 pred gate=device
Token # 75: 115.526ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=1.000 next=draft=478 prop=478 olap pair=110.2ms serial=195.5ms gain=85.4ms ratio=0.44 s0=3.8ms s1=191.7ms wait=0.1/53.3ms pred gate=device
Token # 76: 
3.757ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=pair draft=666 prop=666 pred gate=device Token # 77: 114.982ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=0.898 next=draft=7346 prop=7346 olap pair=109.7ms serial=194.3ms gain=84.6ms ratio=0.44 s0=3.9ms s1=190.4ms wait=0.1/53.1ms pred gate=device Token # 78: 3.730ms; value: next_token_ids=tensor([1735], device='cuda:0') mtp accept=0 prop=7346 top1=1735 accp=0.372 next=pair draft=938 prop=938 pred gate=device Token # 79: 115.953ms; value: next_token_ids=tensor([938], device='cuda:0') mtp accept=1 prop=938 top1=938 accp=0.994 next=draft=8563 prop=8563 olap pair=110.6ms serial=196.0ms gain=85.4ms ratio=0.44 s0=3.9ms s1=192.0ms wait=0.1/53.1ms pred gate=device Token # 80: 3.771ms; value: next_token_ids=tensor([8563], device='cuda:0') mtp accept=1 prop=8563 top1=8563 accp=1.000 next=pair draft=11049 prop=11049 pred gate=device Token # 81: 114.818ms; value: next_token_ids=tensor([11049], device='cuda:0') mtp accept=1 prop=11049 top1=11049 accp=0.505 next=draft=5866 prop=5866 olap pair=109.5ms serial=193.7ms gain=84.2ms ratio=0.43 s0=4.2ms s1=189.5ms wait=0.1/52.4ms pred gate=device Token # 82: 3.755ms; value: next_token_ids=tensor([5866], device='cuda:0') mtp accept=1 prop=5866 top1=5866 accp=0.997 next=pair draft=12145 prop=5678 pred gate=device Token # 83: 114.820ms; value: next_token_ids=tensor([2541], device='cuda:0') mtp accept=0 prop=5678 top1=2541 accp=0.250 next=draft=223 prop=223 olap pair=109.5ms serial=194.2ms gain=84.6ms ratio=0.44 s0=4.0ms s1=190.1ms wait=0.1/53.1ms pred gate=device Token # 84: 115.252ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=0 prop=223 top1=2619 accp=0.133 next=draft=19 prop=19 olap pair=109.9ms serial=195.0ms gain=85.1ms ratio=0.44 s0=4.1ms s1=190.9ms wait=0.1/52.7ms pred gate=device Token # 85: 117.097ms; value: next_token_ids=tensor([19], device='cuda:0') 
mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=223 prop=223 olap pair=111.7ms serial=198.5ms gain=86.8ms ratio=0.44 s0=4.7ms s1=193.8ms wait=0.1/52.5ms pred gate=device Token # 86: 3.785ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.805 next=pair draft=2353 prop=2353 pred gate=device Token # 87: 115.350ms; value: next_token_ids=tensor([2353], device='cuda:0') mtp accept=1 prop=2353 top1=2353 accp=0.891 next=draft=26348 prop=26348 olap pair=110.1ms serial=194.9ms gain=84.8ms ratio=0.44 s0=4.3ms s1=190.6ms wait=0.1/52.5ms pred gate=device Token # 88: 3.807ms; value: next_token_ids=tensor([26348], device='cuda:0') mtp accept=1 prop=26348 top1=26348 accp=1.000 next=pair draft=58 prop=58 pred gate=device Token # 89: 114.525ms; value: next_token_ids=tensor([58], device='cuda:0') mtp accept=1 prop=58 top1=58 accp=1.000 next=draft=223 prop=223 olap pair=109.3ms serial=194.1ms gain=84.7ms ratio=0.44 s0=3.9ms s1=190.2ms wait=0.1/53.0ms pred gate=device Token # 90: 3.780ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=24091 prop=24091 pred gate=device Token # 91: 114.833ms; value: next_token_ids=tensor([24091], device='cuda:0') mtp accept=1 prop=24091 top1=24091 accp=1.000 next=draft=18 prop=18 olap pair=109.6ms serial=194.6ms gain=85.0ms ratio=0.44 s0=4.5ms s1=190.1ms wait=0.1/47.1ms pred gate=device Token # 92: 3.668ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=pair draft=666 prop=666 pred gate=device Token # 93: 114.299ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=0.837 next=draft=223 prop=223 olap pair=109.1ms serial=193.7ms gain=84.6ms ratio=0.44 s0=4.0ms s1=189.7ms wait=0.1/47.9ms pred gate=device Token # 94: 3.737ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.997 next=pair draft=17906 prop=17906 pred 
gate=device Token # 95: 116.019ms; value: next_token_ids=tensor([17906], device='cuda:0') mtp accept=1 prop=17906 top1=17906 accp=0.989 next=draft=2619 prop=2619 olap pair=109.9ms serial=194.5ms gain=84.6ms ratio=0.43 s0=5.6ms s1=188.9ms wait=0.2/46.1ms pred gate=device Token # 96: 4.549ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=pair draft=2111 prop=2111 pred gate=device Token # 97: 114.972ms; value: next_token_ids=tensor([2111], device='cuda:0') mtp accept=1 prop=2111 top1=2111 accp=1.000 next=draft=223 prop=223 olap pair=109.6ms serial=193.7ms gain=84.1ms ratio=0.43 s0=7.3ms s1=186.4ms wait=0.2/44.0ms pred gate=device Token # 98: 3.765ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=2426 prop=2426 pred gate=device Token # 99: 115.343ms; value: next_token_ids=tensor([2426], device='cuda:0') mtp accept=1 prop=2426 top1=2426 accp=0.999 next=draft=38775 prop=38775 olap pair=110.1ms serial=195.7ms gain=85.6ms ratio=0.44 s0=3.8ms s1=192.0ms wait=0.1/48.2ms pred gate=device Token # 100: 3.702ms; value: next_token_ids=tensor([38775], device='cuda:0') mtp accept=1 prop=38775 top1=38775 accp=1.000 next=pair draft=471 prop=471 pred gate=device Token # 101: 114.814ms; value: next_token_ids=tensor([471], device='cuda:0') mtp accept=1 prop=471 top1=471 accp=1.000 next=draft=1457 prop=1457 olap pair=109.5ms serial=193.5ms gain=84.0ms ratio=0.43 s0=7.0ms s1=186.5ms wait=0.2/44.2ms pred gate=device Token # 102: 3.739ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 103: 115.321ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.646 next=draft=844 prop=844 olap pair=110.1ms serial=195.4ms gain=85.3ms ratio=0.44 s0=4.1ms s1=191.3ms wait=0.1/47.5ms pred gate=device Token # 104: 3.702ms; value: 
next_token_ids=tensor([844], device='cuda:0') mtp accept=1 prop=844 top1=844 accp=0.684 next=pair draft=41727 prop=41727 pred gate=device Token # 105: 114.905ms; value: next_token_ids=tensor([41727], device='cuda:0') mtp accept=1 prop=41727 top1=41727 accp=0.986 next=draft=666 prop=666 olap pair=109.7ms serial=194.5ms gain=84.9ms ratio=0.44 s0=4.0ms s1=190.5ms wait=0.1/47.4ms pred gate=device Token # 106: 3.725ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 107: 114.119ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=8738 prop=8738 olap pair=108.9ms serial=193.2ms gain=84.3ms ratio=0.44 s0=4.0ms s1=189.2ms wait=0.1/47.7ms pred gate=device Token # 108: 3.682ms; value: next_token_ids=tensor([8738], device='cuda:0') mtp accept=1 prop=8738 top1=8738 accp=1.000 next=pair draft=2619 prop=2619 pred gate=device Token # 109: 114.650ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.789 next=draft=53091 prop=53091 olap pair=109.4ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.0ms s1=190.3ms wait=0.1/47.9ms pred gate=device Token # 110: 3.862ms; value: next_token_ids=tensor([53091], device='cuda:0') mtp accept=1 prop=53091 top1=53091 accp=0.947 next=pair draft=4374 prop=4374 pred gate=device Token # 111: 114.704ms; value: next_token_ids=tensor([4374], device='cuda:0') mtp accept=1 prop=4374 top1=4374 accp=1.000 next=draft=1465 prop=1465 olap pair=109.4ms serial=194.4ms gain=85.0ms ratio=0.44 s0=4.4ms s1=190.0ms wait=0.1/46.9ms pred gate=device Token # 112: 3.722ms; value: next_token_ids=tensor([1465], device='cuda:0') mtp accept=1 prop=1465 top1=1465 accp=1.000 next=pair draft=13582 prop=13582 pred gate=device Token # 113: 114.660ms; value: next_token_ids=tensor([13582], device='cuda:0') mtp accept=1 prop=13582 top1=13582 accp=1.000 next=draft=21 prop=21 olap 
pair=109.4ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.4ms s1=189.9ms wait=0.1/47.2ms pred gate=device Token # 114: 3.744ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=pair draft=16 prop=16 pred gate=device Token # 115: 114.734ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=20 prop=20 olap pair=109.6ms serial=194.9ms gain=85.2ms ratio=0.44 s0=3.8ms s1=191.1ms wait=0.1/48.1ms pred gate=device Token # 116: 3.657ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=1237 prop=1237 pred gate=device Token # 117: 114.294ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=0.800 next=draft=13380 prop=13380 olap pair=109.1ms serial=193.8ms gain=84.7ms ratio=0.44 s0=3.8ms s1=190.0ms wait=0.1/48.1ms pred gate=device Token # 118: 3.724ms; value: next_token_ids=tensor([28769], device='cuda:0') mtp accept=0 prop=13380 top1=28769 accp=0.036 next=pair draft=36 prop=36 pred gate=device Token # 119: 114.621ms; value: next_token_ids=tensor([36], device='cuda:0') mtp accept=1 prop=36 top1=36 accp=0.990 next=draft=223 prop=223 olap pair=109.4ms serial=192.6ms gain=83.3ms ratio=0.43 s0=4.6ms s1=188.1ms wait=0.1/46.9ms pred gate=device Token # 120: 3.831ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=10602 prop=10602 pred gate=device Token # 121: 114.851ms; value: next_token_ids=tensor([10602], device='cuda:0') mtp accept=1 prop=10602 top1=10602 accp=1.000 next=draft=303 prop=303 olap pair=109.6ms serial=193.6ms gain=83.9ms ratio=0.43 s0=4.7ms s1=188.9ms wait=0.1/46.9ms pred gate=device Token # 122: 3.728ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=12519 prop=12519 pred gate=device Token # 123: 114.435ms; value: next_token_ids=tensor([3910], 
device='cuda:0') mtp accept=0 prop=12519 top1=3910 accp=0.328 next=draft=373 prop=373 olap pair=109.2ms serial=194.0ms gain=84.7ms ratio=0.44 s0=4.5ms s1=189.5ms wait=0.1/46.9ms pred gate=device Token # 124: 115.469ms; value: next_token_ids=tensor([373], device='cuda:0') mtp accept=1 prop=373 top1=373 accp=1.000 next=draft=8835 prop=8835 olap pair=109.7ms serial=194.5ms gain=84.8ms ratio=0.44 s0=5.0ms s1=189.5ms wait=0.1/46.6ms pred gate=device Token # 125: 3.743ms; value: next_token_ids=tensor([8835], device='cuda:0') mtp accept=1 prop=8835 top1=8835 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 126: 114.429ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.982 next=draft=97740 prop=15158 olap pair=109.1ms serial=193.9ms gain=84.8ms ratio=0.44 s0=3.9ms s1=190.0ms wait=0.1/47.9ms pred gate=device Token # 127: 3.737ms; value: next_token_ids=tensor([15158], device='cuda:0') mtp accept=1 prop=15158 top1=15158 accp=0.267 next=pair draft=1564 prop=1564 pred gate=device Token # 128: 115.033ms; value: next_token_ids=tensor([1564], device='cuda:0') mtp accept=1 prop=1564 top1=1564 accp=1.000 next=draft=1227 prop=1227 olap pair=109.8ms serial=195.0ms gain=85.2ms ratio=0.44 s0=3.9ms s1=191.0ms wait=0.1/48.0ms pred gate=device Token # 129: 3.701ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=1.000 next=pair draft=666 prop=666 pred gate=device Token # 130: 114.001ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=320 prop=320 olap pair=108.8ms serial=192.9ms gain=84.1ms ratio=0.44 s0=4.9ms s1=188.0ms wait=0.1/46.6ms pred gate=device Token # 131: 3.741ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=85566 prop=85566 pred gate=device Token # 132: 114.385ms; value: next_token_ids=tensor([85566], device='cuda:0') mtp accept=1 prop=85566 top1=85566 
accp=0.996 next=draft=11841 prop=50933 olap pair=109.2ms serial=193.6ms gain=84.4ms ratio=0.44 s0=4.6ms s1=188.9ms wait=0.1/46.9ms pred gate=device Token # 133: 3.737ms; value: next_token_ids=tensor([43080], device='cuda:0') mtp accept=0 prop=50933 top1=43080 accp=0.006 next=pair draft=2619 prop=2619 pred gate=device Token # 134: 114.929ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.841 next=draft=5769 prop=5769 olap pair=109.6ms serial=194.4ms gain=84.8ms ratio=0.44 s0=5.5ms s1=188.9ms wait=0.2/46.1ms pred gate=device Token # 135: 3.779ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=pair draft=22 prop=22 pred gate=device Token # 136: 115.495ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=draft=17840 prop=17840 olap pair=109.5ms serial=193.7ms gain=84.2ms ratio=0.43 s0=7.6ms s1=186.1ms wait=0.2/43.5ms pred gate=device Token # 137: 4.618ms; value: next_token_ids=tensor([17840], device='cuda:0') mtp accept=1 prop=17840 top1=17840 accp=0.786 next=pair draft=666 prop=666 pred gate=device Token # 138: 114.792ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=223 accp=0.274 next=draft=223 prop=223 olap pair=109.4ms serial=194.1ms gain=84.7ms ratio=0.44 s0=5.0ms s1=189.1ms wait=0.1/47.1ms pred gate=device Token # 139: 3.785ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=14171 prop=14171 pred gate=device Token # 140: 115.161ms; value: next_token_ids=tensor([14171], device='cuda:0') mtp accept=1 prop=14171 top1=14171 accp=0.817 next=draft=6533 prop=6533 olap pair=109.9ms serial=195.4ms gain=85.5ms ratio=0.44 s0=3.9ms s1=191.5ms wait=0.1/48.0ms pred gate=device Token # 141: 3.837ms; value: next_token_ids=tensor([6533], device='cuda:0') mtp accept=1 prop=6533 top1=6533 accp=0.857 next=pair draft=525 prop=525 pred 
gate=device Token # 142: 114.624ms; value: next_token_ids=tensor([525], device='cuda:0') mtp accept=1 prop=525 top1=525 accp=1.000 next=draft=1237 prop=1237 olap pair=109.4ms serial=194.0ms gain=84.7ms ratio=0.44 s0=4.8ms s1=189.3ms wait=0.1/46.7ms pred gate=device Token # 143: 3.735ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=15150 prop=36132 pred gate=device Token # 144: 114.698ms; value: next_token_ids=tensor([8979], device='cuda:0') mtp accept=0 prop=36132 top1=15150 accp=0.570 next=draft=15150 prop=15150 olap pair=109.4ms serial=194.1ms gain=84.7ms ratio=0.44 s0=4.8ms s1=189.3ms wait=0.1/46.7ms pred gate=device Token # 145: 114.451ms; value: next_token_ids=tensor([15150], device='cuda:0') mtp accept=1 prop=15150 top1=15150 accp=0.996 next=draft=2284 prop=2284 olap pair=109.1ms serial=193.8ms gain=84.6ms ratio=0.44 s0=3.9ms s1=189.9ms wait=0.1/48.4ms pred gate=device Token # 146: 3.716ms; value: next_token_ids=tensor([545], device='cuda:0') mtp accept=0 prop=2284 top1=2284 accp=0.829 next=pair draft=2284 prop=2284 pred gate=device Token # 147: 114.228ms; value: next_token_ids=tensor([428], device='cuda:0') mtp accept=0 prop=2284 top1=428 accp=0.053 next=draft=7163 prop=7163 olap pair=109.0ms serial=193.5ms gain=84.5ms ratio=0.44 s0=3.9ms s1=189.6ms wait=0.1/48.3ms pred gate=device Token # 148: 115.191ms; value: next_token_ids=tensor([7163], device='cuda:0') mtp accept=1 prop=7163 top1=7163 accp=1.000 next=draft=27521 prop=27521 olap pair=109.9ms serial=195.2ms gain=85.4ms ratio=0.44 s0=3.8ms s1=191.4ms wait=0.1/48.5ms pred gate=device Token # 149: 3.702ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=19698 prop=19698 pred gate=device Token # 150: 115.862ms; value: next_token_ids=tensor([19698], device='cuda:0') mtp accept=1 prop=19698 top1=19698 accp=1.000 next=draft=223 prop=223 olap pair=109.8ms serial=194.3ms 
gain=84.5ms ratio=0.43 s0=6.3ms s1=188.0ms wait=0.2/45.4ms pred gate=device Token # 151: 4.863ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=0 prop=223 top1=389 accp=0.052 next=pair draft=1703 prop=1703 pred gate=device Token # 152: 115.753ms; value: next_token_ids=tensor([1703], device='cuda:0') mtp accept=1 prop=1703 top1=1703 accp=1.000 next=draft=996 prop=996 olap pair=110.3ms serial=195.8ms gain=85.5ms ratio=0.44 s0=4.3ms s1=191.5ms wait=0.1/47.7ms pred gate=device Token # 153: 3.723ms; value: next_token_ids=tensor([996], device='cuda:0') mtp accept=1 prop=996 top1=996 accp=1.000 next=pair draft=3467 prop=3467 pred gate=device Token # 154: 114.881ms; value: next_token_ids=tensor([3467], device='cuda:0') mtp accept=1 prop=3467 top1=3467 accp=1.000 next=draft=4231 prop=4231 olap pair=109.6ms serial=194.6ms gain=84.9ms ratio=0.44 s0=4.9ms s1=189.7ms wait=0.1/46.6ms pred gate=device Token # 155: 3.778ms; value: next_token_ids=tensor([4231], device='cuda:0') mtp accept=1 prop=4231 top1=4231 accp=0.636 next=pair draft=66 prop=66 pred gate=device Token # 156: 114.798ms; value: next_token_ids=tensor([66], device='cuda:0') mtp accept=1 prop=66 top1=66 accp=0.586 next=draft=9047 prop=9047 olap pair=109.6ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.8ms s1=189.5ms wait=0.2/46.5ms pred gate=device Token # 157: 3.731ms; value: next_token_ids=tensor([9047], device='cuda:0') mtp accept=1 prop=9047 top1=9047 accp=1.000 next=pair draft=36666 prop=36666 pred gate=device Token # 158: 115.365ms; value: next_token_ids=tensor([36666], device='cuda:0') mtp accept=1 prop=36666 top1=36666 accp=1.000 next=draft=45834 prop=45834 olap pair=110.0ms serial=195.5ms gain=85.4ms ratio=0.44 s0=4.4ms s1=191.1ms wait=0.1/47.1ms pred gate=device Token # 159: 3.718ms; value: next_token_ids=tensor([45834], device='cuda:0') mtp accept=1 prop=45834 top1=45834 accp=1.000 next=pair draft=66 prop=66 pred gate=device Token # 160: 114.925ms; value: next_token_ids=tensor([31], 
device='cuda:0') mtp accept=0 prop=66 top1=31 accp=0.476 next=draft=11154 prop=11154 olap pair=109.6ms serial=194.6ms gain=85.1ms ratio=0.44 s0=4.4ms s1=190.3ms wait=0.1/47.2ms pred gate=device Token # 161: 115.899ms; value: next_token_ids=tensor([11154], device='cuda:0') mtp accept=1 prop=11154 top1=11154 accp=1.000 next=draft=26 prop=26 olap pair=110.6ms serial=196.8ms gain=86.2ms ratio=0.44 s0=4.4ms s1=192.4ms wait=0.1/47.4ms pred gate=device Token # 162: 3.691ms; value: next_token_ids=tensor([26], device='cuda:0') mtp accept=1 prop=26 top1=26 accp=1.000 next=pair draft=66 prop=66 pred gate=device Token # 163: 114.599ms; value: next_token_ids=tensor([66], device='cuda:0') mtp accept=1 prop=66 top1=66 accp=1.000 next=draft=7417 prop=7417 olap pair=109.4ms serial=194.4ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.1ms wait=0.1/47.6ms pred gate=device Token # 164: 3.694ms; value: next_token_ids=tensor([7417], device='cuda:0') mtp accept=1 prop=7417 top1=7417 accp=0.857 next=pair draft=10861 prop=5771 pred gate=device Token # 165: 115.106ms; value: next_token_ids=tensor([10861], device='cuda:0') mtp accept=0 prop=5771 top1=10861 accp=0.782 next=draft=5843 prop=10909 olap pair=109.8ms serial=195.3ms gain=85.4ms ratio=0.44 s0=3.9ms s1=191.3ms wait=0.1/48.3ms pred gate=device Token # 166: 114.672ms; value: next_token_ids=tensor([10909], device='cuda:0') mtp accept=1 prop=10909 top1=5843 accp=0.892 next=draft=9270 prop=9270 olap pair=109.3ms serial=194.0ms gain=84.7ms ratio=0.44 s0=3.8ms s1=190.2ms wait=0.1/48.5ms pred gate=device Token # 167: 3.720ms; value: next_token_ids=tensor([9270], device='cuda:0') mtp accept=1 prop=9270 top1=9270 accp=1.000 next=pair draft=5189 prop=5189 pred gate=device Token # 168: 114.235ms; value: next_token_ids=tensor([5189], device='cuda:0') mtp accept=1 prop=5189 top1=5189 accp=1.000 next=draft=94 prop=94 olap pair=109.1ms serial=193.7ms gain=84.6ms ratio=0.44 s0=4.0ms s1=189.6ms wait=0.1/48.1ms pred gate=device Token # 169: 3.700ms; value: 
next_token_ids=tensor([94], device='cuda:0') mtp accept=1 prop=94 top1=94 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 170: 115.167ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=10909 prop=10909 olap pair=109.9ms serial=195.4ms gain=85.4ms ratio=0.44 s0=4.6ms s1=190.7ms wait=0.1/47.5ms pred gate=device Token # 171: 3.716ms; value: next_token_ids=tensor([10909], device='cuda:0') mtp accept=1 prop=10909 top1=10909 accp=1.000 next=pair draft=369 prop=369 pred gate=device Token # 172: 114.256ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=0.994 next=draft=223 prop=223 olap pair=109.1ms serial=193.7ms gain=84.7ms ratio=0.44 s0=3.8ms s1=189.9ms wait=0.1/48.6ms pred gate=device Token # 173: 3.743ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=24292 prop=24292 pred gate=device Token # 174: 114.632ms; value: next_token_ids=tensor([24292], device='cuda:0') mtp accept=1 prop=24292 top1=24292 accp=1.000 next=draft=7640 prop=7640 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=3.9ms s1=190.4ms wait=0.1/48.5ms pred gate=device Token # 175: 3.712ms; value: next_token_ids=tensor([7640], device='cuda:0') mtp accept=1 prop=7640 top1=7640 accp=0.999 next=pair draft=94 prop=94 pred gate=device Token # 176: 115.227ms; value: next_token_ids=tensor([94], device='cuda:0') mtp accept=1 prop=94 top1=94 accp=1.000 next=draft=1313 prop=1313 olap pair=110.0ms serial=194.4ms gain=84.3ms ratio=0.43 s0=7.4ms s1=187.0ms wait=0.2/44.0ms pred gate=device Token # 177: 3.750ms; value: next_token_ids=tensor([1313], device='cuda:0') mtp accept=1 prop=1313 top1=1313 accp=1.000 next=pair draft=5013 prop=5013 pred gate=device Token # 178: 114.587ms; value: next_token_ids=tensor([5013], device='cuda:0') mtp accept=1 prop=5013 top1=5013 accp=1.000 next=draft=369 prop=369 olap pair=109.4ms serial=194.3ms 
gain=84.9ms ratio=0.44 s0=3.9ms s1=190.4ms wait=0.1/48.3ms pred gate=device Token # 179: 3.774ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=1.000 next=pair draft=1313 prop=1313 pred gate=device Token # 180: 115.423ms; value: next_token_ids=tensor([1313], device='cuda:0') mtp accept=1 prop=1313 top1=1313 accp=1.000 next=draft=5013 prop=5013 olap pair=109.4ms serial=193.6ms gain=84.2ms ratio=0.43 s0=6.0ms s1=187.6ms wait=0.2/46.3ms pred gate=device Token # 181: 4.572ms; value: next_token_ids=tensor([5013], device='cuda:0') mtp accept=1 prop=5013 top1=5013 accp=1.000 next=pair draft=7640 prop=7640 pred gate=device Token # 182: 115.490ms; value: next_token_ids=tensor([7640], device='cuda:0') mtp accept=1 prop=7640 top1=7640 accp=1.000 next=draft=94 prop=94 olap pair=109.3ms serial=192.8ms gain=83.5ms ratio=0.43 s0=8.0ms s1=184.8ms wait=0.2/43.5ms pred gate=device Token # 183: 4.564ms; value: next_token_ids=tensor([94], device='cuda:0') mtp accept=1 prop=94 top1=94 accp=1.000 next=pair draft=2619 prop=2619 pred gate=device Token # 184: 114.907ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=draft=93130 prop=93130 olap pair=109.5ms serial=193.9ms gain=84.4ms ratio=0.44 s0=6.3ms s1=187.6ms wait=0.2/45.4ms pred gate=device Token # 185: 3.694ms; value: next_token_ids=tensor([93130], device='cuda:0') mtp accept=1 prop=93130 top1=93130 accp=1.000 next=pair draft=90974 prop=90974 pred gate=device Token # 186: 115.388ms; value: next_token_ids=tensor([90974], device='cuda:0') mtp accept=1 prop=90974 top1=90974 accp=1.000 next=draft=666 prop=666 olap pair=110.2ms serial=196.0ms gain=85.7ms ratio=0.44 s0=4.5ms s1=191.5ms wait=0.1/47.2ms pred gate=device Token # 187: 3.721ms; value: next_token_ids=tensor([1121], device='cuda:0') mtp accept=0 prop=666 top1=2295 accp=0.259 next=pair draft=666 prop=666 pred gate=device Token # 188: 114.796ms; value: next_token_ids=tensor([666], 
device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=369 prop=369 olap pair=109.5ms serial=194.4ms gain=84.9ms ratio=0.44 s0=4.8ms s1=189.6ms wait=0.1/46.8ms pred gate=device Token # 189: 3.697ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=1.000 next=pair draft=2619 prop=2619 pred gate=device Token # 190: 114.597ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=2619 top1=223 accp=0.146 next=draft=856 prop=856 olap pair=109.3ms serial=193.4ms gain=84.0ms ratio=0.43 s0=4.7ms s1=188.7ms wait=0.1/46.9ms pred gate=device Token # 191: 114.992ms; value: next_token_ids=tensor([856], device='cuda:0') mtp accept=1 prop=856 top1=856 accp=1.000 next=draft=16 prop=16 olap pair=109.6ms serial=194.5ms gain=84.9ms ratio=0.44 s0=4.5ms s1=189.9ms wait=0.1/47.2ms pred gate=device Token # 192: 3.757ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=pair draft=14882 prop=14882 pred gate=device Token # 193: 115.493ms; value: next_token_ids=tensor([14882], device='cuda:0') mtp accept=1 prop=14882 top1=14882 accp=1.000 next=draft=28828 prop=28828 olap pair=110.2ms serial=196.0ms gain=85.8ms ratio=0.44 s0=4.4ms s1=191.6ms wait=0.1/47.1ms pred gate=device Token # 194: 3.778ms; value: next_token_ids=tensor([17840], device='cuda:0') mtp accept=0 prop=28828 top1=17840 accp=0.316 next=pair draft=2283 prop=2283 pred gate=device Token # 195: 115.691ms; value: next_token_ids=tensor([17], device='cuda:0') mtp accept=0 prop=2283 top1=17 accp=0.003 next=draft=11274 prop=11274 olap pair=110.4ms serial=196.3ms gain=85.9ms ratio=0.44 s0=4.3ms s1=192.0ms wait=0.1/47.5ms pred gate=device Token # 196: 114.958ms; value: next_token_ids=tensor([11274], device='cuda:0') mtp accept=1 prop=11274 top1=11274 accp=1.000 next=draft=7640 prop=7640 olap pair=109.6ms serial=194.8ms gain=85.2ms ratio=0.44 s0=3.8ms s1=191.0ms wait=0.1/48.5ms pred gate=device Token # 197: 3.750ms; 
value: next_token_ids=tensor([7640], device='cuda:0') mtp accept=1 prop=7640 top1=7640 accp=1.000 next=pair draft=94 prop=94 pred gate=device Token # 198: 114.675ms; value: next_token_ids=tensor([94], device='cuda:0') mtp accept=1 prop=94 top1=94 accp=1.000 next=draft=2619 prop=2619 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.3ms s1=190.0ms wait=0.1/47.3ms pred gate=device Token # 199: 3.727ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=pair draft=2793 prop=2793 pred gate=device Token # 200: 115.641ms; value: next_token_ids=tensor([2793], device='cuda:0') mtp accept=1 prop=2793 top1=2793 accp=0.999 next=draft=4055 prop=4055 olap pair=110.5ms serial=196.4ms gain=85.9ms ratio=0.44 s0=3.9ms s1=192.5ms wait=0.1/48.4ms pred gate=device Token # 201: 3.717ms; value: next_token_ids=tensor([47948], device='cuda:0') mtp accept=0 prop=4055 top1=47948 accp=0.005 next=pair draft=223 prop=223 pred gate=device Token # 202: 114.805ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=22827 prop=22827 olap pair=109.5ms serial=193.5ms gain=84.0ms ratio=0.43 s0=5.7ms s1=187.8ms wait=0.2/45.9ms pred gate=device Token # 203: 3.716ms; value: next_token_ids=tensor([2130], device='cuda:0') mtp accept=0 prop=22827 top1=45045 accp=0.133 next=pair draft=666 prop=666 pred gate=device Token # 204: 114.640ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=0.994 next=draft=369 prop=369 olap pair=109.4ms serial=194.5ms gain=85.1ms ratio=0.44 s0=3.9ms s1=190.7ms wait=0.1/48.6ms pred gate=device Token # 205: 3.742ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 206: 115.084ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.996 next=draft=18 prop=18 olap pair=109.9ms serial=195.2ms 
gain=85.3ms ratio=0.44 s0=3.9ms s1=191.3ms wait=0.1/48.5ms pred gate=device Token # 207: 3.865ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=pair draft=16 prop=16 pred gate=device Token # 208: 114.547ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=27521 prop=27521 olap pair=109.3ms serial=194.3ms gain=84.9ms ratio=0.44 s0=3.9ms s1=190.4ms wait=0.1/48.4ms pred gate=device Token # 209: 3.716ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 210: 115.114ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=11274 prop=11274 olap pair=109.8ms serial=195.1ms gain=85.3ms ratio=0.44 s0=4.0ms s1=191.1ms wait=0.1/48.3ms pred gate=device Token # 211: 3.738ms; value: next_token_ids=tensor([11274], device='cuda:0') mtp accept=1 prop=11274 top1=11274 accp=1.000 next=pair draft=7640 prop=7640 pred gate=device Token # 212: 115.926ms; value: next_token_ids=tensor([7640], device='cuda:0') mtp accept=1 prop=7640 top1=7640 accp=1.000 next=draft=94 prop=94 olap pair=109.9ms serial=194.7ms gain=84.8ms ratio=0.44 s0=6.2ms s1=188.5ms wait=0.2/45.7ms pred gate=device Token # 213: 4.584ms; value: next_token_ids=tensor([94], device='cuda:0') mtp accept=1 prop=94 top1=94 accp=1.000 next=pair draft=2619 prop=2619 pred gate=device Token # 214: 114.827ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=draft=1833 prop=1833 olap pair=109.4ms serial=194.5ms gain=85.1ms ratio=0.44 s0=4.1ms s1=190.4ms wait=0.1/48.3ms pred gate=device Token # 215: 3.763ms; value: next_token_ids=tensor([1833], device='cuda:0') mtp accept=1 prop=1833 top1=1833 accp=0.670 next=pair draft=47948 prop=47948 pred gate=device Token # 216: 114.839ms; value: next_token_ids=tensor([47948], 
device='cuda:0') mtp accept=1 prop=47948 top1=47948 accp=0.999 next=draft=223 prop=223 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=3.8ms s1=190.9ms wait=0.1/48.5ms pred gate=device Token # 217: 3.738ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=109835 prop=2130 pred gate=device Token # 218: 114.197ms; value: next_token_ids=tensor([2130], device='cuda:0') mtp accept=1 prop=2130 top1=2130 accp=0.485 next=draft=666 prop=666 olap pair=109.0ms serial=193.3ms gain=84.3ms ratio=0.44 s0=4.6ms s1=188.7ms wait=0.1/47.0ms pred gate=device Token # 219: 3.666ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=pair draft=369 prop=369 pred gate=device Token # 220: 114.219ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=1.000 next=draft=223 prop=223 olap pair=109.0ms serial=193.4ms gain=84.5ms ratio=0.44 s0=4.4ms s1=189.0ms wait=0.1/47.2ms pred gate=device Token # 221: 3.803ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=2738 prop=2738 pred gate=device Token # 222: 115.200ms; value: next_token_ids=tensor([2738], device='cuda:0') mtp accept=1 prop=2738 top1=2738 accp=1.000 next=draft=223 prop=223 olap pair=110.0ms serial=194.4ms gain=84.4ms ratio=0.43 s0=4.5ms s1=189.9ms wait=0.1/47.4ms pred gate=device Token # 223: 3.780ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.922 next=pair draft=7016 prop=7016 pred gate=device Token # 224: 115.264ms; value: next_token_ids=tensor([7016], device='cuda:0') mtp accept=1 prop=7016 top1=7016 accp=1.000 next=draft=11274 prop=11274 olap pair=109.9ms serial=195.3ms gain=85.4ms ratio=0.44 s0=4.6ms s1=190.7ms wait=0.1/47.1ms pred gate=device Token # 225: 3.706ms; value: next_token_ids=tensor([11274], device='cuda:0') mtp accept=1 prop=11274 top1=11274 accp=1.000 
next=pair draft=7640 prop=7640 pred gate=device Token # 226: 115.386ms; value: next_token_ids=tensor([7640], device='cuda:0') mtp accept=1 prop=7640 top1=7640 accp=1.000 next=draft=94 prop=94 olap pair=110.2ms serial=196.0ms gain=85.8ms ratio=0.44 s0=4.0ms s1=191.9ms wait=0.1/48.1ms pred gate=device Token # 227: 3.759ms; value: next_token_ids=tensor([94], device='cuda:0') mtp accept=1 prop=94 top1=94 accp=1.000 next=pair draft=2619 prop=2619 pred gate=device Token # 228: 114.968ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.967 next=draft=47 prop=47 olap pair=109.7ms serial=195.1ms gain=85.4ms ratio=0.44 s0=3.8ms s1=191.3ms wait=0.1/48.5ms pred gate=device Token # 229: 3.796ms; value: next_token_ids=tensor([47], device='cuda:0') mtp accept=1 prop=47 top1=47 accp=1.000 next=pair draft=8835 prop=8835 pred gate=device Token # 230: 114.506ms; value: next_token_ids=tensor([8835], device='cuda:0') mtp accept=1 prop=8835 top1=8835 accp=1.000 next=draft=19 prop=19 olap pair=109.3ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.7ms s1=189.5ms wait=0.1/47.6ms pred gate=device Token # 231: 3.695ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=0.716 next=pair draft=4904 prop=4904 pred gate=device Token # 232: 114.495ms; value: next_token_ids=tensor([4904], device='cuda:0') mtp accept=1 prop=4904 top1=4904 accp=0.961 next=draft=31 prop=31 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.1ms s1=190.2ms wait=0.1/47.7ms pred gate=device Token # 233: 3.802ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=0.999 next=pair draft=19 prop=19 pred gate=device Token # 234: 114.460ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=223 prop=223 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=4.4ms s1=189.8ms wait=0.1/47.2ms pred gate=device Token # 235: 3.797ms; value: 
next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.995 next=pair draft=10768 prop=47047 pred gate=device Token # 236: 114.914ms; value: next_token_ids=tensor([10768], device='cuda:0') mtp accept=0 prop=47047 top1=10768 accp=0.642 next=draft=666 prop=666 olap pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=4.4ms s1=190.5ms wait=0.1/47.1ms pred gate=device Token # 237: 115.320ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=369 prop=369 olap pair=110.0ms serial=195.5ms gain=85.5ms ratio=0.44 s0=4.4ms s1=191.1ms wait=0.1/47.2ms pred gate=device Token # 238: 3.744ms; value: next_token_ids=tensor([369], device='cuda:0') mtp accept=1 prop=369 top1=369 accp=0.999 next=pair draft=223 prop=223 pred gate=device Token # 239: 114.459ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=31444 prop=31444 olap pair=109.2ms serial=194.0ms gain=84.8ms ratio=0.44 s0=4.4ms s1=189.6ms wait=0.1/47.2ms pred gate=device Token # 240: 3.716ms; value: next_token_ids=tensor([31444], device='cuda:0') mtp accept=1 prop=31444 top1=31444 accp=1.000 next=pair draft=1492 prop=1492 pred gate=device Token # 241: 115.105ms; value: next_token_ids=tensor([1492], device='cuda:0') mtp accept=1 prop=1492 top1=1492 accp=0.988 next=draft=223 prop=223 olap pair=109.8ms serial=195.2ms gain=85.4ms ratio=0.44 s0=4.4ms s1=190.8ms wait=0.1/47.2ms pred gate=device Token # 242: 3.778ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=5769 prop=5769 pred gate=device Token # 243: 115.643ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=draft=21 prop=21 olap pair=110.4ms serial=195.9ms gain=85.5ms ratio=0.44 s0=4.3ms s1=191.6ms wait=0.1/47.6ms pred gate=device Token # 244: 3.753ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 
prop=21 top1=21 accp=1.000 next=pair draft=35015 prop=35015 pred gate=device Token # 245: 114.486ms; value: next_token_ids=tensor([35015], device='cuda:0') mtp accept=1 prop=35015 top1=35015 accp=1.000 next=draft=223 prop=223 olap pair=109.3ms serial=194.1ms gain=84.8ms ratio=0.44 s0=3.9ms s1=190.1ms wait=0.1/48.3ms pred gate=device Token # 246: 3.753ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=5198 prop=5198 pred gate=device Token # 247: 114.884ms; value: next_token_ids=tensor([5198], device='cuda:0') mtp accept=1 prop=5198 top1=5198 accp=1.000 next=draft=16 prop=16 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=3.9ms s1=190.6ms wait=0.1/48.4ms pred gate=device Token # 248: 3.743ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 249: 114.907ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=7 prop=7 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=3.9ms s1=190.9ms wait=0.1/48.5ms pred gate=device Token # 250: 3.753ms; value: next_token_ids=tensor([7], device='cuda:0') mtp accept=1 prop=7 top1=7 accp=1.000 next=pair draft=25830 prop=25830 pred gate=device Token # 251: 114.372ms; value: next_token_ids=tensor([25830], device='cuda:0') mtp accept=1 prop=25830 top1=25830 accp=1.000 next=draft=12 prop=12 olap pair=109.2ms serial=193.8ms gain=84.6ms ratio=0.44 s0=4.2ms s1=189.6ms wait=0.1/48.0ms pred gate=device Token # 252: 3.718ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=666 accp=0.306 next=pair draft=8563 prop=8563 pred gate=device Token # 253: 114.702ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=0 prop=8563 top1=1237 accp=0.094 next=draft=33605 prop=33605 olap pair=109.5ms serial=194.1ms gain=84.6ms ratio=0.44 s0=6.0ms s1=188.2ms wait=0.2/45.8ms pred gate=device 
Token # 254: 114.853ms; value: next_token_ids=tensor([33605], device='cuda:0') mtp accept=1 prop=33605 top1=33605 accp=0.786 next=draft=76207 prop=76207 olap pair=109.6ms serial=194.6ms gain=85.1ms ratio=0.44 s0=3.8ms s1=190.8ms wait=0.1/48.5ms pred gate=device Token # 255: 3.762ms; value: next_token_ids=tensor([8738], device='cuda:0') mtp accept=0 prop=76207 top1=8738 accp=0.001 next=pair draft=76207 prop=76207 pred gate=device Token # 256: 114.870ms; value: next_token_ids=tensor([4498], device='cuda:0') mtp accept=0 prop=76207 top1=4498 accp=0.169 next=draft=76207 prop=76207 olap pair=109.5ms serial=193.8ms gain=84.3ms ratio=0.44 s0=4.2ms s1=189.6ms wait=0.1/47.8ms pred gate=device Token # 257: 114.579ms; value: next_token_ids=tensor([76207], device='cuda:0') mtp accept=1 prop=76207 top1=76207 accp=1.000 next=draft=303 prop=303 olap pair=109.2ms serial=194.0ms gain=84.8ms ratio=0.44 s0=3.9ms s1=190.1ms wait=0.1/48.3ms pred gate=device Token # 258: 3.713ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.880 next=pair draft=41800 prop=49449 pred gate=device Token # 259: 115.532ms; value: next_token_ids=tensor([7000], device='cuda:0') mtp accept=0 prop=49449 top1=7000 accp=0.315 next=draft=14590 prop=14590 olap pair=109.8ms serial=194.8ms gain=85.0ms ratio=0.44 s0=4.4ms s1=190.5ms wait=0.1/47.9ms pred gate=device Token # 260: 114.441ms; value: next_token_ids=tensor([14590], device='cuda:0') mtp accept=1 prop=14590 top1=14590 accp=0.919 next=draft=545 prop=545 olap pair=109.2ms serial=193.6ms gain=84.4ms ratio=0.44 s0=4.2ms s1=189.4ms wait=0.1/47.6ms pred gate=device Token # 261: 3.749ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=545 top1=223 accp=0.061 next=pair draft=939 prop=939 pred gate=device Token # 262: 114.895ms; value: next_token_ids=tensor([939], device='cuda:0') mtp accept=1 prop=939 top1=939 accp=1.000 next=draft=24 prop=24 olap pair=109.7ms serial=194.2ms gain=84.6ms ratio=0.44 
s0=6.3ms s1=187.9ms wait=0.2/45.4ms pred gate=device Token # 263: 3.685ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=pair draft=15 prop=15 pred gate=device Token # 264: 114.916ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=3600 prop=3600 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=3.9ms s1=190.6ms wait=0.1/48.2ms pred gate=device Token # 265: 3.775ms; value: next_token_ids=tensor([3600], device='cuda:0') mtp accept=1 prop=3600 top1=3600 accp=1.000 next=pair draft=15 prop=15 pred gate=device Token # 266: 114.940ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=1059 prop=1059 olap pair=109.7ms serial=194.5ms gain=84.8ms ratio=0.44 s0=4.2ms s1=190.3ms wait=0.1/48.0ms pred gate=device Token # 267: 3.703ms; value: next_token_ids=tensor([1059], device='cuda:0') mtp accept=1 prop=1059 top1=1059 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 268: 114.585ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=5769 prop=93130 olap pair=109.4ms serial=194.2ms gain=84.8ms ratio=0.44 s0=4.2ms s1=190.0ms wait=0.1/47.7ms pred gate=device Token # 269: 3.713ms; value: next_token_ids=tensor([93130], device='cuda:0') mtp accept=1 prop=93130 top1=5769 accp=0.816 next=pair draft=90974 prop=90974 pred gate=device Token # 270: 114.710ms; value: next_token_ids=tensor([6742], device='cuda:0') mtp accept=0 prop=90974 top1=6742 accp=0.000 next=draft=223 prop=223 olap pair=109.5ms serial=194.3ms gain=84.9ms ratio=0.44 s0=3.8ms s1=190.5ms wait=0.1/48.6ms pred gate=device Token # 271: 115.364ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=856 prop=856 olap pair=110.0ms serial=195.4ms gain=85.4ms ratio=0.44 s0=3.9ms s1=191.5ms wait=0.1/48.4ms pred gate=device Token # 272: 3.747ms; 
value: next_token_ids=tensor([856], device='cuda:0') mtp accept=1 prop=856 top1=856 accp=1.000 next=pair draft=16 prop=16 pred gate=device Token # 273: 115.126ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=14882 prop=14882 olap pair=110.0ms serial=195.4ms gain=85.5ms ratio=0.44 s0=3.9ms s1=191.6ms wait=0.1/48.4ms pred gate=device Token # 274: 3.672ms; value: next_token_ids=tensor([14882], device='cuda:0') mtp accept=1 prop=14882 top1=14882 accp=1.000 next=pair draft=28828 prop=17840 pred gate=device Token # 275: 114.923ms; value: next_token_ids=tensor([17840], device='cuda:0') mtp accept=1 prop=17840 top1=28828 accp=0.938 next=draft=17 prop=17 olap pair=109.7ms serial=194.9ms gain=85.3ms ratio=0.44 s0=3.9ms s1=191.1ms wait=0.1/48.5ms pred gate=device Token # 276: 3.838ms; value: next_token_ids=tensor([17], device='cuda:0') mtp accept=1 prop=17 top1=17 accp=0.965 next=pair draft=11274 prop=11274 pred gate=device Token # 277: 114.766ms; value: next_token_ids=tensor([11274], device='cuda:0') mtp accept=1 prop=11274 top1=11274 accp=1.000 next=draft=303 prop=303 olap pair=109.6ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.2ms s1=190.1ms wait=0.1/47.6ms pred gate=device Token # 278: 3.716ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=36060 prop=36060 pred gate=device Token # 279: 115.461ms; value: next_token_ids=tensor([36060], device='cuda:0') mtp accept=1 prop=36060 top1=36060 accp=1.000 next=draft=31 prop=31 olap pair=110.3ms serial=194.8ms gain=84.5ms ratio=0.43 s0=4.4ms s1=190.4ms wait=0.1/47.7ms pred gate=device Token # 280: 3.809ms; value: next_token_ids=tensor([31], device='cuda:0') mtp accept=1 prop=31 top1=31 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 281: 115.252ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=223 prop=223 olap pair=110.0ms serial=195.1ms 
gain=85.1ms ratio=0.44 s0=4.3ms s1=190.8ms wait=0.1/47.5ms pred gate=device Token # 282: 3.759ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=4003 prop=4003 pred gate=device Token # 283: 115.071ms; value: next_token_ids=tensor([4003], device='cuda:0') mtp accept=1 prop=4003 top1=4003 accp=1.000 next=draft=223 prop=223 olap pair=109.8ms serial=195.2ms gain=85.4ms ratio=0.44 s0=3.9ms s1=191.2ms wait=0.1/48.4ms pred gate=device Token # 284: 3.731ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=31444 prop=31444 pred gate=device Token # 285: 115.775ms; value: next_token_ids=tensor([31444], device='cuda:0') mtp accept=1 prop=31444 top1=31444 accp=1.000 next=draft=223 prop=223 olap pair=110.5ms serial=195.0ms gain=84.5ms ratio=0.43 s0=8.2ms s1=186.8ms wait=0.2/43.2ms pred gate=device Token # 286: 3.746ms; value: next_token_ids=tensor([17], device='cuda:0') mtp accept=0 prop=223 top1=223 accp=0.729 next=pair draft=5769 prop=5769 pred gate=device Token # 287: 115.442ms; value: next_token_ids=tensor([5769], device='cuda:0') mtp accept=1 prop=5769 top1=5769 accp=1.000 next=draft=21 prop=21 olap pair=110.1ms serial=195.5ms gain=85.5ms ratio=0.44 s0=4.0ms s1=191.5ms wait=0.1/48.1ms pred gate=device Token # 288: 3.685ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 289: 114.861ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=1293 prop=1293 olap pair=109.5ms serial=194.4ms gain=85.0ms ratio=0.44 s0=3.9ms s1=190.5ms wait=0.1/48.4ms pred gate=device Token # 290: 3.720ms; value: next_token_ids=tensor([1293], device='cuda:0') mtp accept=1 prop=1293 top1=1293 accp=1.000 next=pair draft=83340 prop=83340 pred gate=device Token # 291: 114.988ms; value: next_token_ids=tensor([83340], 
device='cuda:0') mtp accept=1 prop=83340 top1=83340 accp=0.913 next=draft=23668 prop=23668 olap pair=109.6ms serial=193.2ms gain=83.6ms ratio=0.43 s0=5.8ms s1=187.4ms wait=0.2/45.9ms pred gate=device Token # 292: 3.721ms; value: next_token_ids=tensor([23668], device='cuda:0') mtp accept=1 prop=23668 top1=23668 accp=1.000 next=pair draft=31826 prop=1959 pred gate=device Token # 293: 115.282ms; value: next_token_ids=tensor([1959], device='cuda:0') mtp accept=1 prop=1959 top1=1959 accp=0.619 next=draft=24292 prop=4498 olap pair=110.1ms serial=195.7ms gain=85.6ms ratio=0.44 s0=3.9ms s1=191.8ms wait=0.1/48.4ms pred gate=device Token # 294: 3.701ms; value: next_token_ids=tensor([4498], device='cuda:0') mtp accept=1 prop=4498 top1=4498 accp=0.121 next=pair draft=18317 prop=18317 pred gate=device Token # 295: 115.092ms; value: next_token_ids=tensor([18317], device='cuda:0') mtp accept=1 prop=18317 top1=18317 accp=0.628 next=draft=5960 prop=5960 olap pair=109.9ms serial=195.2ms gain=85.3ms ratio=0.44 s0=4.1ms s1=191.1ms wait=0.1/48.0ms pred gate=device Token # 296: 3.796ms; value: next_token_ids=tensor([5960], device='cuda:0') mtp accept=1 prop=5960 top1=5960 accp=1.000 next=pair draft=66307 prop=66307 pred gate=device Token # 297: 115.109ms; value: next_token_ids=tensor([66307], device='cuda:0') mtp accept=1 prop=66307 top1=66307 accp=1.000 next=draft=303 prop=303 olap pair=109.9ms serial=195.2ms gain=85.3ms ratio=0.44 s0=4.0ms s1=191.2ms wait=0.1/48.1ms pred gate=device Token # 298: 3.716ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.999 next=pair draft=728 prop=17369 pred gate=device Token # 299: 114.800ms; value: next_token_ids=tensor([17369], device='cuda:0') mtp accept=1 prop=17369 top1=1530 accp=0.274 next=draft=17899 prop=17899 olap pair=109.6ms serial=194.3ms gain=84.7ms ratio=0.44 s0=4.0ms s1=190.3ms wait=0.1/48.4ms pred gate=device Token # 300: 3.723ms; value: next_token_ids=tensor([17899], device='cuda:0') mtp 
accept=1 prop=17899 top1=17899 accp=0.995 next=pair draft=19476 prop=19476 pred gate=device Token # 301: 114.769ms; value: next_token_ids=tensor([19476], device='cuda:0') mtp accept=1 prop=19476 top1=19476 accp=0.820 next=draft=303 prop=303 olap pair=109.6ms serial=194.5ms gain=84.9ms ratio=0.44 s0=3.9ms s1=190.6ms wait=0.1/48.4ms pred gate=device Token # 302: 3.717ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.980 next=pair draft=123161 prop=123161 pred gate=device Token # 303: 115.605ms; value: next_token_ids=tensor([2369], device='cuda:0') mtp accept=0 prop=123161 top1=2369 accp=0.047 next=draft=1530 prop=1530 olap pair=110.4ms serial=195.1ms gain=84.7ms ratio=0.43 s0=4.1ms s1=191.1ms wait=0.1/48.2ms pred gate=device Token # 304: 115.813ms; value: next_token_ids=tensor([1530], device='cuda:0') mtp accept=1 prop=1530 top1=2556 accp=0.192 next=draft=117813 prop=117813 olap pair=110.5ms serial=196.4ms gain=85.8ms ratio=0.44 s0=4.2ms s1=192.2ms wait=0.1/47.8ms pred gate=device Token # 305: 3.751ms; value: next_token_ids=tensor([117813], device='cuda:0') mtp accept=1 prop=117813 top1=117813 accp=0.991 next=pair draft=5293 prop=5293 pred gate=device Token # 306: 114.995ms; value: next_token_ids=tensor([5293], device='cuda:0') mtp accept=1 prop=5293 top1=5293 accp=0.899 next=draft=4339 prop=4339 olap pair=109.7ms serial=194.6ms gain=84.9ms ratio=0.44 s0=4.4ms s1=190.3ms wait=0.1/47.3ms pred gate=device Token # 307: 3.748ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=0.948 next=pair draft=2827 prop=2827 pred gate=device Token # 308: 115.603ms; value: next_token_ids=tensor([2827], device='cuda:0') mtp accept=1 prop=2827 top1=2827 accp=0.982 next=draft=320 prop=320 olap pair=110.3ms serial=193.6ms gain=83.3ms ratio=0.43 s0=4.2ms s1=189.3ms wait=0.1/48.0ms pred gate=device Token # 309: 3.721ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 
accp=1.000 next=pair draft=30869 prop=30869 pred gate=device Token # 310: 114.874ms; value: next_token_ids=tensor([30869], device='cuda:0') mtp accept=1 prop=30869 top1=30869 accp=0.961 next=draft=4618 prop=4618 olap pair=109.6ms serial=194.8ms gain=85.2ms ratio=0.44 s0=3.9ms s1=190.9ms wait=0.1/48.3ms pred gate=device Token # 311: 3.778ms; value: next_token_ids=tensor([4618], device='cuda:0') mtp accept=1 prop=4618 top1=3608 accp=0.343 next=pair draft=10626 prop=10626 pred gate=device Token # 312: 114.583ms; value: next_token_ids=tensor([10626], device='cuda:0') mtp accept=1 prop=10626 top1=10626 accp=1.000 next=draft=548 prop=548 olap pair=109.3ms serial=193.9ms gain=84.6ms ratio=0.44 s0=4.2ms s1=189.7ms wait=0.1/47.6ms pred gate=device Token # 313: 3.791ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=1 prop=548 top1=548 accp=0.931 next=pair draft=10070 prop=10070 pred gate=device Token # 314: 115.120ms; value: next_token_ids=tensor([10070], device='cuda:0') mtp accept=1 prop=10070 top1=10070 accp=0.999 next=draft=10626 prop=10626 olap pair=109.8ms serial=194.9ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.6ms wait=0.1/47.3ms pred gate=device Token # 315: 3.703ms; value: next_token_ids=tensor([10626], device='cuda:0') mtp accept=1 prop=10626 top1=10626 accp=0.987 next=pair draft=49764 prop=49764 pred gate=device Token # 316: 114.661ms; value: next_token_ids=tensor([49764], device='cuda:0') mtp accept=1 prop=49764 top1=49764 accp=0.971 next=draft=1481 prop=1481 olap pair=109.4ms serial=193.4ms gain=84.1ms ratio=0.43 s0=7.5ms s1=185.9ms wait=0.2/44.0ms pred gate=device Token # 317: 3.805ms; value: next_token_ids=tensor([1481], device='cuda:0') mtp accept=1 prop=1481 top1=1481 accp=0.873 next=pair draft=3592 prop=3592 pred gate=device Token # 318: 114.770ms; value: next_token_ids=tensor([111934], device='cuda:0') mtp accept=0 prop=3592 top1=111934 accp=0.052 next=draft=7018 prop=7018 olap pair=109.5ms serial=194.5ms gain=85.0ms ratio=0.44 s0=4.2ms 
s1=190.3ms wait=0.1/47.7ms pred gate=device Token # 319: 115.086ms; value: next_token_ids=tensor([3500], device='cuda:0') mtp accept=0 prop=7018 top1=3500 accp=0.001 next=draft=55074 prop=55074 olap pair=109.7ms serial=194.6ms gain=84.9ms ratio=0.44 s0=5.2ms s1=189.4ms wait=0.2/46.7ms pred gate=device Token # 320: 114.778ms; value: next_token_ids=tensor([55074], device='cuda:0') mtp accept=1 prop=55074 top1=55074 accp=0.823 next=draft=525 prop=525 olap pair=109.4ms serial=194.1ms gain=84.7ms ratio=0.44 s0=3.9ms s1=190.2ms wait=0.1/48.3ms pred gate=device Token # 321: 3.728ms; value: next_token_ids=tensor([525], device='cuda:0') mtp accept=1 prop=525 top1=525 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 322: 115.076ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=4569 prop=4569 olap pair=109.9ms serial=195.2ms gain=85.3ms ratio=0.44 s0=4.4ms s1=190.8ms wait=0.1/47.3ms pred gate=device Token # 323: 3.809ms; value: next_token_ids=tensor([4569], device='cuda:0') mtp accept=1 prop=4569 top1=4569 accp=0.966 next=pair draft=12519 prop=12519 pred gate=device Token # 324: 114.596ms; value: next_token_ids=tensor([12519], device='cuda:0') mtp accept=1 prop=12519 top1=12519 accp=0.960 next=draft=223 prop=301 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.4ms s1=189.9ms wait=0.1/47.3ms pred gate=device Token # 325: 3.777ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=301 top1=223 accp=0.698 next=pair draft=26139 prop=26139 pred gate=device Token # 326: 114.992ms; value: next_token_ids=tensor([26139], device='cuda:0') mtp accept=1 prop=26139 top1=26139 accp=1.000 next=draft=223 prop=223 olap pair=109.6ms serial=194.8ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.5ms wait=0.1/47.6ms pred gate=device Token # 327: 3.774ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=1923 prop=1923 pred gate=device 
Token # 328: 114.419ms; value: next_token_ids=tensor([1923], device='cuda:0') mtp accept=1 prop=1923 top1=1923 accp=0.979 next=draft=60893 prop=60893 olap pair=109.2ms serial=194.0ms gain=84.8ms ratio=0.44 s0=3.8ms s1=190.2ms wait=0.1/48.6ms pred gate=device Token # 329: 3.725ms; value: next_token_ids=tensor([60893], device='cuda:0') mtp accept=1 prop=60893 top1=60893 accp=1.000 next=pair draft=25024 prop=25024 pred gate=device Token # 330: 115.253ms; value: next_token_ids=tensor([25024], device='cuda:0') mtp accept=1 prop=25024 top1=25024 accp=0.593 next=draft=301 prop=301 olap pair=110.0ms serial=195.5ms gain=85.5ms ratio=0.44 s0=3.9ms s1=191.7ms wait=0.1/48.6ms pred gate=device Token # 331: 3.784ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=1.000 next=pair draft=10301 prop=10301 pred gate=device Token # 332: 115.110ms; value: next_token_ids=tensor([10301], device='cuda:0') mtp accept=1 prop=10301 top1=10301 accp=0.995 next=draft=85449 prop=85449 olap pair=109.8ms serial=195.2ms gain=85.4ms ratio=0.44 s0=3.7ms s1=191.4ms wait=0.1/48.6ms pred gate=device Token # 333: 3.872ms; value: next_token_ids=tensor([66450], device='cuda:0') mtp accept=0 prop=85449 top1=66450 accp=0.079 next=pair draft=478 prop=478 pred gate=device Token # 334: 115.067ms; value: next_token_ids=tensor([49754], device='cuda:0') mtp accept=0 prop=478 top1=49754 accp=0.075 next=draft=20495 prop=20495 olap pair=109.7ms serial=194.4ms gain=84.6ms ratio=0.44 s0=4.2ms s1=190.2ms wait=0.1/47.7ms pred gate=device Token # 335: 117.017ms; value: next_token_ids=tensor([88353], device='cuda:0') mtp accept=0 prop=20495 top1=88353 accp=0.171 next=draft=478 prop=478 olap pair=110.7ms serial=196.0ms gain=85.3ms ratio=0.44 s0=7.4ms s1=188.6ms wait=0.2/44.4ms pred gate=device Token # 336: 114.981ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=draft=8353 prop=8353 olap pair=109.5ms serial=193.3ms gain=83.8ms 
ratio=0.43 s0=8.8ms s1=184.5ms wait=0.2/42.6ms pred gate=device Token # 337: 3.801ms; value: next_token_ids=tensor([8353], device='cuda:0') mtp accept=1 prop=8353 top1=8353 accp=0.829 next=pair draft=303 prop=303 pred gate=device Token # 338: 115.411ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=23273 prop=23273 olap pair=110.2ms serial=195.6ms gain=85.4ms ratio=0.44 s0=4.2ms s1=191.3ms wait=0.1/47.6ms pred gate=device Token # 339: 3.700ms; value: next_token_ids=tensor([98422], device='cuda:0') mtp accept=0 prop=23273 top1=98422 accp=0.383 next=pair draft=43411 prop=43411 pred gate=device Token # 340: 115.043ms; value: next_token_ids=tensor([43411], device='cuda:0') mtp accept=1 prop=43411 top1=18317 accp=0.164 next=draft=4101 prop=4101 olap pair=109.8ms serial=194.7ms gain=85.0ms ratio=0.44 s0=4.3ms s1=190.4ms wait=0.1/47.3ms pred gate=device Token # 341: 3.734ms; value: next_token_ids=tensor([4101], device='cuda:0') mtp accept=1 prop=4101 top1=4101 accp=1.000 next=pair draft=18317 prop=18317 pred gate=device Token # 342: 114.444ms; value: next_token_ids=tensor([18317], device='cuda:0') mtp accept=1 prop=18317 top1=18317 accp=1.000 next=draft=547 prop=547 olap pair=109.2ms serial=193.7ms gain=84.5ms ratio=0.44 s0=4.3ms s1=189.4ms wait=0.1/47.3ms pred gate=device Token # 343: 3.763ms; value: next_token_ids=tensor([547], device='cuda:0') mtp accept=1 prop=547 top1=547 accp=1.000 next=pair draft=8842 prop=8842 pred gate=device Token # 344: 115.945ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=0.992 next=draft=303 prop=303 olap pair=110.2ms serial=195.2ms gain=85.1ms ratio=0.44 s0=5.4ms s1=189.9ms wait=0.2/46.4ms pred gate=device Token # 345: 3.788ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=7601 prop=7601 pred gate=device Token # 346: 115.437ms; value: next_token_ids=tensor([7601], 
device='cuda:0') mtp accept=1 prop=7601 top1=7601 accp=1.000 next=draft=428 prop=428 olap pair=110.2ms serial=195.1ms gain=84.9ms ratio=0.44 s0=4.2ms s1=190.8ms wait=0.1/47.7ms pred gate=device Token # 347: 3.741ms; value: next_token_ids=tensor([1255], device='cuda:0') mtp accept=0 prop=428 top1=1255 accp=0.171 next=pair draft=428 prop=428 pred gate=device Token # 348: 114.687ms; value: next_token_ids=tensor([428], device='cuda:0') mtp accept=1 prop=428 top1=428 accp=1.000 next=draft=112016 prop=112016 olap pair=109.3ms serial=193.9ms gain=84.5ms ratio=0.44 s0=4.2ms s1=189.6ms wait=0.1/47.6ms pred gate=device Token # 349: 3.782ms; value: next_token_ids=tensor([112016], device='cuda:0') mtp accept=1 prop=112016 top1=112016 accp=1.000 next=pair draft=24268 prop=24268 pred gate=device Token # 350: 114.867ms; value: next_token_ids=tensor([24268], device='cuda:0') mtp accept=1 prop=24268 top1=24268 accp=1.000 next=draft=430 prop=430 olap pair=109.5ms serial=194.3ms gain=84.8ms ratio=0.44 s0=4.2ms s1=190.2ms wait=0.1/47.9ms pred gate=device Token # 351: 3.774ms; value: next_token_ids=tensor([430], device='cuda:0') mtp accept=1 prop=430 top1=430 accp=1.000 next=pair draft=548 prop=548 pred gate=device Token # 352: 114.731ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=1 prop=548 top1=548 accp=0.523 next=draft=32652 prop=32652 olap pair=109.5ms serial=194.4ms gain=85.0ms ratio=0.44 s0=3.9ms s1=190.5ms wait=0.1/48.3ms pred gate=device Token # 353: 3.863ms; value: next_token_ids=tensor([32652], device='cuda:0') mtp accept=1 prop=32652 top1=32652 accp=0.982 next=pair draft=6221 prop=6221 pred gate=device Token # 354: 115.479ms; value: next_token_ids=tensor([6221], device='cuda:0') mtp accept=1 prop=6221 top1=6221 accp=1.000 next=draft=2073 prop=2073 olap pair=110.3ms serial=195.9ms gain=85.7ms ratio=0.44 s0=4.4ms s1=191.5ms wait=0.1/47.1ms pred gate=device Token # 355: 3.768ms; value: next_token_ids=tensor([2073], device='cuda:0') mtp accept=1 prop=2073 
top1=2073 accp=1.000 next=pair draft=55779 prop=55779 pred gate=device Token # 356: 115.509ms; value: next_token_ids=tensor([55779], device='cuda:0') mtp accept=1 prop=55779 top1=55779 accp=1.000 next=draft=478 prop=478 olap pair=110.2ms serial=195.9ms gain=85.7ms ratio=0.44 s0=4.4ms s1=191.5ms wait=0.1/47.2ms pred gate=device Token # 357: 3.788ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=pair draft=666 prop=666 pred gate=device Token # 358: 115.036ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=0.996 next=draft=89276 prop=89276 olap pair=109.8ms serial=195.1ms gain=85.3ms ratio=0.44 s0=4.2ms s1=190.9ms wait=0.1/47.8ms pred gate=device Token # 359: 3.694ms; value: next_token_ids=tensor([31487], device='cuda:0') mtp accept=0 prop=89276 top1=31487 accp=0.082 next=pair draft=768 prop=768 pred gate=device Token # 360: 115.213ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=18317 accp=0.444 next=draft=112016 prop=112016 olap pair=109.8ms serial=194.4ms gain=84.6ms ratio=0.44 s0=6.2ms s1=188.3ms wait=0.2/45.7ms pred gate=device Token # 361: 3.788ms; value: next_token_ids=tensor([112016], device='cuda:0') mtp accept=1 prop=112016 top1=112016 accp=1.000 next=pair draft=24268 prop=24268 pred gate=device Token # 362: 115.134ms; value: next_token_ids=tensor([24268], device='cuda:0') mtp accept=1 prop=24268 top1=24268 accp=1.000 next=draft=303 prop=303 olap pair=109.9ms serial=195.0ms gain=85.1ms ratio=0.44 s0=4.1ms s1=190.9ms wait=0.1/47.8ms pred gate=device Token # 363: 3.681ms; value: next_token_ids=tensor([947], device='cuda:0') mtp accept=0 prop=303 top1=947 accp=0.580 next=pair draft=32652 prop=32652 pred gate=device Token # 364: 115.259ms; value: next_token_ids=tensor([32652], device='cuda:0') mtp accept=1 prop=32652 top1=32652 accp=0.997 next=draft=6221 prop=6221 olap pair=110.0ms serial=195.3ms gain=85.3ms ratio=0.44 s0=4.8ms 
s1=190.5ms wait=0.1/47.2ms pred gate=device Token # 365: 3.675ms; value: next_token_ids=tensor([6221], device='cuda:0') mtp accept=1 prop=6221 top1=6221 accp=1.000 next=pair draft=2073 prop=2073 pred gate=device Token # 366: 115.236ms; value: next_token_ids=tensor([2073], device='cuda:0') mtp accept=1 prop=2073 top1=2073 accp=1.000 next=draft=5866 prop=5866 olap pair=109.9ms serial=194.9ms gain=85.0ms ratio=0.44 s0=4.0ms s1=190.9ms wait=0.1/48.3ms pred gate=device Token # 367: 3.743ms; value: next_token_ids=tensor([5866], device='cuda:0') mtp accept=1 prop=5866 top1=5866 accp=1.000 next=pair draft=939 prop=939 pred gate=device Token # 368: 116.080ms; value: next_token_ids=tensor([939], device='cuda:0') mtp accept=1 prop=939 top1=939 accp=0.993 next=draft=22 prop=22 olap pair=110.8ms serial=196.2ms gain=85.4ms ratio=0.44 s0=4.1ms s1=192.1ms wait=0.1/48.3ms pred gate=device Token # 369: 3.773ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=pair draft=695 prop=695 pred gate=device Token # 370: 115.149ms; value: next_token_ids=tensor([695], device='cuda:0') mtp accept=1 prop=695 top1=695 accp=0.994 next=draft=736 prop=736 olap pair=109.8ms serial=194.7ms gain=84.9ms ratio=0.44 s0=4.1ms s1=190.6ms wait=0.1/48.4ms pred gate=device Token # 371: 3.762ms; value: next_token_ids=tensor([736], device='cuda:0') mtp accept=1 prop=736 top1=736 accp=1.000 next=pair draft=1266 prop=1266 pred gate=device Token # 372: 115.712ms; value: next_token_ids=tensor([1266], device='cuda:0') mtp accept=1 prop=1266 top1=1266 accp=1.000 next=draft=303 prop=303 olap pair=110.4ms serial=196.2ms gain=85.8ms ratio=0.44 s0=3.8ms s1=192.4ms wait=0.1/48.6ms pred gate=device Token # 373: 3.757ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=531 prop=531 pred gate=device Token # 374: 115.233ms; value: next_token_ids=tensor([531], device='cuda:0') mtp accept=1 prop=531 top1=531 accp=1.000 
next=draft=18381 prop=18381 olap pair=109.9ms serial=195.4ms gain=85.4ms ratio=0.44 s0=3.7ms s1=191.6ms wait=0.1/48.6ms pred gate=device Token # 375: 3.776ms; value: next_token_ids=tensor([18381], device='cuda:0') mtp accept=1 prop=18381 top1=18381 accp=1.000 next=pair draft=69043 prop=69043 pred gate=device Token # 376: 115.304ms; value: next_token_ids=tensor([69043], device='cuda:0') mtp accept=1 prop=69043 top1=69043 accp=0.999 next=draft=38775 prop=38775 olap pair=110.0ms serial=195.2ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.9ms wait=0.1/47.6ms pred gate=device Token # 377: 3.776ms; value: next_token_ids=tensor([38775], device='cuda:0') mtp accept=1 prop=38775 top1=38775 accp=0.913 next=pair draft=471 prop=471 pred gate=device Token # 378: 115.157ms; value: next_token_ids=tensor([471], device='cuda:0') mtp accept=1 prop=471 top1=471 accp=0.995 next=draft=1457 prop=1457 olap pair=109.8ms serial=195.0ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.7ms wait=0.1/47.5ms pred gate=device Token # 379: 3.777ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 380: 115.294ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.999 next=draft=2126 prop=2126 olap pair=110.0ms serial=195.4ms gain=85.4ms ratio=0.44 s0=4.4ms s1=191.0ms wait=0.1/47.8ms pred gate=device Token # 381: 3.790ms; value: next_token_ids=tensor([114219], device='cuda:0') mtp accept=0 prop=2126 top1=2126 accp=0.902 next=pair draft=41727 prop=41727 pred gate=device Token # 382: 114.895ms; value: next_token_ids=tensor([41727], device='cuda:0') mtp accept=1 prop=41727 top1=41727 accp=0.996 next=draft=2206 prop=2206 olap pair=109.7ms serial=194.8ms gain=85.2ms ratio=0.44 s0=3.9ms s1=190.9ms wait=0.1/48.3ms pred gate=device Token # 383: 3.691ms; value: next_token_ids=tensor([2206], device='cuda:0') mtp accept=1 prop=2206 top1=2206 accp=0.999 next=pair draft=31511 prop=31511 
pred gate=device Token # 384: 115.359ms; value: next_token_ids=tensor([9048], device='cuda:0') mtp accept=0 prop=31511 top1=9048 accp=0.409 next=draft=389 prop=389 olap pair=110.2ms serial=195.7ms gain=85.5ms ratio=0.44 s0=3.9ms s1=191.8ms wait=0.1/48.3ms pred gate=device Token # 385: 117.514ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 accp=0.902 next=draft=2619 prop=2619 olap pair=111.3ms serial=197.6ms gain=86.3ms ratio=0.44 s0=5.5ms s1=192.1ms wait=0.2/46.5ms pred gate=device Token # 386: 4.203ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.993 next=pair draft=1457 prop=1457 pred gate=device Token # 387: 114.711ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=draft=2426 prop=2426 olap pair=109.5ms serial=194.1ms gain=84.6ms ratio=0.44 s0=4.8ms s1=189.3ms wait=0.1/47.0ms pred gate=device Token # 388: 3.741ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=2426 top1=223 accp=0.311 next=pair draft=2426 prop=2426 pred gate=device Token # 389: 115.192ms; value: next_token_ids=tensor([2426], device='cuda:0') mtp accept=1 prop=2426 top1=2426 accp=1.000 next=draft=2126 prop=2126 olap pair=109.9ms serial=195.2ms gain=85.3ms ratio=0.44 s0=4.4ms s1=190.8ms wait=0.1/47.1ms pred gate=device Token # 390: 3.751ms; value: next_token_ids=tensor([2126], device='cuda:0') mtp accept=1 prop=2126 top1=2126 accp=1.000 next=pair draft=2971 prop=2971 pred gate=device Token # 391: 114.648ms; value: next_token_ids=tensor([2971], device='cuda:0') mtp accept=1 prop=2971 top1=2971 accp=1.000 next=draft=940 prop=940 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.0ms s1=190.4ms wait=0.1/48.3ms pred gate=device Token # 392: 3.718ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=0 prop=940 top1=666 accp=0.097 next=pair draft=223 prop=223 pred gate=device Token # 393: 115.252ms; value: 
next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.999 next=draft=1107 prop=17906 olap pair=109.9ms serial=195.5ms gain=85.5ms ratio=0.44 s0=3.8ms s1=191.7ms wait=0.1/48.7ms pred gate=device Token # 394: 3.747ms; value: next_token_ids=tensor([12065], device='cuda:0') mtp accept=0 prop=17906 top1=1107 accp=0.570 next=pair draft=2619 prop=2619 pred gate=device Token # 395: 115.040ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=draft=1457 prop=1457 olap pair=109.8ms serial=195.2ms gain=85.4ms ratio=0.44 s0=3.9ms s1=191.3ms wait=0.1/48.3ms pred gate=device Token # 396: 3.687ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 397: 115.947ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=1923 prop=1923 olap pair=110.7ms serial=195.2ms gain=84.5ms ratio=0.43 s0=4.2ms s1=191.0ms wait=0.1/48.2ms pred gate=device Token # 398: 3.726ms; value: next_token_ids=tensor([1923], device='cuda:0') mtp accept=1 prop=1923 top1=1923 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 399: 115.323ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=2111 prop=2111 olap pair=110.0ms serial=194.6ms gain=84.6ms ratio=0.43 s0=5.1ms s1=189.6ms wait=0.1/46.9ms pred gate=device Token # 400: 3.766ms; value: next_token_ids=tensor([2111], device='cuda:0') mtp accept=1 prop=2111 top1=2111 accp=1.000 next=pair draft=27962 prop=27962 pred gate=device Token # 401: 115.373ms; value: next_token_ids=tensor([8948], device='cuda:0') mtp accept=0 prop=27962 top1=8948 accp=0.358 next=draft=81054 prop=81054 olap pair=110.1ms serial=195.5ms gain=85.4ms ratio=0.44 s0=4.3ms s1=191.2ms wait=0.1/47.5ms pred gate=device Token # 402: 115.339ms; value: next_token_ids=tensor([81054], device='cuda:0') 
mtp accept=1 prop=81054 top1=81054 accp=1.000 next=draft=22 prop=22 olap pair=110.0ms serial=194.9ms gain=84.9ms ratio=0.44 s0=4.1ms s1=190.8ms wait=0.1/48.1ms pred gate=device Token # 403: 3.671ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=pair draft=15 prop=15 pred gate=device Token # 404: 115.043ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=223 prop=12715 olap pair=109.9ms serial=193.6ms gain=83.7ms ratio=0.43 s0=6.5ms s1=187.1ms wait=0.2/45.1ms pred gate=device Token # 405: 3.783ms; value: next_token_ids=tensor([12715], device='cuda:0') mtp accept=1 prop=12715 top1=12715 accp=0.096 next=pair draft=18 prop=18 pred gate=device Token # 406: 115.218ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=223 prop=223 olap pair=110.0ms serial=195.0ms gain=85.0ms ratio=0.44 s0=4.0ms s1=191.0ms wait=0.1/48.2ms pred gate=device Token # 407: 3.759ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.997 next=pair draft=26127 prop=26127 pred gate=device Token # 408: 115.236ms; value: next_token_ids=tensor([26127], device='cuda:0') mtp accept=1 prop=26127 top1=26127 accp=1.000 next=draft=666 prop=666 olap pair=110.0ms serial=194.6ms gain=84.6ms ratio=0.43 s0=7.3ms s1=187.3ms wait=0.2/44.3ms pred gate=device Token # 409: 3.777ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=0.989 next=pair draft=320 prop=320 pred gate=device Token # 410: 115.317ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=82708 prop=82708 olap pair=110.0ms serial=195.5ms gain=85.5ms ratio=0.44 s0=4.0ms s1=191.5ms wait=0.1/48.2ms pred gate=device Token # 411: 3.843ms; value: next_token_ids=tensor([34357], device='cuda:0') mtp accept=0 prop=82708 top1=34357 accp=0.296 next=pair draft=82708 
prop=82708 pred gate=device Token # 412: 115.680ms; value: next_token_ids=tensor([26996], device='cuda:0') mtp accept=0 prop=82708 top1=26996 accp=0.244 next=draft=545 prop=545 olap pair=110.4ms serial=196.0ms gain=85.6ms ratio=0.44 s0=4.3ms s1=191.7ms wait=0.1/47.5ms pred gate=device Token # 413: 115.602ms; value: next_token_ids=tensor([42515], device='cuda:0') mtp accept=0 prop=545 top1=545 accp=0.796 next=draft=57038 prop=57038 olap pair=110.2ms serial=195.7ms gain=85.5ms ratio=0.44 s0=4.0ms s1=191.7ms wait=0.1/48.2ms pred gate=device Token # 414: 116.024ms; value: next_token_ids=tensor([57038], device='cuda:0') mtp accept=1 prop=57038 top1=15102 accp=0.258 next=draft=18596 prop=18596 olap pair=110.6ms serial=196.5ms gain=85.9ms ratio=0.44 s0=4.0ms s1=192.6ms wait=0.1/48.3ms pred gate=device Token # 415: 4.901ms; value: next_token_ids=tensor([18596], device='cuda:0') mtp accept=1 prop=18596 top1=18596 accp=0.998 next=pair draft=112016 prop=112016 pred gate=device Token # 416: 116.045ms; value: next_token_ids=tensor([112016], device='cuda:0') mtp accept=1 prop=112016 top1=112016 accp=1.000 next=draft=2971 prop=2971 olap pair=110.2ms serial=192.8ms gain=82.6ms ratio=0.43 s0=8.9ms s1=184.0ms wait=0.2/42.2ms pred gate=device Token # 417: 3.772ms; value: next_token_ids=tensor([12676], device='cuda:0') mtp accept=0 prop=2971 top1=12676 accp=0.097 next=pair draft=2619 prop=2619 pred gate=device Token # 418: 115.134ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.994 next=draft=109792 prop=109792 olap pair=109.9ms serial=195.1ms gain=85.3ms ratio=0.44 s0=3.9ms s1=191.2ms wait=0.1/48.5ms pred gate=device Token # 419: 3.819ms; value: next_token_ids=tensor([109792], device='cuda:0') mtp accept=1 prop=109792 top1=109792 accp=0.984 next=pair draft=38 prop=38 pred gate=device Token # 420: 114.952ms; value: next_token_ids=tensor([38], device='cuda:0') mtp accept=1 prop=38 top1=38 accp=1.000 next=draft=3951 prop=3951 olap 
pair=109.7ms serial=194.9ms gain=85.2ms ratio=0.44 s0=3.8ms s1=191.2ms wait=0.1/48.6ms pred gate=device Token # 421: 3.710ms; value: next_token_ids=tensor([3951], device='cuda:0') mtp accept=1 prop=3951 top1=3951 accp=1.000 next=pair draft=1237 prop=1237 pred gate=device Token # 422: 114.565ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=0.841 next=draft=112016 prop=112016 olap pair=109.4ms serial=194.4ms gain=85.0ms ratio=0.44 s0=4.2ms s1=190.2ms wait=0.1/47.7ms pred gate=device Token # 423: 3.783ms; value: next_token_ids=tensor([112016], device='cuda:0') mtp accept=1 prop=112016 top1=112016 accp=1.000 next=pair draft=24268 prop=24268 pred gate=device Token # 424: 115.025ms; value: next_token_ids=tensor([24268], device='cuda:0') mtp accept=1 prop=24268 top1=24268 accp=1.000 next=draft=1227 prop=303 olap pair=109.8ms serial=195.0ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.7ms wait=0.1/47.6ms pred gate=device Token # 425: 3.754ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=1227 accp=0.948 next=pair draft=75172 prop=75172 pred gate=device Token # 426: 114.675ms; value: next_token_ids=tensor([75172], device='cuda:0') mtp accept=1 prop=75172 top1=75172 accp=1.000 next=draft=35964 prop=35964 olap pair=109.4ms serial=194.3ms gain=84.9ms ratio=0.44 s0=4.2ms s1=190.1ms wait=0.1/47.7ms pred gate=device Token # 427: 3.684ms; value: next_token_ids=tensor([35964], device='cuda:0') mtp accept=1 prop=35964 top1=35964 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 428: 114.878ms; value: next_token_ids=tensor([83649], device='cuda:0') mtp accept=0 prop=223 top1=83649 accp=0.230 next=draft=1227 prop=1227 olap pair=109.6ms serial=194.8ms gain=85.2ms ratio=0.44 s0=3.8ms s1=191.0ms wait=0.1/48.6ms pred gate=device Token # 429: 115.189ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=1.000 next=draft=666 prop=666 olap pair=109.8ms 
serial=195.0ms gain=85.2ms ratio=0.44 s0=3.9ms s1=191.1ms wait=0.1/48.4ms pred gate=device Token # 430: 3.731ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 431: 115.067ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=78441 prop=78441 olap pair=109.8ms serial=195.0ms gain=85.2ms ratio=0.44 s0=3.8ms s1=191.2ms wait=0.1/48.8ms pred gate=device Token # 432: 3.740ms; value: next_token_ids=tensor([78441], device='cuda:0') mtp accept=1 prop=78441 top1=114434 accp=0.282 next=pair draft=112016 prop=112016 pred gate=device Token # 433: 115.426ms; value: next_token_ids=tensor([112016], device='cuda:0') mtp accept=1 prop=112016 top1=112016 accp=0.999 next=draft=1147 prop=1147 olap pair=110.2ms serial=195.8ms gain=85.6ms ratio=0.44 s0=3.8ms s1=192.0ms wait=0.1/48.6ms pred gate=device Token # 434: 3.740ms; value: next_token_ids=tensor([15227], device='cuda:0') mtp accept=0 prop=1147 top1=15227 accp=0.171 next=pair draft=320 prop=320 pred gate=device Token # 435: 115.446ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.919 next=draft=2920 prop=2920 olap pair=110.1ms serial=195.8ms gain=85.7ms ratio=0.44 s0=3.9ms s1=191.9ms wait=0.1/48.5ms pred gate=device Token # 436: 3.780ms; value: next_token_ids=tensor([2920], device='cuda:0') mtp accept=1 prop=2920 top1=2920 accp=0.930 next=pair draft=15227 prop=15227 pred gate=device Token # 437: 115.511ms; value: next_token_ids=tensor([15227], device='cuda:0') mtp accept=1 prop=15227 top1=15227 accp=0.950 next=draft=8738 prop=8738 olap pair=110.2ms serial=195.8ms gain=85.6ms ratio=0.44 s0=4.1ms s1=191.7ms wait=0.1/47.9ms pred gate=device Token # 438: 3.751ms; value: next_token_ids=tensor([8738], device='cuda:0') mtp accept=1 prop=8738 top1=8738 accp=1.000 next=pair draft=748 prop=1300 pred gate=device Token # 439: 114.967ms; value: 
next_token_ids=tensor([1300], device='cuda:0') mtp accept=1 prop=1300 top1=1300 accp=0.418 next=draft=748 prop=748 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.5ms s1=190.4ms wait=0.1/47.2ms pred gate=device Token # 440: 3.754ms; value: next_token_ids=tensor([748], device='cuda:0') mtp accept=1 prop=748 top1=748 accp=0.995 next=pair draft=1965 prop=1965 pred gate=device Token # 441: 115.177ms; value: next_token_ids=tensor([1965], device='cuda:0') mtp accept=1 prop=1965 top1=1965 accp=0.836 next=draft=301 prop=301 olap pair=109.8ms serial=195.0ms gain=85.2ms ratio=0.44 s0=4.4ms s1=190.6ms wait=0.1/47.1ms pred gate=device Token # 442: 3.778ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=0.619 next=pair draft=2619 prop=2619 pred gate=device Token # 443: 115.769ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=0.997 next=draft=109792 prop=109792 olap pair=110.5ms serial=196.2ms gain=85.7ms ratio=0.44 s0=4.4ms s1=191.8ms wait=0.1/47.3ms pred gate=device Token # 444: 3.817ms; value: next_token_ids=tensor([109792], device='cuda:0') mtp accept=1 prop=109792 top1=109792 accp=1.000 next=pair draft=38 prop=38 pred gate=device Token # 445: 115.620ms; value: next_token_ids=tensor([38], device='cuda:0') mtp accept=1 prop=38 top1=38 accp=1.000 next=draft=3951 prop=3951 olap pair=110.3ms serial=196.2ms gain=85.9ms ratio=0.44 s0=3.9ms s1=192.3ms wait=0.1/48.3ms pred gate=device Token # 446: 3.745ms; value: next_token_ids=tensor([3951], device='cuda:0') mtp accept=1 prop=3951 top1=3951 accp=1.000 next=pair draft=15 prop=15 pred gate=device Token # 447: 115.362ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=4687 prop=4687 olap pair=110.1ms serial=194.5ms gain=84.4ms ratio=0.43 s0=4.2ms s1=190.3ms wait=0.1/48.1ms pred gate=device Token # 448: 3.733ms; value: next_token_ids=tensor([4687], device='cuda:0') mtp accept=1 
prop=4687 top1=4687 accp=1.000 next=pair draft=666 prop=666 pred gate=device Token # 449: 115.303ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=0.986 next=draft=223 prop=223 olap pair=110.0ms serial=195.5ms gain=85.5ms ratio=0.44 s0=3.9ms s1=191.6ms wait=0.1/48.6ms pred gate=device Token # 450: 3.752ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.998 next=pair draft=48084 prop=48084 pred gate=device Token # 451: 115.202ms; value: next_token_ids=tensor([48084], device='cuda:0') mtp accept=1 prop=48084 top1=48084 accp=1.000 next=draft=303 prop=303 olap pair=109.9ms serial=195.2ms gain=85.3ms ratio=0.44 s0=4.0ms s1=191.2ms wait=0.1/48.1ms pred gate=device Token # 452: 3.791ms; value: next_token_ids=tensor([2206], device='cuda:0') mtp accept=0 prop=303 top1=2206 accp=0.509 next=pair draft=9459 prop=9459 pred gate=device Token # 453: 117.673ms; value: next_token_ids=tensor([7085], device='cuda:0') mtp accept=0 prop=9459 top1=7085 accp=0.382 next=draft=99017 prop=99017 olap pair=111.6ms serial=197.7ms gain=86.1ms ratio=0.44 s0=4.9ms s1=192.8ms wait=0.2/46.7ms pred gate=device Token # 454: 115.666ms; value: next_token_ids=tensor([99017], device='cuda:0') mtp accept=1 prop=99017 top1=99017 accp=0.760 next=draft=8842 prop=8842 olap pair=110.1ms serial=195.3ms gain=85.2ms ratio=0.44 s0=6.3ms s1=189.0ms wait=0.2/45.2ms pred gate=device Token # 455: 3.726ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=1.000 next=pair draft=19252 prop=19252 pred gate=device Token # 456: 114.854ms; value: next_token_ids=tensor([19252], device='cuda:0') mtp accept=1 prop=19252 top1=19252 accp=0.986 next=draft=25962 prop=25962 olap pair=109.6ms serial=194.4ms gain=84.8ms ratio=0.44 s0=4.9ms s1=189.4ms wait=0.1/47.2ms pred gate=device Token # 457: 3.837ms; value: next_token_ids=tensor([25962], device='cuda:0') mtp accept=1 prop=25962 top1=25962 accp=0.892 
next=pair draft=478 prop=478 pred gate=device Token # 458: 114.703ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=draft=48 prop=48 olap pair=109.3ms serial=193.7ms gain=84.4ms ratio=0.44 s0=4.7ms s1=189.0ms wait=0.1/47.5ms pred gate=device Token # 459: 3.821ms; value: next_token_ids=tensor([48], device='cuda:0') mtp accept=1 prop=48 top1=48 accp=0.878 next=pair draft=1457 prop=1457 pred gate=device Token # 460: 115.029ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=draft=223 prop=223 olap pair=109.8ms serial=194.9ms gain=85.1ms ratio=0.44 s0=4.1ms s1=190.8ms wait=0.1/48.0ms pred gate=device Token # 461: 3.840ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.999 next=pair draft=27600 prop=102797 pred gate=device Token # 462: 114.924ms; value: next_token_ids=tensor([27600], device='cuda:0') mtp accept=0 prop=102797 top1=27600 accp=0.630 next=draft=25864 prop=25864 olap pair=109.6ms serial=194.8ms gain=85.3ms ratio=0.44 s0=3.9ms s1=190.9ms wait=0.1/48.4ms pred gate=device Token # 463: 115.163ms; value: next_token_ids=tensor([25864], device='cuda:0') mtp accept=1 prop=25864 top1=43423 accp=0.146 next=draft=445 prop=445 olap pair=109.8ms serial=194.9ms gain=85.0ms ratio=0.44 s0=4.6ms s1=190.3ms wait=0.1/47.0ms pred gate=device Token # 464: 3.741ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=1 prop=445 top1=445 accp=1.000 next=pair draft=15680 prop=15680 pred gate=device Token # 465: 115.062ms; value: next_token_ids=tensor([15680], device='cuda:0') mtp accept=1 prop=15680 top1=15680 accp=0.986 next=draft=17533 prop=17533 olap pair=109.8ms serial=195.1ms gain=85.2ms ratio=0.44 s0=4.1ms s1=191.0ms wait=0.1/48.1ms pred gate=device Token # 466: 3.750ms; value: next_token_ids=tensor([10051], device='cuda:0') mtp accept=0 prop=17533 top1=10051 accp=0.007 next=pair draft=525 prop=525 pred gate=device 
Token # 467: 114.795ms; value: next_token_ids=tensor([525], device='cuda:0') mtp accept=1 prop=525 top1=525 accp=0.994 next=draft=62127 prop=62127 olap pair=109.5ms serial=194.6ms gain=85.1ms ratio=0.44 s0=4.5ms s1=190.1ms wait=0.1/47.0ms pred gate=device Token # 468: 3.739ms; value: next_token_ids=tensor([62127], device='cuda:0') mtp accept=1 prop=62127 top1=62127 accp=0.935 next=pair draft=57046 prop=57046 pred gate=device Token # 469: 115.794ms; value: next_token_ids=tensor([3835], device='cuda:0') mtp accept=0 prop=57046 top1=3835 accp=0.082 next=draft=320 prop=320 olap pair=109.8ms serial=194.8ms gain=85.0ms ratio=0.44 s0=5.7ms s1=189.1ms wait=0.2/46.1ms pred gate=device Token # 470: 115.215ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=1877 prop=1877 olap pair=109.6ms serial=194.6ms gain=84.9ms ratio=0.44 s0=4.8ms s1=189.8ms wait=0.1/47.3ms pred gate=device Token # 471: 3.711ms; value: next_token_ids=tensor([1877], device='cuda:0') mtp accept=1 prop=1877 top1=1877 accp=1.000 next=pair draft=3003 prop=3003 pred gate=device Token # 472: 115.283ms; value: next_token_ids=tensor([3003], device='cuda:0') mtp accept=1 prop=3003 top1=3003 accp=0.761 next=draft=10893 prop=10893 olap pair=109.9ms serial=194.9ms gain=85.0ms ratio=0.44 s0=4.4ms s1=190.5ms wait=0.1/47.6ms pred gate=device Token # 473: 3.738ms; value: next_token_ids=tensor([10893], device='cuda:0') mtp accept=1 prop=10893 top1=10893 accp=1.000 next=pair draft=2467 prop=2467 pred gate=device Token # 474: 115.224ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=0 prop=2467 top1=2619 accp=0.080 next=draft=24 prop=24 olap pair=109.9ms serial=195.0ms gain=85.1ms ratio=0.44 s0=4.5ms s1=190.6ms wait=0.1/47.1ms pred gate=device Token # 475: 115.333ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=1 prop=24 top1=24 accp=1.000 next=draft=57 prop=57 olap pair=110.0ms serial=195.4ms gain=85.4ms ratio=0.44 s0=4.4ms 
s1=191.0ms wait=0.1/47.2ms pred gate=device Token # 476: 3.772ms; value: next_token_ids=tensor([57], device='cuda:0') mtp accept=1 prop=57 top1=57 accp=1.000 next=pair draft=330 prop=330 pred gate=device Token # 477: 114.864ms; value: next_token_ids=tensor([330], device='cuda:0') mtp accept=1 prop=330 top1=330 accp=0.999 next=draft=9422 prop=9422 olap pair=109.5ms serial=194.6ms gain=85.1ms ratio=0.44 s0=4.3ms s1=190.2ms wait=0.1/47.2ms pred gate=device Token # 478: 3.745ms; value: next_token_ids=tensor([9422], device='cuda:0') mtp accept=1 prop=9422 top1=9422 accp=1.000 next=pair draft=666 prop=666 pred gate=device Token # 479: 115.523ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=0.999 next=draft=410 prop=410 olap pair=110.2ms serial=195.9ms gain=85.6ms ratio=0.44 s0=4.3ms s1=191.5ms wait=0.1/47.3ms pred gate=device Token # 480: 3.780ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=1.000 next=pair draft=10176 prop=10176 pred gate=device Token # 481: 115.113ms; value: next_token_ids=tensor([3440], device='cuda:0') mtp accept=0 prop=10176 top1=3440 accp=0.196 next=draft=2619 prop=2619 olap pair=109.9ms serial=195.0ms gain=85.2ms ratio=0.44 s0=4.3ms s1=190.7ms wait=0.1/47.2ms pred gate=device Token # 482: 115.560ms; value: next_token_ids=tensor([3910], device='cuda:0') mtp accept=0 prop=2619 top1=12519 accp=0.002 next=draft=2619 prop=2619 olap pair=110.3ms serial=196.0ms gain=85.8ms ratio=0.44 s0=4.1ms s1=191.9ms wait=0.1/47.8ms pred gate=device Token # 483: 115.417ms; value: next_token_ids=tensor([2619], device='cuda:0') mtp accept=1 prop=2619 top1=2619 accp=1.000 next=draft=22 prop=22 olap pair=110.1ms serial=195.5ms gain=85.4ms ratio=0.44 s0=4.3ms s1=191.2ms wait=0.1/47.3ms pred gate=device Token # 484: 3.730ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=pair draft=558 prop=558 pred gate=device Token # 485: 115.780ms; 
value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=558 top1=223 accp=0.350 next=draft=558 prop=558 olap pair=110.5ms serial=196.4ms gain=85.9ms ratio=0.44 s0=4.3ms s1=192.0ms wait=0.1/47.3ms pred gate=device Token # 486: 115.333ms; value: next_token_ids=tensor([558], device='cuda:0') mtp accept=1 prop=558 top1=558 accp=1.000 next=draft=659 prop=659 olap pair=110.0ms serial=195.4ms gain=85.4ms ratio=0.44 s0=4.4ms s1=191.0ms wait=0.1/47.2ms pred gate=device Token # 487: 3.853ms; value: next_token_ids=tensor([659], device='cuda:0') mtp accept=1 prop=659 top1=659 accp=0.971 next=pair draft=1954 prop=1954 pred gate=device Token # 488: 115.748ms; value: next_token_ids=tensor([1954], device='cuda:0') mtp accept=1 prop=1954 top1=1954 accp=1.000 next=draft=3448 prop=3448 olap pair=110.4ms serial=196.1ms gain=85.7ms ratio=0.44 s0=4.4ms s1=191.7ms wait=0.1/47.4ms pred gate=device Token # 489: 3.766ms; value: next_token_ids=tensor([3448], device='cuda:0') mtp accept=1 prop=3448 top1=3448 accp=0.996 next=pair draft=1237 prop=1237 pred gate=device Token # 490: 115.700ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=0 prop=1237 top1=666 accp=0.215 next=draft=1237 prop=1237 olap pair=110.4ms serial=195.5ms gain=85.1ms ratio=0.44 s0=4.8ms s1=190.7ms wait=0.1/46.8ms pred gate=device Token # 491: 115.019ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=0.991 next=draft=35 prop=35 olap pair=109.7ms serial=194.9ms gain=85.1ms ratio=0.44 s0=4.1ms s1=190.7ms wait=0.1/47.8ms pred gate=device Token # 492: 3.730ms; value: next_token_ids=tensor([11799], device='cuda:0') mtp accept=0 prop=35 top1=11799 accp=0.188 next=pair draft=334 prop=334 pred gate=device Token # 493: 115.737ms; value: next_token_ids=tensor([334], device='cuda:0') mtp accept=1 prop=334 top1=334 accp=0.997 next=draft=14929 prop=14929 olap pair=110.4ms serial=196.2ms gain=85.8ms ratio=0.44 s0=4.0ms s1=192.2ms wait=0.1/48.5ms pred gate=device 
Token # 494: 3.724ms; value: next_token_ids=tensor([14929], device='cuda:0') mtp accept=1 prop=14929 top1=14929 accp=1.000 next=pair draft=12867 prop=12867 pred gate=device Token # 495: 115.245ms; value: next_token_ids=tensor([12867], device='cuda:0') mtp accept=1 prop=12867 top1=12867 accp=1.000 next=draft=10275 prop=10275 olap pair=110.0ms serial=194.3ms gain=84.3ms ratio=0.43 s0=4.3ms s1=190.0ms wait=0.1/48.1ms pred gate=device Token # 496: 3.753ms; value: next_token_ids=tensor([10275], device='cuda:0') mtp accept=1 prop=10275 top1=10275 accp=1.000 next=pair draft=223 prop=303 pred gate=device Token # 497: 116.096ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=303 top1=223 accp=0.955 next=draft=35991 prop=35991 olap pair=109.9ms serial=194.4ms gain=84.5ms ratio=0.43 s0=4.4ms s1=190.0ms wait=0.1/48.0ms pred gate=device Token # 498: 115.608ms; value: next_token_ids=tensor([35991], device='cuda:0') mtp accept=1 prop=35991 top1=35991 accp=0.998 next=draft=303 prop=303 olap pair=110.0ms serial=195.1ms gain=85.2ms ratio=0.44 s0=5.0ms s1=190.1ms wait=0.1/47.2ms pred gate=device Token # 499: 3.751ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=0 prop=303 top1=301 accp=0.112 next=pair draft=5570 prop=5570 pred gate=device Token # 500: 116.328ms; value: next_token_ids=tensor([5570], device='cuda:0') mtp accept=1 prop=5570 top1=5570 accp=1.000 next=draft=4607 prop=4607 olap pair=111.0ms serial=196.8ms gain=85.8ms ratio=0.44 s0=4.2ms s1=192.6ms wait=0.1/47.9ms pred gate=device Token # 501: 3.789ms; value: next_token_ids=tensor([4607], device='cuda:0') mtp accept=1 prop=4607 top1=4607 accp=1.000 next=pair draft=1039 prop=1039 pred gate=device Token # 502: 115.526ms; value: next_token_ids=tensor([1039], device='cuda:0') mtp accept=1 prop=1039 top1=1039 accp=1.000 next=draft=223 prop=223 olap pair=110.2ms serial=195.8ms gain=85.5ms ratio=0.44 s0=4.2ms s1=191.6ms wait=0.1/47.8ms pred gate=device Token # 503: 3.829ms; value: 
next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.895 next=pair draft=9139 prop=9139 pred gate=device Token # 504: 115.538ms; value: next_token_ids=tensor([9139], device='cuda:0') mtp accept=1 prop=9139 top1=9139 accp=0.967 next=draft=13968 prop=13968 olap pair=110.2ms serial=194.3ms gain=84.0ms ratio=0.43 s0=4.3ms s1=190.0ms wait=0.1/47.8ms pred gate=device Token # 505: 3.749ms; value: next_token_ids=tensor([13968], device='cuda:0') mtp accept=1 prop=13968 top1=13968 accp=1.000 next=pair draft=5852 prop=3440 pred gate=device Token # 506: 116.299ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=0 prop=3440 top1=5852 accp=0.561 next=draft=1644 prop=1644 olap pair=111.0ms serial=197.2ms gain=86.2ms ratio=0.44 s0=4.1ms s1=193.1ms wait=0.1/48.1ms pred gate=device Token # 507: 115.799ms; value: next_token_ids=tensor([1644], device='cuda:0') mtp accept=1 prop=1644 top1=1644 accp=1.000 next=draft=17309 prop=17309 olap pair=110.4ms serial=196.3ms gain=85.9ms ratio=0.44 s0=4.1ms s1=192.1ms wait=0.1/48.0ms pred gate=device Token # 508: 3.702ms; value: next_token_ids=tensor([17309], device='cuda:0') mtp accept=1 prop=17309 top1=17309 accp=1.000 next=pair draft=26127 prop=26127 pred gate=device Token # 509: 115.424ms; value: next_token_ids=tensor([26127], device='cuda:0') mtp accept=1 prop=26127 top1=26127 accp=0.999 next=draft=666 prop=666 olap pair=110.2ms serial=195.4ms gain=85.3ms ratio=0.44 s0=4.4ms s1=191.0ms wait=0.1/47.2ms pred gate=device Token # 510: 3.728ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 511: 114.787ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.995 next=draft=63115 prop=63115 olap pair=109.4ms serial=193.6ms gain=84.2ms ratio=0.44 s0=4.3ms s1=189.3ms wait=0.1/47.5ms pred gate=device Token # 512: 3.760ms; value: next_token_ids=tensor([63115], 
device='cuda:0') mtp accept=1 prop=63115 top1=63115 accp=1.000 next=pair draft=99924 prop=99924 pred gate=device
Token # 513: 115.922ms; value: next_token_ids=tensor([99924], device='cuda:0') mtp accept=1 prop=99924 top1=99924 accp=1.000 next=draft=1527 prop=1527 olap pair=110.6ms serial=196.4ms gain=85.7ms ratio=0.44 s0=4.3ms s1=192.0ms wait=0.1/47.3ms pred gate=device
Token # 514: 3.806ms; value: next_token_ids=tensor([1527], device='cuda:0') mtp accept=1 prop=1527 top1=1527 accp=1.000 next=pair draft=5926 prop=5926 pred gate=device
Token # 515: 115.630ms; value: next_token_ids=tensor([5926], device='cuda:0') mtp accept=1 prop=5926 top1=5926 accp=1.000 next=draft=223 prop=223 olap pair=110.3ms serial=194.8ms gain=84.5ms ratio=0.43 s0=4.2ms s1=190.5ms wait=0.1/47.9ms pred gate=device
Token # 516: 3.863ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=43423 prop=43423 pred gate=device
Token # 517: 115.313ms; value: next_token_ids=tensor([43423], device='cuda:0') mtp accept=1 prop=43423 top1=43423 accp=1.000 next=draft=303 prop=320 olap pair=110.0ms serial=195.6ms gain=85.5ms ratio=0.44 s0=4.0ms s1=191.6ms wait=0.1/48.2ms pred gate=device
Token # 518: 3.770ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=320 top1=303 accp=0.839 next=pair draft=94353 prop=94353 pred gate=device
Token # 519: 115.113ms; value: next_token_ids=tensor([94353], device='cuda:0') mtp accept=1 prop=94353 top1=94353 accp=0.994 next=draft=223 prop=223 olap pair=109.8ms serial=195.2ms gain=85.4ms ratio=0.44 s0=3.9ms s1=191.3ms wait=0.1/48.5ms pred gate=device
Token # 520: 3.793ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.829 next=pair draft=20155 prop=20155 pred gate=device
Token # 521: 115.424ms; value: next_token_ids=tensor([20155], device='cuda:0') mtp accept=1 prop=20155 top1=20155 accp=0.995 next=draft=31205 prop=31205 olap pair=110.1ms serial=195.8ms gain=85.6ms ratio=0.44 s0=3.9ms s1=191.8ms wait=0.1/48.4ms pred gate=device
Token # 522: 3.752ms; value: next_token_ids=tensor([31205], device='cuda:0') mtp accept=1 prop=31205 top1=31205 accp=0.999 next=pair draft=1146 prop=1146 pred gate=device
Token # 523: 115.050ms; value: next_token_ids=tensor([1146], device='cuda:0') mtp accept=1 prop=1146 top1=1146 accp=0.907 next=draft=17839 prop=17839 olap pair=109.8ms serial=194.8ms gain=85.0ms ratio=0.44 s0=4.0ms s1=190.8ms wait=0.1/48.2ms pred gate=device
Token # 524: 3.760ms; value: next_token_ids=tensor([17839], device='cuda:0') mtp accept=1 prop=17839 top1=17839 accp=0.660 next=pair draft=728 prop=6182 pred gate=device
Token # 525: 115.602ms; value: next_token_ids=tensor([11642], device='cuda:0') mtp accept=0 prop=6182 top1=11642 accp=0.064 next=draft=428 prop=428 olap pair=110.4ms serial=195.3ms gain=84.9ms ratio=0.43 s0=4.1ms s1=191.2ms wait=0.1/48.2ms pred gate=device
Token # 526: 115.385ms; value: next_token_ids=tensor([33037], device='cuda:0') mtp accept=0 prop=428 top1=33037 accp=0.125 next=draft=428 prop=428 olap pair=110.1ms serial=195.5ms gain=85.4ms ratio=0.44 s0=4.0ms s1=191.5ms wait=0.1/48.3ms pred gate=device
Token # 527: 116.282ms; value: next_token_ids=tensor([659], device='cuda:0') mtp accept=0 prop=428 top1=659 accp=0.112 next=draft=79771 prop=4398 olap pair=110.0ms serial=194.6ms gain=84.6ms ratio=0.43 s0=7.0ms s1=187.6ms wait=0.2/44.6ms pred gate=device
Token # 528: 115.363ms; value: next_token_ids=tensor([12411], device='cuda:0') mtp accept=0 prop=4398 top1=79771 accp=0.833 next=draft=428 prop=428 olap pair=110.0ms serial=194.7ms gain=84.7ms ratio=0.43 s0=6.3ms s1=188.4ms wait=0.2/45.3ms pred gate=device
Token # 529: 116.872ms; value: next_token_ids=tensor([1081], device='cuda:0') mtp accept=0 prop=428 top1=1081 accp=0.004 next=draft=41535 prop=41535 olap pair=110.7ms serial=196.0ms gain=85.3ms ratio=0.44 s0=4.3ms s1=191.7ms wait=0.1/47.8ms pred gate=device
Token # 530: 116.704ms; value: next_token_ids=tensor([4673], device='cuda:0') mtp accept=0 prop=41535 top1=4673 accp=0.209 next=draft=320 prop=320 olap pair=110.3ms serial=194.6ms gain=84.3ms ratio=0.43 s0=8.8ms s1=185.8ms wait=0.2/42.3ms pred gate=device
Token # 531: 116.685ms; value: next_token_ids=tensor([6533], device='cuda:0') mtp accept=0 prop=320 top1=6533 accp=0.023 next=draft=320 prop=320 olap pair=111.1ms serial=196.9ms gain=85.8ms ratio=0.44 s0=6.3ms s1=190.7ms wait=0.2/45.6ms pred gate=device
Token # 532: 116.014ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.998 next=draft=939 prop=939 olap pair=110.6ms serial=196.8ms gain=86.2ms ratio=0.44 s0=4.4ms s1=192.4ms wait=0.1/47.2ms pred gate=device
Token # 533: 3.792ms; value: next_token_ids=tensor([83617], device='cuda:0') mtp accept=0 prop=939 top1=83617 accp=0.124 next=pair draft=939 prop=939 pred gate=device
Token # 534: 115.578ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=939 top1=223 accp=0.197 next=draft=939 prop=939 olap pair=110.2ms serial=196.0ms gain=85.7ms ratio=0.44 s0=4.3ms s1=191.7ms wait=0.1/47.6ms pred gate=device
Token # 535: 116.470ms; value: next_token_ids=tensor([939], device='cuda:0') mtp accept=1 prop=939 top1=939 accp=1.000 next=draft=23 prop=23 olap pair=110.3ms serial=195.7ms gain=85.4ms ratio=0.44 s0=5.4ms s1=190.3ms wait=0.2/46.3ms pred gate=device
Token # 536: 4.552ms; value: next_token_ids=tensor([23], device='cuda:0') mtp accept=1 prop=23 top1=23 accp=1.000 next=pair draft=695 prop=695 pred gate=device
Token # 537: 115.690ms; value: next_token_ids=tensor([695], device='cuda:0') mtp accept=1 prop=695 top1=223 accp=0.092 next=draft=303 prop=303 olap pair=110.4ms serial=196.1ms gain=85.8ms ratio=0.44 s0=4.5ms s1=191.7ms wait=0.1/47.2ms pred gate=device
Token # 538: 3.769ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=4025 prop=4025 pred gate=device
Token # 539: 115.430ms; value: next_token_ids=tensor([48518], device='cuda:0') mtp accept=0 prop=4025 top1=48518 accp=0.451 next=draft=8407 prop=13243 olap pair=110.2ms serial=195.9ms gain=85.7ms ratio=0.44 s0=4.5ms s1=191.4ms wait=0.1/47.2ms pred gate=device
Token # 540: 115.544ms; value: next_token_ids=tensor([8407], device='cuda:0') mtp accept=0 prop=13243 top1=8407 accp=0.610 next=draft=10491 prop=10491 olap pair=110.1ms serial=194.8ms gain=84.7ms ratio=0.43 s0=4.2ms s1=190.6ms wait=0.1/48.0ms pred gate=device
Token # 541: 116.320ms; value: next_token_ids=tensor([6640], device='cuda:0') mtp accept=0 prop=10491 top1=6640 accp=0.015 next=draft=38014 prop=38014 olap pair=110.9ms serial=196.1ms gain=85.2ms ratio=0.43 s0=7.4ms s1=188.7ms wait=0.2/43.8ms pred gate=device
Token # 542: 116.050ms; value: next_token_ids=tensor([38014], device='cuda:0') mtp accept=1 prop=38014 top1=38014 accp=1.000 next=draft=471 prop=471 olap pair=110.5ms serial=196.4ms gain=85.8ms ratio=0.44 s0=4.0ms s1=192.4ms wait=0.1/48.3ms pred gate=device
Token # 543: 4.309ms; value: next_token_ids=tensor([471], device='cuda:0') mtp accept=1 prop=471 top1=471 accp=0.994 next=pair draft=1457 prop=1457 pred gate=device
Token # 544: 115.911ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=draft=223 prop=223 olap pair=110.3ms serial=195.4ms gain=85.1ms ratio=0.44 s0=5.3ms s1=190.1ms wait=0.1/46.7ms pred gate=device
Token # 545: 3.821ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=5402 prop=58788 pred gate=device
Token # 546: 116.526ms; value: next_token_ids=tensor([5402], device='cuda:0') mtp accept=0 prop=58788 top1=5402 accp=0.817 next=draft=10447 prop=10447 olap pair=110.3ms serial=195.6ms gain=85.3ms ratio=0.44 s0=5.9ms s1=189.6ms wait=0.2/46.0ms pred gate=device
Token # 547: 115.679ms; value: next_token_ids=tensor([10447], device='cuda:0') mtp accept=1 prop=10447 top1=10447 accp=0.999 next=draft=3592 prop=3592 olap pair=110.0ms serial=195.0ms gain=85.0ms ratio=0.44 s0=4.8ms s1=190.3ms wait=0.1/47.1ms pred gate=device
Token # 548: 3.756ms; value: next_token_ids=tensor([3592], device='cuda:0') mtp accept=1 prop=3592 top1=3592 accp=1.000 next=pair draft=112016 prop=112016 pred gate=device
Token # 549: 115.512ms; value: next_token_ids=tensor([112016], device='cuda:0') mtp accept=1 prop=112016 top1=112016 accp=1.000 next=draft=15227 prop=15227 olap pair=110.2ms serial=195.4ms gain=85.2ms ratio=0.44 s0=6.2ms s1=189.2ms wait=0.2/45.5ms pred gate=device
Token # 550: 3.973ms; value: next_token_ids=tensor([15227], device='cuda:0') mtp accept=1 prop=15227 top1=15227 accp=1.000 next=pair draft=303 prop=303 pred gate=device
Token # 551: 115.707ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=2897 prop=2897 olap pair=110.4ms serial=196.3ms gain=85.9ms ratio=0.44 s0=4.2ms s1=192.1ms wait=0.1/47.7ms pred gate=device
Token # 552: 3.796ms; value: next_token_ids=tensor([2897], device='cuda:0') mtp accept=1 prop=2897 top1=48247 accp=0.387 next=pair draft=38092 prop=38092 pred gate=device
Token # 553: 115.230ms; value: next_token_ids=tensor([38092], device='cuda:0') mtp accept=1 prop=38092 top1=38092 accp=0.895 next=draft=950 prop=950 olap pair=109.9ms serial=195.0ms gain=85.1ms ratio=0.44 s0=4.1ms s1=190.9ms wait=0.1/47.9ms pred gate=device
Token # 554: 3.824ms; value: next_token_ids=tensor([950], device='cuda:0') mtp accept=1 prop=950 top1=950 accp=0.987 next=pair draft=55273 prop=55273 pred gate=device
Token # 555: 116.527ms; value: next_token_ids=tensor([55273], device='cuda:0') mtp accept=1 prop=55273 top1=55273 accp=1.000 next=draft=410 prop=410 olap pair=111.1ms serial=197.1ms gain=86.0ms ratio=0.44 s0=5.4ms s1=191.7ms wait=0.2/46.3ms pred gate=device
Token # 556: 3.793ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=1.000 next=pair draft=3695 prop=3695 pred gate=device
Token # 557: 115.804ms; value: next_token_ids=tensor([3695], device='cuda:0') mtp accept=1 prop=3695 top1=3695 accp=1.000 next=draft=5058 prop=5058 olap pair=110.4ms serial=196.2ms gain=85.8ms ratio=0.44 s0=4.0ms s1=192.2ms wait=0.1/48.3ms pred gate=device
Token # 558: 3.789ms; value: next_token_ids=tensor([5058], device='cuda:0') mtp accept=1 prop=5058 top1=5058 accp=1.000 next=pair draft=410 prop=410 pred gate=device
Token # 559: 115.395ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=1.000 next=draft=58603 prop=58603 olap pair=109.9ms serial=194.1ms gain=84.2ms ratio=0.43 s0=4.8ms s1=189.3ms wait=0.1/47.3ms pred gate=device
Token # 560: 3.771ms; value: next_token_ids=tensor([58603], device='cuda:0') mtp accept=1 prop=58603 top1=58603 accp=1.000 next=pair draft=12052 prop=23590 pred gate=device
Token # 561: 115.542ms; value: next_token_ids=tensor([10861], device='cuda:0') mtp accept=0 prop=23590 top1=10861 accp=0.011 next=draft=3515 prop=3515 olap pair=110.2ms serial=195.0ms gain=84.8ms ratio=0.44 s0=7.0ms s1=188.0ms wait=0.2/44.6ms pred gate=device
Token # 562: 117.098ms; value: next_token_ids=tensor([3515], device='cuda:0') mtp accept=1 prop=3515 top1=3515 accp=1.000 next=draft=545 prop=545 olap pair=110.9ms serial=196.3ms gain=85.4ms ratio=0.44 s0=7.1ms s1=189.2ms wait=0.2/44.5ms pred gate=device
Token # 563: 4.683ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=0 prop=545 top1=223 accp=0.079 next=pair draft=2111 prop=2111 pred gate=device
Token # 564: 115.857ms; value: next_token_ids=tensor([2111], device='cuda:0') mtp accept=1 prop=2111 top1=2111 accp=1.000 next=draft=8948 prop=8948 olap pair=110.4ms serial=195.2ms gain=84.8ms ratio=0.43 s0=5.9ms s1=189.3ms wait=0.2/45.9ms pred gate=device
Token # 565: 3.728ms; value: next_token_ids=tensor([8948], device='cuda:0') mtp accept=1 prop=8948 top1=8948 accp=0.989 next=pair draft=223 prop=223 pred gate=device
Token # 566: 116.493ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=26127 prop=26127 olap pair=110.2ms serial=194.8ms gain=84.5ms ratio=0.43 s0=4.4ms s1=190.4ms wait=0.1/47.9ms pred gate=device
Token # 567: 4.814ms; value: next_token_ids=tensor([26127], device='cuda:0') mtp accept=1 prop=26127 top1=26127 accp=1.000 next=pair draft=12145 prop=12145 pred gate=device
Token # 568: 116.159ms; value: next_token_ids=tensor([12145], device='cuda:0') mtp accept=1 prop=12145 top1=12145 accp=1.000 next=draft=301 prop=301 olap pair=110.8ms serial=196.4ms gain=85.6ms ratio=0.44 s0=4.1ms s1=192.3ms wait=0.1/48.3ms pred gate=device
Token # 569: 3.803ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=0.998 next=pair draft=75778 prop=75778 pred gate=device
Token # 570: 116.062ms; value: next_token_ids=tensor([75778], device='cuda:0') mtp accept=1 prop=75778 top1=75778 accp=1.000 next=draft=478 prop=478 olap pair=110.7ms serial=196.9ms gain=86.1ms ratio=0.44 s0=4.0ms s1=192.9ms wait=0.1/48.4ms pred gate=device
Token # 571: 3.819ms; value: next_token_ids=tensor([2206], device='cuda:0') mtp accept=0 prop=478 top1=2206 accp=0.000 next=pair draft=15680 prop=15680 pred gate=device
Token # 572: 115.522ms; value: next_token_ids=tensor([15680], device='cuda:0') mtp accept=1 prop=15680 top1=15680 accp=0.954 next=draft=15241 prop=15241 olap pair=110.2ms serial=195.8ms gain=85.6ms ratio=0.44 s0=3.9ms s1=191.8ms wait=0.1/48.4ms pred gate=device
Token # 573: 3.787ms; value: next_token_ids=tensor([15241], device='cuda:0') mtp accept=1 prop=15241 top1=15241 accp=0.999 next=pair draft=4538 prop=4538 pred gate=device
Token # 574: 116.324ms; value: next_token_ids=tensor([4538], device='cuda:0') mtp accept=1 prop=4538 top1=4538 accp=0.901 next=draft=1877 prop=1877 olap pair=110.8ms serial=196.9ms gain=86.1ms ratio=0.44 s0=5.5ms s1=191.4ms wait=0.2/46.3ms pred gate=device
Token # 575: 3.791ms; value: next_token_ids=tensor([1877], device='cuda:0') mtp accept=1 prop=1877 top1=1877 accp=0.903 next=pair draft=428 prop=428 pred gate=device
Token # 576: 115.662ms; value: next_token_ids=tensor([428], device='cuda:0') mtp accept=1 prop=428 top1=428 accp=1.000 next=draft=12145 prop=12145 olap pair=110.3ms serial=196.0ms gain=85.7ms ratio=0.44 s0=4.0ms s1=192.0ms wait=0.1/48.2ms pred gate=device
Token # 577: 3.793ms; value: next_token_ids=tensor([12145], device='cuda:0') mtp accept=1 prop=12145 top1=12145 accp=0.958 next=pair draft=98622 prop=98622 pred gate=device
Token # 578: 115.621ms; value: next_token_ids=tensor([98622], device='cuda:0') mtp accept=1 prop=98622 top1=98622 accp=0.976 next=draft=410 prop=410 olap pair=110.3ms serial=196.0ms gain=85.7ms ratio=0.44 s0=4.1ms s1=191.9ms wait=0.1/48.2ms pred gate=device
Token # 579: 3.754ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=0.982 next=pair draft=6280 prop=6280 pred gate=device
Token # 580: 115.730ms; value: next_token_ids=tensor([6280], device='cuda:0') mtp accept=1 prop=6280 top1=6280 accp=0.936 next=draft=5137 prop=5137 olap pair=110.4ms serial=196.2ms gain=85.8ms ratio=0.44 s0=4.0ms s1=192.2ms wait=0.1/48.3ms pred gate=device
Token # 581: 3.860ms; value: next_token_ids=tensor([5137], device='cuda:0') mtp accept=1 prop=5137 top1=5137 accp=1.000 next=pair draft=11177 prop=11177 pred gate=device
Token # 582: 116.027ms; value: next_token_ids=tensor([11177], device='cuda:0') mtp accept=1 prop=11177 top1=11177 accp=1.000 next=draft=3728 prop=3728 olap pair=110.7ms serial=196.8ms gain=86.1ms ratio=0.44 s0=4.0ms s1=192.8ms wait=0.1/48.1ms pred gate=device
Token # 583: 3.721ms; value: next_token_ids=tensor([3728], device='cuda:0') mtp accept=1 prop=3728 top1=3728 accp=0.579 next=pair draft=4899 prop=4899 pred gate=device
Token # 584: 115.246ms; value: next_token_ids=tensor([4899], device='cuda:0') mtp accept=1 prop=4899 top1=4899 accp=0.981 next=draft=10251 prop=10251 olap pair=110.0ms serial=194.7ms gain=84.8ms ratio=0.44 s0=4.5ms s1=190.2ms wait=0.1/47.6ms pred gate=device
Token # 585: 3.744ms; value: next_token_ids=tensor([9144], device='cuda:0') mtp accept=0 prop=10251 top1=9144 accp=0.350 next=pair draft=389 prop=389 pred gate=device
Token # 586: 115.633ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 accp=1.000 next=draft=303 prop=303 olap pair=110.3ms serial=195.4ms gain=85.1ms ratio=0.44 s0=4.1ms s1=191.3ms wait=0.1/48.0ms pred gate=device
Token # 587: 3.787ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.608 next=pair draft=7206 prop=7206 pred gate=device
Token # 588: 114.949ms; value: next_token_ids=tensor([7206], device='cuda:0') mtp accept=1 prop=7206 top1=7206 accp=1.000 next=draft=2619 prop=35185 olap pair=109.7ms serial=194.8ms gain=85.1ms ratio=0.44 s0=4.2ms s1=190.6ms wait=0.1/47.6ms pred gate=device
Token # 589: 3.853ms; value: next_token_ids=tensor([35185], device='cuda:0') mtp accept=1 prop=35185 top1=2619 accp=0.538 next=pair draft=90044 prop=90044 pred gate=device
Token # 590: 115.126ms; value: next_token_ids=tensor([90044], device='cuda:0') mtp accept=1 prop=90044 top1=2619 accp=0.158 next=draft=223 prop=223 olap pair=109.9ms serial=195.1ms gain=85.2ms ratio=0.44 s0=4.0ms s1=191.1ms wait=0.1/48.1ms pred gate=device
Token # 591: 3.757ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=1863 prop=1863 pred gate=device
Token # 592: 115.552ms; value: next_token_ids=tensor([1863], device='cuda:0') mtp accept=1 prop=1863 top1=1863 accp=0.996 next=draft=2752 prop=2752 olap pair=110.3ms serial=195.9ms gain=85.6ms ratio=0.44 s0=5.1ms s1=190.8ms wait=0.1/47.0ms pred gate=device
Token # 593: 3.800ms; value: next_token_ids=tensor([2752], device='cuda:0') mtp accept=1 prop=2752 top1=2752 accp=0.955 next=pair draft=1237 prop=1237 pred gate=device
Token # 594: 115.749ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=draft=7590 prop=7590 olap pair=110.4ms serial=195.6ms gain=85.2ms ratio=0.44 s0=4.1ms s1=191.5ms wait=0.1/48.2ms pred gate=device
Token # 595: 3.749ms; value: next_token_ids=tensor([7590], device='cuda:0') mtp accept=1 prop=7590 top1=7590 accp=0.996 next=pair draft=111230 prop=111230 pred gate=device
Token # 596: 115.409ms; value: next_token_ids=tensor([111230], device='cuda:0') mtp accept=1 prop=111230 top1=111230 accp=0.872 next=draft=223 prop=223 olap pair=110.2ms serial=195.7ms gain=85.5ms ratio=0.44 s0=4.0ms s1=191.6ms wait=0.1/48.1ms pred gate=device
Token # 597: 3.790ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.805 next=pair draft=8122 prop=8122 pred gate=device
Token # 598: 115.241ms; value: next_token_ids=tensor([8122], device='cuda:0') mtp accept=1 prop=8122 top1=8122 accp=0.784 next=draft=5058 prop=5058 olap pair=109.8ms serial=195.1ms gain=85.3ms ratio=0.44 s0=4.0ms s1=191.1ms wait=0.1/48.3ms pred gate=device
Token # 599: 3.839ms; value: next_token_ids=tensor([5058], device='cuda:0') mtp accept=1 prop=5058 top1=5058 accp=1.000 next=pair draft=11113 prop=11113 pred gate=device
Token # 600: 115.249ms; value: next_token_ids=tensor([11113], device='cuda:0') mtp accept=1 prop=11113 top1=11113 accp=1.000 next=draft=410 prop=410 olap pair=109.9ms serial=195.1ms gain=85.2ms ratio=0.44 s0=4.1ms s1=191.0ms wait=0.1/48.1ms pred gate=device
Token # 601: 3.712ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=1 prop=410 top1=410 accp=0.983 next=pair draft=53314 prop=53314 pred gate=device
Token # 602: 115.629ms; value: next_token_ids=tensor([53314], device='cuda:0') mtp accept=1 prop=53314 top1=53314 accp=0.617 next=draft=99924 prop=99924 olap pair=110.3ms serial=196.1ms gain=85.8ms ratio=0.44 s0=4.1ms s1=192.0ms wait=0.1/47.8ms pred gate=device
Token # 603: 3.784ms; value: next_token_ids=tensor([15188], device='cuda:0') mtp accept=0 prop=99924 top1=15188 accp=0.000 next=pair draft=19 prop=19 pred gate=device
Token # 604: 115.351ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=17 prop=17 olap pair=110.0ms serial=195.5ms gain=85.5ms ratio=0.44 s0=3.9ms s1=191.6ms wait=0.1/48.5ms pred gate=device
Token # 605: 3.824ms; value: next_token_ids=tensor([17], device='cuda:0') mtp accept=1 prop=17 top1=17 accp=1.000 next=pair draft=9259 prop=9259 pred gate=device
Token # 606: 115.485ms; value: next_token_ids=tensor([9259], device='cuda:0') mtp accept=1 prop=9259 top1=9259 accp=1.000 next=draft=20 prop=20 olap pair=110.1ms serial=195.7ms gain=85.6ms ratio=0.44 s0=3.8ms s1=191.9ms wait=0.1/48.8ms pred gate=device
Token # 607: 3.712ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=223 prop=223 pred gate=device
Token # 608: 115.766ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=99924 prop=99924 olap pair=110.4ms serial=196.3ms gain=85.9ms ratio=0.44 s0=4.1ms s1=192.2ms wait=0.1/47.8ms pred gate=device
Token # 609: 3.767ms; value: next_token_ids=tensor([99924], device='cuda:0') mtp accept=1 prop=99924 top1=99924 accp=1.000 next=pair draft=8459 prop=8459 pred gate=device
Token # 610: 115.375ms; value: next_token_ids=tensor([8459], device='cuda:0') mtp accept=1 prop=8459 top1=8459 accp=0.949 next=draft=1227 prop=1227 olap pair=110.0ms serial=195.5ms gain=85.5ms ratio=0.44 s0=4.4ms s1=191.1ms wait=0.1/47.3ms pred gate=device
Token # 611: 3.769ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=1.000 next=pair draft=776 prop=776 pred gate=device
Token # 612: 115.542ms; value: next_token_ids=tensor([776], device='cuda:0') mtp accept=1 prop=776 top1=776 accp=1.000 next=draft=303 prop=303 olap pair=110.2ms serial=195.8ms gain=85.6ms ratio=0.44 s0=4.4ms s1=191.4ms wait=0.1/47.4ms pred gate=device
Token # 613: 3.727ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=48 prop=48 pred gate=device
Token # 614: 115.086ms; value: next_token_ids=tensor([48], device='cuda:0') mtp accept=1 prop=48 top1=48 accp=0.998 next=draft=1457 prop=1457 olap pair=109.8ms serial=194.4ms gain=84.6ms ratio=0.44 s0=6.8ms s1=187.6ms wait=0.2/45.3ms pred gate=device
Token # 615: 3.756ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=pair draft=223 prop=223 pred gate=device
Token # 616: 115.058ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.978 next=draft=69484 prop=301 olap pair=109.8ms serial=195.2ms gain=85.4ms ratio=0.44 s0=4.0ms s1=191.3ms wait=0.1/48.4ms pred gate=device
Token # 617: 3.716ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=1644 accp=0.586 next=pair draft=666 prop=666 pred gate=device
Token # 618: 115.393ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=0.933 next=draft=1644 prop=1644 olap pair=110.1ms serial=195.6ms gain=85.5ms ratio=0.44 s0=3.9ms s1=191.8ms wait=0.1/48.5ms pred gate=device
Token # 619: 3.740ms; value: next_token_ids=tensor([1644], device='cuda:0') mtp accept=1 prop=1644 top1=1644 accp=1.000 next=pair draft=3448 prop=3448 pred gate=device
Token # 620: 115.965ms; value: next_token_ids=tensor([3448], device='cuda:0') mtp accept=1 prop=3448 top1=3448 accp=1.000 next=draft=17529 prop=666 olap pair=110.6ms serial=195.4ms gain=84.8ms ratio=0.43 s0=4.3ms s1=191.2ms wait=0.1/48.0ms pred gate=device
Token # 621: 4.870ms; value: next_token_ids=tensor([17529], device='cuda:0') mtp accept=0 prop=666 top1=17529 accp=0.540 next=pair draft=1263 prop=1263 pred gate=device
Token # 622: 115.582ms; value: next_token_ids=tensor([1263], device='cuda:0') mtp accept=1 prop=1263 top1=1263 accp=0.998 next=draft=8467 prop=8467 olap pair=110.0ms serial=195.2ms gain=85.2ms ratio=0.44 s0=6.1ms s1=189.1ms wait=0.2/45.5ms pred gate=device
Token # 623: 3.729ms; value: next_token_ids=tensor([8467], device='cuda:0') mtp accept=1 prop=8467 top1=8467 accp=1.000 next=pair draft=6772 prop=6772 pred gate=device
Token # 624: 115.203ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=0 prop=6772 top1=445 accp=0.168 next=draft=20155 prop=20155 olap pair=110.0ms serial=195.6ms gain=85.6ms ratio=0.44 s0=4.1ms s1=191.5ms wait=0.1/48.1ms pred gate=device
Token # 625: 115.696ms; value: next_token_ids=tensor([20155], device='cuda:0') mtp accept=1 prop=20155 top1=20155 accp=0.999 next=draft=1701 prop=1701 olap pair=110.3ms serial=195.9ms gain=85.6ms ratio=0.44 s0=3.9ms s1=192.0ms wait=0.1/48.6ms pred gate=device
Token # 626: 3.775ms; value: next_token_ids=tensor([1701], device='cuda:0') mtp accept=1 prop=1701 top1=1701 accp=1.000 next=pair draft=2858 prop=2858 pred gate=device
Token # 627: 115.043ms; value: next_token_ids=tensor([2858], device='cuda:0') mtp accept=1 prop=2858 top1=2858 accp=1.000 next=draft=301 prop=301 olap pair=109.7ms serial=194.9ms gain=85.1ms ratio=0.44 s0=4.1ms s1=190.7ms wait=0.1/48.3ms pred gate=device
Token # 628: 3.707ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=1.000 next=pair draft=223 prop=223 pred gate=device
Token # 629: 115.430ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=21 prop=21 olap pair=110.2ms serial=195.9ms gain=85.6ms ratio=0.44 s0=3.9ms s1=192.0ms wait=0.1/48.5ms pred gate=device
Token # 630: 3.766ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=pair draft=16 prop=16 pred gate=device
Token # 631: 115.357ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=22 prop=22 olap pair=110.1ms serial=194.8ms gain=84.7ms ratio=0.43 s0=8.1ms s1=186.8ms wait=0.2/43.3ms pred gate=device
Token # 632: 3.777ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 top1=22 accp=1.000 next=pair draft=53520 prop=53520 pred gate=device
Token # 633: 115.475ms; value: next_token_ids=tensor([53520], device='cuda:0') mtp accept=1 prop=53520 top1=53520 accp=0.999 next=draft=666 prop=666 olap pair=110.2ms serial=195.8ms gain=85.7ms ratio=0.44 s0=4.3ms s1=191.6ms wait=0.1/47.4ms pred gate=device
Token # 634: 3.781ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=pair draft=303 prop=303 pred gate=device
Token # 635: 115.138ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=59717 prop=59717 olap pair=109.9ms serial=195.3ms gain=85.4ms ratio=0.44 s0=4.2ms s1=191.0ms wait=0.1/47.7ms pred gate=device
Token # 636: 3.774ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=0 prop=59717 top1=666 accp=0.120 next=pair draft=984 prop=984 pred gate=device
Token # 637: 114.988ms; value: next_token_ids=tensor([984], device='cuda:0') mtp accept=1 prop=984 top1=984 accp=0.981 next=draft=3448 prop=3448 olap pair=109.7ms serial=195.0ms gain=85.3ms ratio=0.44 s0=3.8ms s1=191.2ms wait=0.1/48.5ms pred gate=device
Token # 638: 3.778ms; value: next_token_ids=tensor([3448], device='cuda:0') mtp accept=1 prop=3448 top1=3448 accp=0.984 next=pair draft=7635 prop=7635 pred gate=device
Token # 639: 116.087ms; value: next_token_ids=tensor([7635], device='cuda:0') mtp accept=1 prop=7635 top1=7635 accp=0.996 next=draft=41535 prop=41535 olap pair=110.8ms serial=196.9ms gain=86.1ms ratio=0.44 s0=3.9ms s1=192.9ms wait=0.1/48.3ms pred gate=device
Token # 640: 3.770ms; value: next_token_ids=tensor([41535], device='cuda:0') mtp accept=1 prop=41535 top1=41535 accp=0.997 next=pair draft=713 prop=713 pred gate=device
Token # 641: 115.744ms; value: next_token_ids=tensor([713], device='cuda:0') mtp accept=1 prop=713 top1=713 accp=0.998 next=draft=15841 prop=15841 olap pair=110.4ms serial=195.7ms gain=85.3ms ratio=0.44 s0=4.1ms s1=191.6ms wait=0.1/48.1ms pred gate=device
Token # 642: 3.797ms; value: next_token_ids=tensor([15841], device='cuda:0') mtp accept=1 prop=15841 top1=15841 accp=1.000 next=pair draft=121886 prop=121886 pred gate=device
Token # 643: 114.953ms; value: next_token_ids=tensor([121886], device='cuda:0') mtp accept=1 prop=121886 top1=121886 accp=0.953 next=draft=223 prop=223 olap pair=109.6ms serial=194.7ms gain=85.1ms ratio=0.44 s0=4.4ms s1=190.3ms wait=0.1/47.1ms pred gate=device
Token # 644: 3.809ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.896 next=pair draft=20 prop=20 pred gate=device
Token # 645: 115.345ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=16 prop=16 olap pair=110.1ms serial=195.7ms gain=85.6ms ratio=0.44 s0=4.3ms s1=191.3ms wait=0.1/47.4ms pred gate=device
Token # 646: 3.787ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=pair draft=27 prop=27 pred gate=device
Token # 647: 115.041ms; value: next_token_ids=tensor([27], device='cuda:0') mtp accept=1 prop=27 top1=27 accp=1.000 next=draft=53520 prop=53520 olap pair=109.8ms serial=195.1ms gain=85.3ms ratio=0.44 s0=3.8ms s1=191.4ms wait=0.1/48.6ms pred gate=device
Token # 648: 3.710ms; value: next_token_ids=tensor([53520], device='cuda:0') mtp accept=1 prop=53520 top1=53520 accp=1.000 next=pair draft=223 prop=223 pred gate=device
Token # 649: 115.711ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.995 next=draft=7671 prop=7671 olap pair=110.3ms serial=196.1ms gain=85.8ms ratio=0.44 s0=3.8ms s1=192.3ms wait=0.1/48.7ms pred gate=device
Token # 650: 3.788ms; value: next_token_ids=tensor([7671], device='cuda:0') mtp accept=1 prop=7671 top1=7671 accp=1.000 next=pair draft=666 prop=666 pred gate=device
Token # 651: 115.475ms; value: next_token_ids=tensor([666], device='cuda:0') mtp accept=1 prop=666 top1=666 accp=1.000 next=draft=320 prop=320 olap pair=110.1ms serial=195.7ms gain=85.6ms ratio=0.44 s0=3.8ms s1=192.0ms wait=0.1/48.7ms pred gate=device
Token # 652: 3.792ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=pair draft=7085 prop=7085 pred gate=device
Token # 653: 115.041ms; value: next_token_ids=tensor([7085], device='cuda:0') mtp accept=1 prop=7085 top1=7085 accp=0.776 next=draft=545 prop=545 olap pair=109.7ms serial=194.7ms gain=84.9ms ratio=0.44 s0=4.9ms s1=189.7ms wait=0.1/46.6ms pred gate=device
Token # 654: 3.795ms; value: next_token_ids=tensor([545], device='cuda:0') mtp accept=1 prop=545 top1=545 accp=0.570 next=pair draft=1833 prop=1833 pred gate=device
Token # 655: 115.958ms; value: next_token_ids=tensor([1833], device='cuda:0') mtp accept=1 prop=1833 top1=1833 accp=1.000 next=draft=2426 prop=2426 olap pair=110.5ms serial=196.2ms gain=85.7ms ratio=0.44 s0=4.9ms s1=191.3ms wait=0.2/46.5ms pred gate=device
Token # 656: 3.730ms; value: next_token_ids=tensor([2426], device='cuda:0') mtp accept=1 prop=2426 top1=2426 accp=1.000 next=pair draft=428 prop=428 pred gate=device
Token # 657: 115.537ms; value: next_token_ids=tensor([428], device='cuda:0') mtp accept=1 prop=428 top1=428 accp=1.000 next=draft=844 prop=844 olap pair=110.2ms serial=195.7ms gain=85.4ms ratio=0.44 s0=4.6ms s1=191.1ms wait=0.1/47.3ms pred gate=device
Token # 658: 3.758ms; value: next_token_ids=tensor([24268], device='cuda:0') mtp accept=0 prop=844 top1=24268 accp=0.125 next=pair draft=430 prop=430 pred gate=device
Token # 659: 115.789ms; value: next_token_ids=tensor([430], device='cuda:0') mtp accept=1 prop=430 top1=430 accp=0.997 next=draft=31618 prop=31618 olap pair=110.5ms serial=196.4ms gain=85.9ms ratio=0.44 s0=3.8ms s1=192.6ms wait=0.1/48.6ms pred gate=device
Token # 660: 3.735ms; value: next_token_ids=tensor([31618], device='cuda:0') mtp accept=1 prop=31618 top1=31618 accp=0.955 next=pair draft=223 prop=223 pred gate=device
Token # 661: 115.518ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.992 next=draft=2111 prop=2111 olap pair=110.2ms serial=195.7ms gain=85.5ms ratio=0.44 s0=4.0ms s1=191.7ms wait=0.1/48.2ms pred gate=device
Token # 662: 3.788ms; value: next_token_ids=tensor([2111], device='cuda:0') mtp accept=1 prop=2111 top1=2111 accp=1.000 next=pair draft=8948 prop=8948 pred gate=device
Token # 663: 115.665ms; value: next_token_ids=tensor([8948], device='cuda:0') mtp accept=1 prop=8948 top1=8948 accp=0.974 next=draft=81054 prop=223 olap pair=110.4ms serial=196.2ms gain=85.8ms ratio=0.44 s0=4.1ms s1=192.2ms wait=0.1/48.1ms pred gate=device
Token # 664: 3.828ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.589 next=pair draft=26127 prop=26127 pred gate=device
Token # 665: 115.130ms; value: next_token_ids=tensor([26127], device='cuda:0') mtp accept=1 prop=26127 top1=26127 accp=1.000 next=draft=303 prop=303 olap pair=109.7ms serial=195.0ms gain=85.2ms ratio=0.44 s0=3.9ms s1=191.1ms wait=0.1/48.6ms pred gate=device
Token # 666: 3.791ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=38304 prop=38304 pred gate=device
Token # 667: 115.178ms; value: next_token_ids=tensor([38304], device='cuda:0') mtp accept=1 prop=38304 top1=38304 accp=0.950 next=draft=389 prop=389 olap pair=109.9ms serial=195.3ms gain=85.4ms ratio=0.44 s0=4.0ms s1=191.3ms wait=0.1/48.4ms pred gate=device
Token # 668: 3.781ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 accp=0.998 next=pair draft=30326 prop=30326 pred gate=device
Token # 669: 115.515ms; value: next_token_ids=tensor([30326], device='cuda:0') mtp accept=1 prop=30326 top1=30326 accp=1.000 next=draft=428 prop=428 olap pair=110.2ms serial=195.7ms gain=85.5ms ratio=0.44 s0=3.9ms s1=191.7ms wait=0.1/48.4ms pred gate=device
Token # 670: 3.690ms; value: next_token_ids=tensor([428], device='cuda:0') mtp accept=1 prop=428 top1=428 accp=1.000 next=pair draft=26127 prop=26127 pred gate=device
Token # 671: 115.237ms; value: next_token_ids=tensor([26127], device='cuda:0') mtp accept=1 prop=26127 top1=26127 accp=0.926 next=draft=5820 prop=422 olap pair=110.0ms serial=195.5ms gain=85.5ms ratio=0.44 s0=3.8ms s1=191.7ms wait=0.1/48.6ms pred gate=device
Token # 672: 3.843ms; value: next_token_ids=tensor([422], device='cuda:0') mtp accept=1 prop=422 top1=5820 accp=0.713 next=pair draft=5783 prop=5783 pred gate=device
Token # 673: 115.209ms; value: next_token_ids=tensor([5783], device='cuda:0') mtp accept=1 prop=5783 top1=5783 accp=1.000 next=draft=303 prop=303 olap pair=110.0ms serial=195.4ms gain=85.4ms ratio=0.44 s0=4.4ms s1=191.0ms wait=0.1/48.0ms pred gate=device
Token # 674: 3.778ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=799 prop=799 pred gate=device
Token # 675: 115.833ms; value: next_token_ids=tensor([799], device='cuda:0') mtp accept=1 prop=799 top1=799 accp=0.977 next=draft=2648 prop=2648 olap pair=110.6ms serial=196.5ms gain=85.9ms ratio=0.44 s0=3.8ms s1=192.7ms wait=0.1/48.6ms pred gate=device
Token # 676: 3.779ms; value: next_token_ids=tensor([2648], device='cuda:0') mtp accept=1 prop=2648 top1=2648 accp=0.728 next=pair draft=985 prop=985 pred gate=device
Token # 677: 115.470ms; value: next_token_ids=tensor([985], device='cuda:0') mtp accept=1 prop=985 top1=985 accp=0.998 next=draft=1860 prop=1860 olap pair=110.2ms serial=195.2ms gain=85.1ms ratio=0.44 s0=4.1ms s1=191.2ms wait=0.1/48.2ms pred gate=device
Token # 678: 3.765ms; value: next_token_ids=tensor([1860], device='cuda:0') mtp accept=1 prop=1860 top1=1860 accp=0.677 next=pair draft=1108 prop=1108 pred gate=device
Token # 679: 115.711ms; value: next_token_ids=tensor([1108], device='cuda:0') mtp accept=1 prop=1108 top1=1108 accp=0.941 next=draft=6753 prop=6753 olap pair=110.4ms serial=196.2ms gain=85.9ms ratio=0.44 s0=3.8ms s1=192.4ms wait=0.1/48.6ms pred gate=device
Token # 680: 3.781ms; value: next_token_ids=tensor([6753], device='cuda:0') mtp accept=1 prop=6753 top1=6753 accp=1.000 next=pair draft=1651 prop=1651 pred gate=device
Token # 681: 115.561ms; value: next_token_ids=tensor([1651], device='cuda:0') mtp accept=1 prop=1651 top1=1651 accp=1.000 next=draft=430 prop=430 olap pair=110.2ms serial=195.9ms gain=85.7ms ratio=0.44 s0=3.8ms s1=192.1ms wait=0.1/48.7ms pred gate=device
Token # 682: 3.773ms; value: next_token_ids=tensor([430], device='cuda:0') mtp accept=1 prop=430 top1=430 accp=1.000 next=pair draft=39222 prop=39222 pred gate=device
Token # 683: 115.691ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=0 prop=39222 top1=301 accp=0.149 next=draft=7157 prop=7157 olap pair=110.4ms serial=196.3ms gain=85.9ms ratio=0.44 s0=3.7ms s1=192.6ms wait=0.1/48.7ms pred gate=device
Token # 684: 116.441ms; value: next_token_ids=tensor([7157], device='cuda:0') mtp accept=1 prop=7157 top1=7157 accp=0.974 next=draft=303 prop=303 olap pair=111.0ms serial=196.9ms gain=85.9ms ratio=0.44 s0=3.9ms s1=193.0ms wait=0.1/48.5ms pred gate=device
Token # 685: 3.725ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=18317 prop=18317 pred gate=device
Token # 686: 116.045ms; value: next_token_ids=tensor([18317], device='cuda:0') mtp accept=1 prop=18317 top1=18317 accp=0.824 next=draft=545 prop=545 olap pair=109.9ms serial=193.5ms gain=83.6ms ratio=0.43 s0=7.7ms s1=185.8ms wait=0.2/44.2ms pred gate=device
Token # 687: 3.890ms; value: next_token_ids=tensor([545], device='cuda:0') mtp accept=1 prop=545 top1=545 accp=0.864 next=pair draft=5293 prop=5293 pred gate=device
Token # 688: 115.238ms; value: next_token_ids=tensor([5293], device='cuda:0') mtp accept=1 prop=5293 top1=5293 accp=0.993 next=draft=33792 prop=33792 olap pair=110.0ms serial=194.9ms gain=84.9ms ratio=0.44 s0=5.0ms s1=189.9ms wait=0.1/47.0ms pred gate=device
Token # 689: 3.796ms; value: next_token_ids=tensor([33792], device='cuda:0') mtp accept=1 prop=33792 top1=547 accp=0.260 next=pair draft=47 prop=47 pred gate=device
Token # 690: 115.075ms; value: next_token_ids=tensor([47], device='cuda:0') mtp accept=1 prop=47 top1=47 accp=1.000 next=draft=223 prop=223 olap pair=109.7ms serial=194.4ms gain=84.7ms ratio=0.44 s0=4.1ms s1=190.4ms wait=0.1/48.1ms pred gate=device
Token # 691: 3.765ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=5130 prop=5130 pred gate=device
Token # 692: 116.102ms; value: next_token_ids=tensor([5130], device='cuda:0') mtp accept=1 prop=5130 top1=5130 accp=0.983 next=draft=11760 prop=11760 olap pair=110.8ms serial=196.0ms gain=85.2ms ratio=0.43 s0=3.8ms s1=192.2ms wait=0.1/48.7ms pred gate=device
Token # 693: 3.822ms; value: next_token_ids=tensor([11760], device='cuda:0') mtp accept=1 prop=11760 top1=11760 accp=0.830 next=pair draft=1629 prop=1629 pred gate=device
Token # 694: 115.278ms; value: next_token_ids=tensor([1629], device='cuda:0') mtp accept=1 prop=1629 top1=1629 accp=1.000 next=draft=478 prop=478 olap pair=110.0ms serial=195.5ms gain=85.5ms ratio=0.44 s0=4.0ms s1=191.4ms wait=0.1/48.1ms pred gate=device
Token # 695: 3.740ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=pair draft=94268 prop=94268 pred gate=device
Token # 696: 115.201ms; value: next_token_ids=tensor([94268], device='cuda:0') mtp accept=1 prop=94268 top1=94268 accp=1.000 next=draft=24268 prop=24268 olap pair=110.0ms serial=195.4ms gain=85.4ms ratio=0.44 s0=5.2ms s1=190.1ms wait=0.2/46.6ms pred gate=device
Token # 697: 3.745ms; value: next_token_ids=tensor([428], device='cuda:0') mtp accept=0 prop=24268 top1=428 accp=0.001 next=pair 
draft=24268 prop=24268 pred gate=device Token # 698: 115.372ms; value: next_token_ids=tensor([99026], device='cuda:0') mtp accept=0 prop=24268 top1=99026 accp=0.008 next=draft=430 prop=430 olap pair=110.0ms serial=195.0ms gain=85.1ms ratio=0.44 s0=5.3ms s1=189.7ms wait=0.1/46.5ms pred gate=device Token # 699: 115.777ms; value: next_token_ids=tensor([430], device='cuda:0') mtp accept=1 prop=430 top1=430 accp=1.000 next=draft=301 prop=301 olap pair=110.5ms serial=194.8ms gain=84.3ms ratio=0.43 s0=4.3ms s1=190.5ms wait=0.1/47.9ms pred gate=device Token # 700: 3.758ms; value: next_token_ids=tensor([1255], device='cuda:0') mtp accept=0 prop=301 top1=1255 accp=0.116 next=pair draft=18381 prop=18381 pred gate=device Token # 701: 115.297ms; value: next_token_ids=tensor([18381], device='cuda:0') mtp accept=1 prop=18381 top1=18381 accp=1.000 next=draft=621 prop=621 olap pair=110.0ms serial=195.6ms gain=85.6ms ratio=0.44 s0=3.9ms s1=191.8ms wait=0.1/48.4ms pred gate=device Token # 702: 3.762ms; value: next_token_ids=tensor([621], device='cuda:0') mtp accept=1 prop=621 top1=621 accp=1.000 next=pair draft=89267 prop=89267 pred gate=device Token # 703: 115.246ms; value: next_token_ids=tensor([89267], device='cuda:0') mtp accept=1 prop=89267 top1=62894 accp=0.286 next=draft=303 prop=303 olap pair=110.0ms serial=195.1ms gain=85.0ms ratio=0.44 s0=6.2ms s1=188.9ms wait=0.2/45.8ms pred gate=device Token # 704: 3.783ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.903 next=pair draft=39822 prop=39822 pred gate=device Token # 705: 115.340ms; value: next_token_ids=tensor([39822], device='cuda:0') mtp accept=1 prop=39822 top1=39822 accp=0.881 next=draft=40101 prop=40101 olap pair=110.0ms serial=195.6ms gain=85.6ms ratio=0.44 s0=4.0ms s1=191.6ms wait=0.1/48.2ms pred gate=device Token # 706: 3.724ms; value: next_token_ids=tensor([40101], device='cuda:0') mtp accept=1 prop=40101 top1=40101 accp=0.986 next=pair draft=5667 prop=5667 pred 
gate=device Token # 707: 115.747ms; value: next_token_ids=tensor([4715], device='cuda:0') mtp accept=0 prop=5667 top1=4715 accp=0.026 next=draft=5189 prop=5189 olap pair=110.5ms serial=195.3ms gain=84.8ms ratio=0.43 s0=4.2ms s1=191.1ms wait=0.1/48.1ms pred gate=device Token # 708: 115.188ms; value: next_token_ids=tensor([7524], device='cuda:0') mtp accept=0 prop=5189 top1=5189 accp=0.557 next=draft=12 prop=12 olap pair=109.9ms serial=195.4ms gain=85.5ms ratio=0.44 s0=3.9ms s1=191.5ms wait=0.1/48.4ms pred gate=device Token # 709: 116.808ms; value: next_token_ids=tensor([12], device='cuda:0') mtp accept=1 prop=12 top1=12 accp=1.000 next=draft=1237 prop=1237 olap pair=110.6ms serial=195.2ms gain=84.6ms ratio=0.43 s0=4.4ms s1=190.7ms wait=0.1/48.0ms pred gate=device Token # 710: 4.288ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=14529 prop=14529 pred gate=device Token # 711: 115.159ms; value: next_token_ids=tensor([14529], device='cuda:0') mtp accept=1 prop=14529 top1=14529 accp=0.772 next=draft=768 prop=768 olap pair=109.9ms serial=195.3ms gain=85.4ms ratio=0.44 s0=3.8ms s1=191.4ms wait=0.1/48.5ms pred gate=device