[2026-04-08 08:45:55.683233 INFO duck_llm] This is an info log [2026-04-08 08:45:55.683264 WARN duck_llm] This is a warning log [2026-04-08 08:45:55.683267 ERROR duck_llm] This is an error log [2026-04-08 08:45:55.683474 INFO utils] Selected DPDK lcores: master=0, workers=[2, 4, 6, 8], all_performance_core_representatives=[0, 2, 4, 6, 8, 10, 12, 14] EAL: Detected CPU lcores: 32 EAL: Detected NUMA nodes: 1 EAL: Detected shared linkage of DPDK EAL: Multi-process socket /var/run/dpdk/rte/mp_socket EAL: Selected IOVA mode 'VA' EAL: VFIO support initialized EAL: Using IOMMU type 1 (Type 1) ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode) ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode) ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode) ICE_INIT: ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package (single VLAN mode) [2026-04-08 08:45:57.763980 INFO dpdk_workers] DPDK initialized successfully. Found 4 ports.
[2026-04-08 08:45:57.763996 INFO dpdk_workers] Port 0 device name: 0000:01:00.0 [2026-04-08 08:45:57.763998 INFO dpdk_workers] Port 0 IP address: 10.21.1.1 [2026-04-08 08:45:57.764000 INFO dpdk_workers] Port 0 Broadcast address: 10.21.1.255 [2026-04-08 08:45:57.764002 INFO dpdk_workers] Port 1 device name: 0000:01:00.1 [2026-04-08 08:45:57.764004 INFO dpdk_workers] Port 1 IP address: 10.21.2.1 [2026-04-08 08:45:57.764005 INFO dpdk_workers] Port 1 Broadcast address: 10.21.2.255 [2026-04-08 08:45:57.764007 INFO dpdk_workers] Port 2 device name: 0000:01:00.2 [2026-04-08 08:45:57.764008 INFO dpdk_workers] Port 2 IP address: 10.21.3.1 [2026-04-08 08:45:57.764009 INFO dpdk_workers] Port 2 Broadcast address: 10.21.3.255 [2026-04-08 08:45:57.764016 INFO dpdk_workers] Port 3 device name: 0000:01:00.3 [2026-04-08 08:45:57.764017 INFO dpdk_workers] Port 3 IP address: 10.21.4.1 [2026-04-08 08:45:57.764018 INFO dpdk_workers] Port 3 Broadcast address: 10.21.4.255 [2026-04-08 08:45:57.764020 INFO dpdk_workers] Available netifs list: [(10.21.1.255, 0, 10.21.1.1), (10.21.2.255, 1, 10.21.2.1), (10.21.3.255, 2, 10.21.3.1), (10.21.4.255, 3, 10.21.4.1)] [2026-04-08 08:45:57.764029 INFO dpdk_workers] Starting worker #0: (bcast_ip: 10.21.1.255, port_id: 0, lcore_id: 2, host_ip: 10.21.1.1) [2026-04-08 08:45:57.764065 INFO dpdk_workers] Initializing worker port 0 on lcore 2... [2026-04-08 08:45:57.765781 INFO dpdk_workers] Starting worker #1: (bcast_ip: 10.21.2.255, port_id: 1, lcore_id: 4, host_ip: 10.21.2.1) [2026-04-08 08:45:57.765805 INFO dpdk_workers] Starting worker #2: (bcast_ip: 10.21.3.255, port_id: 2, lcore_id: 6, host_ip: 10.21.3.1) [2026-04-08 08:45:57.765816 INFO dpdk_workers] Starting worker #3: (bcast_ip: 10.21.4.255, port_id: 3, lcore_id: 8, host_ip: 10.21.4.1) [2026-04-08 08:45:57.765846 INFO dpdk_workers] Initializing worker port 1 on lcore 4... [2026-04-08 08:45:57.767804 INFO dpdk_workers] Initializing worker port 2 on lcore 6... 
[2026-04-08 08:45:57.769796 INFO dpdk_workers] Initializing worker port 3 on lcore 8... ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 0). ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 1). ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 2). ICE_DRIVER: ice_set_rx_function(): Using Vector AVX2 (port 3). [2026-04-08 08:46:01.314754 INFO dpdk_workers] Worker port 1 initialized successfully. [2026-04-08 08:46:02.135199 INFO dpdk_workers] Worker port 0 initialized successfully. [2026-04-08 08:46:02.140170 INFO dpdk_workers] Worker port 2 initialized successfully. [2026-04-08 08:46:02.141078 INFO dpdk_workers] Worker port 3 initialized successfully. [2026-04-08 08:46:02.141097 INFO dpdk_workers] Workers initialized successfully. 4 workers running. [2026-04-08 08:46:02.141336 INFO utils] Binding master thread to cores (excluding workers): [0, 1, 3, 5, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31] [2026-04-08 08:46:02.141345 INFO utils] set_thread_affinity(tid 1368756, cores [0, 1, 3, 5, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]): 0 [2026-04-08 08:46:02.142293 INFO dpdk_workers] Run command Ping all time: send 1.1 us, recv 939.3 us [2026-04-08 08:46:02.192350 INFO dpdk_workers] Run command Ping all time: send 0.3 us, recv 0.4 us [2026-04-08 08:46:02.242406 INFO dpdk_workers] Run command Ping all time: send 0.3 us, recv 0.4 us [2026-04-08 08:46:02.292462 INFO dpdk_workers] Run command Ping all time: send 0.2 us, recv 0.4 us [2026-04-08 08:46:02.342519 INFO dpdk_workers] Run command Ping all time: send 0.4 us, recv 0.4 us [2026-04-08 08:46:02.392574 INFO dpdk_workers] Run command Ping all time: send 0.2 us, recv 0.5 us [2026-04-08 08:46:02.442630 INFO dpdk_workers] Run command Ping all time: send 0.2 us, recv 0.5 us [2026-04-08 08:46:02.492695 INFO dpdk_workers] Run command Ping all time: send 0.9 us, recv 1.3 us [2026-04-08 08:46:02.542773 
INFO dpdk_workers] Run command Ping all time: send 1.0 us, recv 1.2 us [2026-04-08 08:46:02.592858 INFO dpdk_workers] Run command Ping all time: send 0.4 us, recv 0.4 us [2026-04-08 08:46:02.642932 INFO dpdk_workers] Found 32 ducks in duck-ips-multi-netifs.txt [2026-04-08 08:46:02.642935 INFO dpdk_workers] Duck #0: 10.21.1.101 (bcast_ip: 10.21.1.255) [2026-04-08 08:46:02.642937 INFO dpdk_workers] Duck #1: 10.21.1.102 (bcast_ip: 10.21.1.255) [2026-04-08 08:46:02.642939 INFO dpdk_workers] Duck #2: 10.21.1.103 (bcast_ip: 10.21.1.255) [2026-04-08 08:46:02.642941 INFO dpdk_workers] Duck #3: 10.21.1.104 (bcast_ip: 10.21.1.255) [2026-04-08 08:46:02.642943 INFO dpdk_workers] Duck #4: 10.21.1.105 (bcast_ip: 10.21.1.255) [2026-04-08 08:46:02.642945 INFO dpdk_workers] Duck #5: 10.21.1.106 (bcast_ip: 10.21.1.255) [2026-04-08 08:46:02.642947 INFO dpdk_workers] Duck #6: 10.21.1.107 (bcast_ip: 10.21.1.255) [2026-04-08 08:46:02.642949 INFO dpdk_workers] Duck #7: 10.21.1.108 (bcast_ip: 10.21.1.255) [2026-04-08 08:46:02.642950 INFO dpdk_workers] Duck #8: 10.21.2.101 (bcast_ip: 10.21.2.255) [2026-04-08 08:46:02.642952 INFO dpdk_workers] Duck #9: 10.21.2.102 (bcast_ip: 10.21.2.255) [2026-04-08 08:46:02.642954 INFO dpdk_workers] Duck #10: 10.21.2.103 (bcast_ip: 10.21.2.255) [2026-04-08 08:46:02.642956 INFO dpdk_workers] Duck #11: 10.21.2.104 (bcast_ip: 10.21.2.255) [2026-04-08 08:46:02.642958 INFO dpdk_workers] Duck #12: 10.21.2.105 (bcast_ip: 10.21.2.255) [2026-04-08 08:46:02.642960 INFO dpdk_workers] Duck #13: 10.21.2.106 (bcast_ip: 10.21.2.255) [2026-04-08 08:46:02.642962 INFO dpdk_workers] Duck #14: 10.21.2.107 (bcast_ip: 10.21.2.255) [2026-04-08 08:46:02.642964 INFO dpdk_workers] Duck #15: 10.21.2.108 (bcast_ip: 10.21.2.255) [2026-04-08 08:46:02.642967 INFO dpdk_workers] Duck #16: 10.21.3.101 (bcast_ip: 10.21.3.255) [2026-04-08 08:46:02.642968 INFO dpdk_workers] Duck #17: 10.21.3.102 (bcast_ip: 10.21.3.255) [2026-04-08 08:46:02.642970 INFO dpdk_workers] Duck #18: 10.21.3.103 
(bcast_ip: 10.21.3.255) [2026-04-08 08:46:02.642971 INFO dpdk_workers] Duck #19: 10.21.3.104 (bcast_ip: 10.21.3.255) [2026-04-08 08:46:02.642973 INFO dpdk_workers] Duck #20: 10.21.3.105 (bcast_ip: 10.21.3.255) [2026-04-08 08:46:02.642974 INFO dpdk_workers] Duck #21: 10.21.3.106 (bcast_ip: 10.21.3.255) [2026-04-08 08:46:02.642976 INFO dpdk_workers] Duck #22: 10.21.3.107 (bcast_ip: 10.21.3.255) [2026-04-08 08:46:02.642978 INFO dpdk_workers] Duck #23: 10.21.3.108 (bcast_ip: 10.21.3.255) [2026-04-08 08:46:02.642979 INFO dpdk_workers] Duck #24: 10.21.4.101 (bcast_ip: 10.21.4.255) [2026-04-08 08:46:02.642981 INFO dpdk_workers] Duck #25: 10.21.4.102 (bcast_ip: 10.21.4.255) [2026-04-08 08:46:02.642982 INFO dpdk_workers] Duck #26: 10.21.4.103 (bcast_ip: 10.21.4.255) [2026-04-08 08:46:02.642983 INFO dpdk_workers] Duck #27: 10.21.4.104 (bcast_ip: 10.21.4.255) [2026-04-08 08:46:02.642985 INFO dpdk_workers] Duck #28: 10.21.4.105 (bcast_ip: 10.21.4.255) [2026-04-08 08:46:02.642986 INFO dpdk_workers] Duck #29: 10.21.4.106 (bcast_ip: 10.21.4.255) [2026-04-08 08:46:02.642988 INFO dpdk_workers] Duck #30: 10.21.4.107 (bcast_ip: 10.21.4.255) [2026-04-08 08:46:02.642992 INFO dpdk_workers] Duck #31: 10.21.4.108 (bcast_ip: 10.21.4.255) [2026-04-08 08:46:02.744922 INFO dpdk_workers] [Worker 0]: 10.21.1.101 [2026-04-08 08:46:02.744938 INFO dpdk_workers] [Worker 0]: 10.21.1.102 [2026-04-08 08:46:02.744949 INFO dpdk_workers] [Worker 0]: 10.21.1.103 [2026-04-08 08:46:02.744961 INFO dpdk_workers] [Worker 0]: 10.21.1.104 [2026-04-08 08:46:02.744972 INFO dpdk_workers] [Worker 0]: 10.21.1.105 [2026-04-08 08:46:02.744974 INFO dpdk_workers] [Worker 0]: 10.21.1.106 [2026-04-08 08:46:02.744975 INFO dpdk_workers] [Worker 0]: 10.21.1.107 [2026-04-08 08:46:02.744978 INFO dpdk_workers] [Worker 0]: 10.21.1.108 [2026-04-08 08:46:02.744981 INFO dpdk_workers] [Worker 1]: 10.21.2.101 [2026-04-08 08:46:02.744982 INFO dpdk_workers] [Worker 1]: 10.21.2.102 [2026-04-08 08:46:02.744984 INFO dpdk_workers] [Worker 
1]: 10.21.2.103 [2026-04-08 08:46:02.744986 INFO dpdk_workers] [Worker 1]: 10.21.2.104 [2026-04-08 08:46:02.744988 INFO dpdk_workers] [Worker 1]: 10.21.2.105 [2026-04-08 08:46:02.744990 INFO dpdk_workers] [Worker 1]: 10.21.2.106 [2026-04-08 08:46:02.744991 INFO dpdk_workers] [Worker 1]: 10.21.2.107 [2026-04-08 08:46:02.744993 INFO dpdk_workers] [Worker 1]: 10.21.2.108 [2026-04-08 08:46:02.745506 INFO dpdk_workers] [Worker 2]: 10.21.3.101 [2026-04-08 08:46:02.745509 INFO dpdk_workers] [Worker 2]: 10.21.3.102 [2026-04-08 08:46:02.745510 INFO dpdk_workers] [Worker 2]: 10.21.3.103 [2026-04-08 08:46:02.745512 INFO dpdk_workers] [Worker 2]: 10.21.3.104 [2026-04-08 08:46:02.745513 INFO dpdk_workers] [Worker 2]: 10.21.3.105 [2026-04-08 08:46:02.745514 INFO dpdk_workers] [Worker 2]: 10.21.3.106 [2026-04-08 08:46:02.745516 INFO dpdk_workers] [Worker 2]: 10.21.3.107 [2026-04-08 08:46:02.745517 INFO dpdk_workers] [Worker 2]: 10.21.3.108 [2026-04-08 08:46:02.745520 INFO dpdk_workers] [Worker 3]: 10.21.4.101 [2026-04-08 08:46:02.745521 INFO dpdk_workers] [Worker 3]: 10.21.4.102 [2026-04-08 08:46:02.745523 INFO dpdk_workers] [Worker 3]: 10.21.4.103 [2026-04-08 08:46:02.745524 INFO dpdk_workers] [Worker 3]: 10.21.4.104 [2026-04-08 08:46:02.745525 INFO dpdk_workers] [Worker 3]: 10.21.4.105 [2026-04-08 08:46:02.745527 INFO dpdk_workers] [Worker 3]: 10.21.4.106 [2026-04-08 08:46:02.745528 INFO dpdk_workers] [Worker 3]: 10.21.4.107 [2026-04-08 08:46:02.745530 INFO dpdk_workers] [Worker 3]: 10.21.4.108 [2026-04-08 08:46:02.745533 INFO dpdk_workers] init_ducks done [2026-04-08 08:46:02.748061 INFO dpdk_ducks] Initialized 4 DPDK duck workers [2026-04-08 08:46:02.748064 INFO dpdk_ducks] DPDK duck worker 0: DpdkDuckWorker { worker_idx: 0, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { 
buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (0, 8) } [2026-04-08 08:46:02.748068 INFO dpdk_ducks] DPDK duck worker 1: DpdkDuckWorker { worker_idx: 1, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (8, 16) } [2026-04-08 08:46:02.748071 INFO dpdk_ducks] DPDK duck worker 2: DpdkDuckWorker { worker_idx: 2, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (16, 24) } [2026-04-08 08:46:02.748073 INFO dpdk_ducks] DPDK duck worker 3: DpdkDuckWorker { worker_idx: 3, ducks: [DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }, DpdkDuck { buffer_size: 32212254720 }], all_ranks: [0, 1, 2, 3, 4, 5, 6, 7], tp_rank_range: (24, 32) } [2026-04-08 08:46:02.748078 INFO buffer_manager] Initializing buffer manager [2026-04-08 08:46:02.748080 INFO buffer_manager] Buffer manager initialized: ELF BufferAllocator { begin: 0, end: 10485760, current: 0 }, input BufferAllocator { begin: 10485760, end: 104857600, current: 10485760 }, weights BufferAllocator { begin: 104923136, end: 32212254720, current: 104923136 } [2026-04-08 08:46:02.748083 INFO fp8_dpdk_common] 
fp9 persistent judge enabled by default; set DUCK_FP9_PERSISTENT_JUDGE=0 to disable [2026-04-08 08:46:02.748501 INFO buffer_manager] Added kernel fp9_kernels at (0, 91664) [2026-04-08 08:46:02.748535 INFO fp8_dpdk_common] fp9 persistent judge: opened 32 sessions [2026-04-08 08:46:02.748538 INFO fp8_dpdk_common] fp9 persistent judge: force-opened 32 fresh sessions for new init [2026-04-08 08:46:02.748540 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init(tp_size=32) [2026-04-08 08:46:02.748542 INFO fp8_moe_dpdk] fp8_moe_dpdk: init(tp_size=32) [2026-04-08 08:46:03.134644 INFO weight_cache] weight_cache: header hit tp_size=32 num_slots=62 finished_slots=62 [2026-04-08 08:46:03.461701 INFO buffer_manager] Allocated weights buffer at (104923136, 0) [2026-04-08 08:46:03.461729 INFO buffer_manager] Allocated weights buffer at (104923136, 4128768) [2026-04-08 08:46:03.461731 INFO buffer_manager] Allocated weights buffer at (109051904, 516096) [2026-04-08 08:46:03.461733 INFO buffer_manager] Allocated weights buffer at (109568000, 2016) [2026-04-08 08:46:03.461735 INFO buffer_manager] Allocated weights buffer at (109572096, 4128768) [2026-04-08 08:46:03.461736 INFO buffer_manager] Allocated weights buffer at (113700864, 516096) [2026-04-08 08:46:03.461738 INFO buffer_manager] Allocated weights buffer at (114216960, 2016) [2026-04-08 08:46:03.461739 INFO buffer_manager] Allocated weights buffer at (114221056, 4128768) [2026-04-08 08:46:03.461741 INFO buffer_manager] Allocated weights buffer at (118349824, 516096) [2026-04-08 08:46:03.461742 INFO buffer_manager] Allocated weights buffer at (118865920, 2016) [2026-04-08 08:46:03.461744 INFO buffer_manager] Allocated weights buffer at (118870016, 0) [2026-04-08 08:46:03.461745 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init_layer_cached(layer_idx=0, cache_slot=0) planned desc only [2026-04-08 08:46:03.554870 INFO buffer_manager] Allocated weights buffer at (118870016, 0) [2026-04-08 08:46:03.554892 INFO buffer_manager] Allocated weights buffer at 
(118870016, 4128768) [2026-04-08 08:46:03.554894 INFO buffer_manager] Allocated weights buffer at (122998784, 516096) [2026-04-08 08:46:03.554896 INFO buffer_manager] Allocated weights buffer at (123514880, 2016) [2026-04-08 08:46:03.554898 INFO buffer_manager] Allocated weights buffer at (123518976, 4128768) [2026-04-08 08:46:03.554899 INFO buffer_manager] Allocated weights buffer at (127647744, 516096) [2026-04-08 08:46:03.554900 INFO buffer_manager] Allocated weights buffer at (128163840, 2016) [2026-04-08 08:46:03.554902 INFO buffer_manager] Allocated weights buffer at (128167936, 4128768) [2026-04-08 08:46:03.554903 INFO buffer_manager] Allocated weights buffer at (132296704, 516096) [2026-04-08 08:46:03.554905 INFO buffer_manager] Allocated weights buffer at (132812800, 2016) [2026-04-08 08:46:03.554906 INFO buffer_manager] Allocated weights buffer at (132816896, 0) [2026-04-08 08:46:03.554908 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init_layer_cached(layer_idx=1, cache_slot=1) planned desc only [2026-04-08 08:46:03.641515 INFO buffer_manager] Allocated weights buffer at (132816896, 0) [2026-04-08 08:46:03.641537 INFO buffer_manager] Allocated weights buffer at (132816896, 4128768) [2026-04-08 08:46:03.641540 INFO buffer_manager] Allocated weights buffer at (136945664, 516096) [2026-04-08 08:46:03.641541 INFO buffer_manager] Allocated weights buffer at (137461760, 2016) [2026-04-08 08:46:03.641553 INFO buffer_manager] Allocated weights buffer at (137465856, 4128768) [2026-04-08 08:46:03.641554 INFO buffer_manager] Allocated weights buffer at (141594624, 516096) [2026-04-08 08:46:03.641556 INFO buffer_manager] Allocated weights buffer at (142110720, 2016) [2026-04-08 08:46:03.641557 INFO buffer_manager] Allocated weights buffer at (142114816, 4128768) [2026-04-08 08:46:03.641559 INFO buffer_manager] Allocated weights buffer at (146243584, 516096) [2026-04-08 08:46:03.641561 INFO buffer_manager] Allocated weights buffer at (146759680, 2016) [2026-04-08 08:46:03.641562 
INFO buffer_manager] Allocated weights buffer at (146763776, 0) [2026-04-08 08:46:03.641564 INFO fp8_mlp_dpdk] fp8_mlp_dpdk: init_layer_cached(layer_idx=2, cache_slot=2) planned desc only [2026-04-08 08:46:03.670117 INFO buffer_manager] Allocated weights buffer at (146763776, 0) [2026-04-08 08:46:03.670132 INFO buffer_manager] Allocated weights buffer at (146763776, 132120576) [2026-04-08 08:46:03.670134 INFO buffer_manager] Allocated weights buffer at (278884352, 57344) [2026-04-08 08:46:03.670136 INFO buffer_manager] Allocated weights buffer at (278941696, 132120576) [2026-04-08 08:46:03.670137 INFO buffer_manager] Allocated weights buffer at (411062272, 57344) [2026-04-08 08:46:03.670139 INFO buffer_manager] Allocated weights buffer at (411119616, 132120576) [2026-04-08 08:46:03.670140 INFO buffer_manager] Allocated weights buffer at (543240192, 57344) [2026-04-08 08:46:03.670141 INFO buffer_manager] Allocated weights buffer at (543297536, 0) [2026-04-08 08:46:03.670144 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=3, cache_slot=3) planned desc only [2026-04-08 08:46:03.706604 INFO buffer_manager] Allocated weights buffer at (543297536, 0) [2026-04-08 08:46:03.706618 INFO buffer_manager] Allocated weights buffer at (543297536, 132120576) [2026-04-08 08:46:03.706621 INFO buffer_manager] Allocated weights buffer at (675418112, 57344) [2026-04-08 08:46:03.706622 INFO buffer_manager] Allocated weights buffer at (675475456, 132120576) [2026-04-08 08:46:03.706624 INFO buffer_manager] Allocated weights buffer at (807596032, 57344) [2026-04-08 08:46:03.706625 INFO buffer_manager] Allocated weights buffer at (807653376, 132120576) [2026-04-08 08:46:03.706627 INFO buffer_manager] Allocated weights buffer at (939773952, 57344) [2026-04-08 08:46:03.706628 INFO buffer_manager] Allocated weights buffer at (939831296, 0) [2026-04-08 08:46:03.706630 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=4, cache_slot=4) planned desc only [2026-04-08 
08:46:03.743106 INFO buffer_manager] Allocated weights buffer at (939831296, 0) [2026-04-08 08:46:03.743120 INFO buffer_manager] Allocated weights buffer at (939831296, 132120576) [2026-04-08 08:46:03.743122 INFO buffer_manager] Allocated weights buffer at (1071951872, 57344) [2026-04-08 08:46:03.743124 INFO buffer_manager] Allocated weights buffer at (1072009216, 132120576) [2026-04-08 08:46:03.743125 INFO buffer_manager] Allocated weights buffer at (1204129792, 57344) [2026-04-08 08:46:03.743127 INFO buffer_manager] Allocated weights buffer at (1204187136, 132120576) [2026-04-08 08:46:03.743128 INFO buffer_manager] Allocated weights buffer at (1336307712, 57344) [2026-04-08 08:46:03.743130 INFO buffer_manager] Allocated weights buffer at (1336365056, 0) [2026-04-08 08:46:03.743131 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=5, cache_slot=5) planned desc only [2026-04-08 08:46:03.779438 INFO buffer_manager] Allocated weights buffer at (1336365056, 0) [2026-04-08 08:46:03.779451 INFO buffer_manager] Allocated weights buffer at (1336365056, 132120576) [2026-04-08 08:46:03.779454 INFO buffer_manager] Allocated weights buffer at (1468485632, 57344) [2026-04-08 08:46:03.779455 INFO buffer_manager] Allocated weights buffer at (1468542976, 132120576) [2026-04-08 08:46:03.779457 INFO buffer_manager] Allocated weights buffer at (1600663552, 57344) [2026-04-08 08:46:03.779458 INFO buffer_manager] Allocated weights buffer at (1600720896, 132120576) [2026-04-08 08:46:03.779464 INFO buffer_manager] Allocated weights buffer at (1732841472, 57344) [2026-04-08 08:46:03.779465 INFO buffer_manager] Allocated weights buffer at (1732898816, 0) [2026-04-08 08:46:03.779467 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=6, cache_slot=6) planned desc only [2026-04-08 08:46:03.815774 INFO buffer_manager] Allocated weights buffer at (1732898816, 0) [2026-04-08 08:46:03.815791 INFO buffer_manager] Allocated weights buffer at (1732898816, 132120576) [2026-04-08 
08:46:03.815793 INFO buffer_manager] Allocated weights buffer at (1865019392, 57344) [2026-04-08 08:46:03.815795 INFO buffer_manager] Allocated weights buffer at (1865076736, 132120576) [2026-04-08 08:46:03.815796 INFO buffer_manager] Allocated weights buffer at (1997197312, 57344) [2026-04-08 08:46:03.815798 INFO buffer_manager] Allocated weights buffer at (1997254656, 132120576) [2026-04-08 08:46:03.815799 INFO buffer_manager] Allocated weights buffer at (2129375232, 57344) [2026-04-08 08:46:03.815800 INFO buffer_manager] Allocated weights buffer at (2129432576, 0) [2026-04-08 08:46:03.815802 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=7, cache_slot=7) planned desc only [2026-04-08 08:46:03.852275 INFO buffer_manager] Allocated weights buffer at (2129432576, 0) [2026-04-08 08:46:03.852290 INFO buffer_manager] Allocated weights buffer at (2129432576, 132120576) [2026-04-08 08:46:03.852292 INFO buffer_manager] Allocated weights buffer at (2261553152, 57344) [2026-04-08 08:46:03.852294 INFO buffer_manager] Allocated weights buffer at (2261610496, 132120576) [2026-04-08 08:46:03.852295 INFO buffer_manager] Allocated weights buffer at (2393731072, 57344) [2026-04-08 08:46:03.852297 INFO buffer_manager] Allocated weights buffer at (2393788416, 132120576) [2026-04-08 08:46:03.852299 INFO buffer_manager] Allocated weights buffer at (2525908992, 57344) [2026-04-08 08:46:03.852301 INFO buffer_manager] Allocated weights buffer at (2525966336, 0) [2026-04-08 08:46:03.852303 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=8, cache_slot=8) planned desc only [2026-04-08 08:46:03.888609 INFO buffer_manager] Allocated weights buffer at (2525966336, 0) [2026-04-08 08:46:03.888623 INFO buffer_manager] Allocated weights buffer at (2525966336, 132120576) [2026-04-08 08:46:03.888625 INFO buffer_manager] Allocated weights buffer at (2658086912, 57344) [2026-04-08 08:46:03.888626 INFO buffer_manager] Allocated weights buffer at (2658144256, 132120576) 
[2026-04-08 08:46:03.888628 INFO buffer_manager] Allocated weights buffer at (2790264832, 57344) [2026-04-08 08:46:03.888629 INFO buffer_manager] Allocated weights buffer at (2790322176, 132120576) [2026-04-08 08:46:03.888631 INFO buffer_manager] Allocated weights buffer at (2922442752, 57344) [2026-04-08 08:46:03.888632 INFO buffer_manager] Allocated weights buffer at (2922500096, 0) [2026-04-08 08:46:03.888634 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=9, cache_slot=9) planned desc only [2026-04-08 08:46:03.924980 INFO buffer_manager] Allocated weights buffer at (2922500096, 0) [2026-04-08 08:46:03.924995 INFO buffer_manager] Allocated weights buffer at (2922500096, 132120576) [2026-04-08 08:46:03.924998 INFO buffer_manager] Allocated weights buffer at (3054620672, 57344) [2026-04-08 08:46:03.924999 INFO buffer_manager] Allocated weights buffer at (3054678016, 132120576) [2026-04-08 08:46:03.925001 INFO buffer_manager] Allocated weights buffer at (3186798592, 57344) [2026-04-08 08:46:03.925002 INFO buffer_manager] Allocated weights buffer at (3186855936, 132120576) [2026-04-08 08:46:03.925004 INFO buffer_manager] Allocated weights buffer at (3318976512, 57344) [2026-04-08 08:46:03.925005 INFO buffer_manager] Allocated weights buffer at (3319033856, 0) [2026-04-08 08:46:03.925007 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=10, cache_slot=10) planned desc only [2026-04-08 08:46:03.961345 INFO buffer_manager] Allocated weights buffer at (3319033856, 0) [2026-04-08 08:46:03.961358 INFO buffer_manager] Allocated weights buffer at (3319033856, 132120576) [2026-04-08 08:46:03.961364 INFO buffer_manager] Allocated weights buffer at (3451154432, 57344) [2026-04-08 08:46:03.961365 INFO buffer_manager] Allocated weights buffer at (3451211776, 132120576) [2026-04-08 08:46:03.961367 INFO buffer_manager] Allocated weights buffer at (3583332352, 57344) [2026-04-08 08:46:03.961368 INFO buffer_manager] Allocated weights buffer at (3583389696, 
132120576) [2026-04-08 08:46:03.961370 INFO buffer_manager] Allocated weights buffer at (3715510272, 57344) [2026-04-08 08:46:03.961371 INFO buffer_manager] Allocated weights buffer at (3715567616, 0) [2026-04-08 08:46:03.961373 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=11, cache_slot=11) planned desc only [2026-04-08 08:46:03.997610 INFO buffer_manager] Allocated weights buffer at (3715567616, 0) [2026-04-08 08:46:03.997623 INFO buffer_manager] Allocated weights buffer at (3715567616, 132120576) [2026-04-08 08:46:03.997625 INFO buffer_manager] Allocated weights buffer at (3847688192, 57344) [2026-04-08 08:46:03.997627 INFO buffer_manager] Allocated weights buffer at (3847745536, 132120576) [2026-04-08 08:46:03.997629 INFO buffer_manager] Allocated weights buffer at (3979866112, 57344) [2026-04-08 08:46:03.997630 INFO buffer_manager] Allocated weights buffer at (3979923456, 132120576) [2026-04-08 08:46:03.997631 INFO buffer_manager] Allocated weights buffer at (4112044032, 57344) [2026-04-08 08:46:03.997633 INFO buffer_manager] Allocated weights buffer at (4112101376, 0) [2026-04-08 08:46:03.997634 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=12, cache_slot=12) planned desc only [2026-04-08 08:46:04.033800 INFO buffer_manager] Allocated weights buffer at (4112101376, 0) [2026-04-08 08:46:04.033814 INFO buffer_manager] Allocated weights buffer at (4112101376, 132120576) [2026-04-08 08:46:04.033817 INFO buffer_manager] Allocated weights buffer at (4244221952, 57344) [2026-04-08 08:46:04.033818 INFO buffer_manager] Allocated weights buffer at (4244279296, 132120576) [2026-04-08 08:46:04.033820 INFO buffer_manager] Allocated weights buffer at (4376399872, 57344) [2026-04-08 08:46:04.033821 INFO buffer_manager] Allocated weights buffer at (4376457216, 132120576) [2026-04-08 08:46:04.033823 INFO buffer_manager] Allocated weights buffer at (4508577792, 57344) [2026-04-08 08:46:04.033824 INFO buffer_manager] Allocated weights buffer at 
(4508635136, 0) [2026-04-08 08:46:04.033825 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=13, cache_slot=13) planned desc only [2026-04-08 08:46:04.069969 INFO buffer_manager] Allocated weights buffer at (4508635136, 0) [2026-04-08 08:46:04.069982 INFO buffer_manager] Allocated weights buffer at (4508635136, 132120576) [2026-04-08 08:46:04.069984 INFO buffer_manager] Allocated weights buffer at (4640755712, 57344) [2026-04-08 08:46:04.069986 INFO buffer_manager] Allocated weights buffer at (4640813056, 132120576) [2026-04-08 08:46:04.069987 INFO buffer_manager] Allocated weights buffer at (4772933632, 57344) [2026-04-08 08:46:04.069988 INFO buffer_manager] Allocated weights buffer at (4772990976, 132120576) [2026-04-08 08:46:04.069990 INFO buffer_manager] Allocated weights buffer at (4905111552, 57344) [2026-04-08 08:46:04.069991 INFO buffer_manager] Allocated weights buffer at (4905168896, 0) [2026-04-08 08:46:04.069993 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=14, cache_slot=14) planned desc only [2026-04-08 08:46:04.106134 INFO buffer_manager] Allocated weights buffer at (4905168896, 0) [2026-04-08 08:46:04.106147 INFO buffer_manager] Allocated weights buffer at (4905168896, 132120576) [2026-04-08 08:46:04.106149 INFO buffer_manager] Allocated weights buffer at (5037289472, 57344) [2026-04-08 08:46:04.106151 INFO buffer_manager] Allocated weights buffer at (5037346816, 132120576) [2026-04-08 08:46:04.106152 INFO buffer_manager] Allocated weights buffer at (5169467392, 57344) [2026-04-08 08:46:04.106154 INFO buffer_manager] Allocated weights buffer at (5169524736, 132120576) [2026-04-08 08:46:04.106155 INFO buffer_manager] Allocated weights buffer at (5301645312, 57344) [2026-04-08 08:46:04.106161 INFO buffer_manager] Allocated weights buffer at (5301702656, 0) [2026-04-08 08:46:04.106163 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=15, cache_slot=15) planned desc only [2026-04-08 08:46:04.142393 INFO 
buffer_manager] Allocated weights buffer at (5301702656, 0) [2026-04-08 08:46:04.142408 INFO buffer_manager] Allocated weights buffer at (5301702656, 132120576) [2026-04-08 08:46:04.142410 INFO buffer_manager] Allocated weights buffer at (5433823232, 57344) [2026-04-08 08:46:04.142412 INFO buffer_manager] Allocated weights buffer at (5433880576, 132120576) [2026-04-08 08:46:04.142413 INFO buffer_manager] Allocated weights buffer at (5566001152, 57344) [2026-04-08 08:46:04.142415 INFO buffer_manager] Allocated weights buffer at (5566058496, 132120576) [2026-04-08 08:46:04.142416 INFO buffer_manager] Allocated weights buffer at (5698179072, 57344) [2026-04-08 08:46:04.142418 INFO buffer_manager] Allocated weights buffer at (5698236416, 0) [2026-04-08 08:46:04.142423 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=16, cache_slot=16) planned desc only [2026-04-08 08:46:04.178567 INFO buffer_manager] Allocated weights buffer at (5698236416, 0) [2026-04-08 08:46:04.178580 INFO buffer_manager] Allocated weights buffer at (5698236416, 132120576) [2026-04-08 08:46:04.178583 INFO buffer_manager] Allocated weights buffer at (5830356992, 57344) [2026-04-08 08:46:04.178584 INFO buffer_manager] Allocated weights buffer at (5830414336, 132120576) [2026-04-08 08:46:04.178586 INFO buffer_manager] Allocated weights buffer at (5962534912, 57344) [2026-04-08 08:46:04.178587 INFO buffer_manager] Allocated weights buffer at (5962592256, 132120576) [2026-04-08 08:46:04.178593 INFO buffer_manager] Allocated weights buffer at (6094712832, 57344) [2026-04-08 08:46:04.178595 INFO buffer_manager] Allocated weights buffer at (6094770176, 0) [2026-04-08 08:46:04.178596 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=17, cache_slot=17) planned desc only [2026-04-08 08:46:04.214767 INFO buffer_manager] Allocated weights buffer at (6094770176, 0) [2026-04-08 08:46:04.214783 INFO buffer_manager] Allocated weights buffer at (6094770176, 132120576) [2026-04-08 
08:46:04.214786 INFO buffer_manager] Allocated weights buffer at (6226890752, 57344) [2026-04-08 08:46:04.214787 INFO buffer_manager] Allocated weights buffer at (6226948096, 132120576) [2026-04-08 08:46:04.214789 INFO buffer_manager] Allocated weights buffer at (6359068672, 57344) [2026-04-08 08:46:04.214790 INFO buffer_manager] Allocated weights buffer at (6359126016, 132120576) [2026-04-08 08:46:04.214792 INFO buffer_manager] Allocated weights buffer at (6491246592, 57344) [2026-04-08 08:46:04.214793 INFO buffer_manager] Allocated weights buffer at (6491303936, 0) [2026-04-08 08:46:04.214795 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=18, cache_slot=18) planned desc only [2026-04-08 08:46:04.250913 INFO buffer_manager] Allocated weights buffer at (6491303936, 0) [2026-04-08 08:46:04.250927 INFO buffer_manager] Allocated weights buffer at (6491303936, 132120576) [2026-04-08 08:46:04.250929 INFO buffer_manager] Allocated weights buffer at (6623424512, 57344) [2026-04-08 08:46:04.250930 INFO buffer_manager] Allocated weights buffer at (6623481856, 132120576) [2026-04-08 08:46:04.250932 INFO buffer_manager] Allocated weights buffer at (6755602432, 57344) [2026-04-08 08:46:04.250933 INFO buffer_manager] Allocated weights buffer at (6755659776, 132120576) [2026-04-08 08:46:04.250935 INFO buffer_manager] Allocated weights buffer at (6887780352, 57344) [2026-04-08 08:46:04.250936 INFO buffer_manager] Allocated weights buffer at (6887837696, 0) [2026-04-08 08:46:04.250938 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=19, cache_slot=19) planned desc only [2026-04-08 08:46:04.287087 INFO buffer_manager] Allocated weights buffer at (6887837696, 0) [2026-04-08 08:46:04.287100 INFO buffer_manager] Allocated weights buffer at (6887837696, 132120576) [2026-04-08 08:46:04.287106 INFO buffer_manager] Allocated weights buffer at (7019958272, 57344) [2026-04-08 08:46:04.287107 INFO buffer_manager] Allocated weights buffer at (7020015616, 132120576) 
[2026-04-08 08:46:04.287109 INFO buffer_manager] Allocated weights buffer at (7152136192, 57344) [2026-04-08 08:46:04.287110 INFO buffer_manager] Allocated weights buffer at (7152193536, 132120576) [2026-04-08 08:46:04.287112 INFO buffer_manager] Allocated weights buffer at (7284314112, 57344) [2026-04-08 08:46:04.287113 INFO buffer_manager] Allocated weights buffer at (7284371456, 0) [2026-04-08 08:46:04.287115 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=20, cache_slot=20) planned desc only [2026-04-08 08:46:04.323269 INFO buffer_manager] Allocated weights buffer at (7284371456, 0) [2026-04-08 08:46:04.323282 INFO buffer_manager] Allocated weights buffer at (7284371456, 132120576) [2026-04-08 08:46:04.323284 INFO buffer_manager] Allocated weights buffer at (7416492032, 57344) [2026-04-08 08:46:04.323285 INFO buffer_manager] Allocated weights buffer at (7416549376, 132120576) [2026-04-08 08:46:04.323287 INFO buffer_manager] Allocated weights buffer at (7548669952, 57344) [2026-04-08 08:46:04.323288 INFO buffer_manager] Allocated weights buffer at (7548727296, 132120576) [2026-04-08 08:46:04.323289 INFO buffer_manager] Allocated weights buffer at (7680847872, 57344) [2026-04-08 08:46:04.323291 INFO buffer_manager] Allocated weights buffer at (7680905216, 0) [2026-04-08 08:46:04.323292 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=21, cache_slot=21) planned desc only [2026-04-08 08:46:04.359450 INFO buffer_manager] Allocated weights buffer at (7680905216, 0) [2026-04-08 08:46:04.359463 INFO buffer_manager] Allocated weights buffer at (7680905216, 132120576) [2026-04-08 08:46:04.359466 INFO buffer_manager] Allocated weights buffer at (7813025792, 57344) [2026-04-08 08:46:04.359467 INFO buffer_manager] Allocated weights buffer at (7813083136, 132120576) [2026-04-08 08:46:04.359469 INFO buffer_manager] Allocated weights buffer at (7945203712, 57344) [2026-04-08 08:46:04.359470 INFO buffer_manager] Allocated weights buffer at (7945261056, 
132120576) [2026-04-08 08:46:04.359472 INFO buffer_manager] Allocated weights buffer at (8077381632, 57344) [2026-04-08 08:46:04.359473 INFO buffer_manager] Allocated weights buffer at (8077438976, 0) [2026-04-08 08:46:04.359475 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=22, cache_slot=22) planned desc only [2026-04-08 08:46:04.395866 INFO buffer_manager] Allocated weights buffer at (8077438976, 0) [2026-04-08 08:46:04.395882 INFO buffer_manager] Allocated weights buffer at (8077438976, 132120576) [2026-04-08 08:46:04.395884 INFO buffer_manager] Allocated weights buffer at (8209559552, 57344) [2026-04-08 08:46:04.395886 INFO buffer_manager] Allocated weights buffer at (8209616896, 132120576) [2026-04-08 08:46:04.395887 INFO buffer_manager] Allocated weights buffer at (8341737472, 57344) [2026-04-08 08:46:04.395889 INFO buffer_manager] Allocated weights buffer at (8341794816, 132120576) [2026-04-08 08:46:04.395890 INFO buffer_manager] Allocated weights buffer at (8473915392, 57344) [2026-04-08 08:46:04.395892 INFO buffer_manager] Allocated weights buffer at (8473972736, 0) [2026-04-08 08:46:04.395893 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=23, cache_slot=23) planned desc only [2026-04-08 08:46:04.432112 INFO buffer_manager] Allocated weights buffer at (8473972736, 0) [2026-04-08 08:46:04.432126 INFO buffer_manager] Allocated weights buffer at (8473972736, 132120576) [2026-04-08 08:46:04.432128 INFO buffer_manager] Allocated weights buffer at (8606093312, 57344) [2026-04-08 08:46:04.432130 INFO buffer_manager] Allocated weights buffer at (8606150656, 132120576) [2026-04-08 08:46:04.432132 INFO buffer_manager] Allocated weights buffer at (8738271232, 57344) [2026-04-08 08:46:04.432133 INFO buffer_manager] Allocated weights buffer at (8738328576, 132120576) [2026-04-08 08:46:04.432135 INFO buffer_manager] Allocated weights buffer at (8870449152, 57344) [2026-04-08 08:46:04.432140 INFO buffer_manager] Allocated weights buffer at 
(8870506496, 0) [2026-04-08 08:46:04.432141 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=24, cache_slot=24) planned desc only [2026-04-08 08:46:04.468329 INFO buffer_manager] Allocated weights buffer at (8870506496, 0) [2026-04-08 08:46:04.468349 INFO buffer_manager] Allocated weights buffer at (8870506496, 132120576) [2026-04-08 08:46:04.468352 INFO buffer_manager] Allocated weights buffer at (9002627072, 57344) [2026-04-08 08:46:04.468353 INFO buffer_manager] Allocated weights buffer at (9002684416, 132120576) [2026-04-08 08:46:04.468355 INFO buffer_manager] Allocated weights buffer at (9134804992, 57344) [2026-04-08 08:46:04.468356 INFO buffer_manager] Allocated weights buffer at (9134862336, 132120576) [2026-04-08 08:46:04.468358 INFO buffer_manager] Allocated weights buffer at (9266982912, 57344) [2026-04-08 08:46:04.468359 INFO buffer_manager] Allocated weights buffer at (9267040256, 0) [2026-04-08 08:46:04.468360 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=25, cache_slot=25) planned desc only [2026-04-08 08:46:04.504537 INFO buffer_manager] Allocated weights buffer at (9267040256, 0) [2026-04-08 08:46:04.504551 INFO buffer_manager] Allocated weights buffer at (9267040256, 132120576) [2026-04-08 08:46:04.504553 INFO buffer_manager] Allocated weights buffer at (9399160832, 57344) [2026-04-08 08:46:04.504554 INFO buffer_manager] Allocated weights buffer at (9399218176, 132120576) [2026-04-08 08:46:04.504556 INFO buffer_manager] Allocated weights buffer at (9531338752, 57344) [2026-04-08 08:46:04.504557 INFO buffer_manager] Allocated weights buffer at (9531396096, 132120576) [2026-04-08 08:46:04.504559 INFO buffer_manager] Allocated weights buffer at (9663516672, 57344) [2026-04-08 08:46:04.504560 INFO buffer_manager] Allocated weights buffer at (9663574016, 0) [2026-04-08 08:46:04.504562 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=26, cache_slot=26) planned desc only [2026-04-08 08:46:04.540757 INFO 
buffer_manager] Allocated weights buffer at (9663574016, 0) [2026-04-08 08:46:04.540771 INFO buffer_manager] Allocated weights buffer at (9663574016, 132120576) [2026-04-08 08:46:04.540773 INFO buffer_manager] Allocated weights buffer at (9795694592, 57344) [2026-04-08 08:46:04.540776 INFO buffer_manager] Allocated weights buffer at (9795751936, 132120576) [2026-04-08 08:46:04.540778 INFO buffer_manager] Allocated weights buffer at (9927872512, 57344) [2026-04-08 08:46:04.540780 INFO buffer_manager] Allocated weights buffer at (9927929856, 132120576) [2026-04-08 08:46:04.540781 INFO buffer_manager] Allocated weights buffer at (10060050432, 57344) [2026-04-08 08:46:04.540783 INFO buffer_manager] Allocated weights buffer at (10060107776, 0) [2026-04-08 08:46:04.540784 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=27, cache_slot=27) planned desc only [2026-04-08 08:46:04.576966 INFO buffer_manager] Allocated weights buffer at (10060107776, 0) [2026-04-08 08:46:04.576979 INFO buffer_manager] Allocated weights buffer at (10060107776, 132120576) [2026-04-08 08:46:04.576981 INFO buffer_manager] Allocated weights buffer at (10192228352, 57344) [2026-04-08 08:46:04.576983 INFO buffer_manager] Allocated weights buffer at (10192285696, 132120576) [2026-04-08 08:46:04.576984 INFO buffer_manager] Allocated weights buffer at (10324406272, 57344) [2026-04-08 08:46:04.576986 INFO buffer_manager] Allocated weights buffer at (10324463616, 132120576) [2026-04-08 08:46:04.576987 INFO buffer_manager] Allocated weights buffer at (10456584192, 57344) [2026-04-08 08:46:04.576989 INFO buffer_manager] Allocated weights buffer at (10456641536, 0) [2026-04-08 08:46:04.576990 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=28, cache_slot=28) planned desc only [2026-04-08 08:46:04.613124 INFO buffer_manager] Allocated weights buffer at (10456641536, 0) [2026-04-08 08:46:04.613137 INFO buffer_manager] Allocated weights buffer at (10456641536, 132120576) [2026-04-08 
08:46:04.613143 INFO buffer_manager] Allocated weights buffer at (10588762112, 57344) [2026-04-08 08:46:04.613145 INFO buffer_manager] Allocated weights buffer at (10588819456, 132120576) [2026-04-08 08:46:04.613146 INFO buffer_manager] Allocated weights buffer at (10720940032, 57344) [2026-04-08 08:46:04.613148 INFO buffer_manager] Allocated weights buffer at (10720997376, 132120576) [2026-04-08 08:46:04.613149 INFO buffer_manager] Allocated weights buffer at (10853117952, 57344) [2026-04-08 08:46:04.613151 INFO buffer_manager] Allocated weights buffer at (10853175296, 0) [2026-04-08 08:46:04.613152 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=29, cache_slot=29) planned desc only [2026-04-08 08:46:04.649246 INFO buffer_manager] Allocated weights buffer at (10853175296, 0) [2026-04-08 08:46:04.649259 INFO buffer_manager] Allocated weights buffer at (10853175296, 132120576) [2026-04-08 08:46:04.649261 INFO buffer_manager] Allocated weights buffer at (10985295872, 57344) [2026-04-08 08:46:04.649263 INFO buffer_manager] Allocated weights buffer at (10985353216, 132120576) [2026-04-08 08:46:04.649264 INFO buffer_manager] Allocated weights buffer at (11117473792, 57344) [2026-04-08 08:46:04.649266 INFO buffer_manager] Allocated weights buffer at (11117531136, 132120576) [2026-04-08 08:46:04.649267 INFO buffer_manager] Allocated weights buffer at (11249651712, 57344) [2026-04-08 08:46:04.649269 INFO buffer_manager] Allocated weights buffer at (11249709056, 0) [2026-04-08 08:46:04.649270 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=30, cache_slot=30) planned desc only [2026-04-08 08:46:04.685558 INFO buffer_manager] Allocated weights buffer at (11249709056, 0) [2026-04-08 08:46:04.685571 INFO buffer_manager] Allocated weights buffer at (11249709056, 132120576) [2026-04-08 08:46:04.685573 INFO buffer_manager] Allocated weights buffer at (11381829632, 57344) [2026-04-08 08:46:04.685575 INFO buffer_manager] Allocated weights buffer at 
(11381886976, 132120576) [2026-04-08 08:46:04.685576 INFO buffer_manager] Allocated weights buffer at (11514007552, 57344) [2026-04-08 08:46:04.685578 INFO buffer_manager] Allocated weights buffer at (11514064896, 132120576) [2026-04-08 08:46:04.685579 INFO buffer_manager] Allocated weights buffer at (11646185472, 57344) [2026-04-08 08:46:04.685581 INFO buffer_manager] Allocated weights buffer at (11646242816, 0) [2026-04-08 08:46:04.685582 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=31, cache_slot=31) planned desc only [2026-04-08 08:46:04.721793 INFO buffer_manager] Allocated weights buffer at (11646242816, 0) [2026-04-08 08:46:04.721807 INFO buffer_manager] Allocated weights buffer at (11646242816, 132120576) [2026-04-08 08:46:04.721809 INFO buffer_manager] Allocated weights buffer at (11778363392, 57344) [2026-04-08 08:46:04.721810 INFO buffer_manager] Allocated weights buffer at (11778420736, 132120576) [2026-04-08 08:46:04.721812 INFO buffer_manager] Allocated weights buffer at (11910541312, 57344) [2026-04-08 08:46:04.721813 INFO buffer_manager] Allocated weights buffer at (11910598656, 132120576) [2026-04-08 08:46:04.721815 INFO buffer_manager] Allocated weights buffer at (12042719232, 57344) [2026-04-08 08:46:04.721816 INFO buffer_manager] Allocated weights buffer at (12042776576, 0) [2026-04-08 08:46:04.721818 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=32, cache_slot=32) planned desc only [2026-04-08 08:46:04.757964 INFO buffer_manager] Allocated weights buffer at (12042776576, 0) [2026-04-08 08:46:04.757978 INFO buffer_manager] Allocated weights buffer at (12042776576, 132120576) [2026-04-08 08:46:04.757980 INFO buffer_manager] Allocated weights buffer at (12174897152, 57344) [2026-04-08 08:46:04.757982 INFO buffer_manager] Allocated weights buffer at (12174954496, 132120576) [2026-04-08 08:46:04.757984 INFO buffer_manager] Allocated weights buffer at (12307075072, 57344) [2026-04-08 08:46:04.757985 INFO buffer_manager] 
Allocated weights buffer at (12307132416, 132120576) [2026-04-08 08:46:04.757986 INFO buffer_manager] Allocated weights buffer at (12439252992, 57344) [2026-04-08 08:46:04.757991 INFO buffer_manager] Allocated weights buffer at (12439310336, 0) [2026-04-08 08:46:04.757993 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=33, cache_slot=33) planned desc only [2026-04-08 08:46:04.794078 INFO buffer_manager] Allocated weights buffer at (12439310336, 0) [2026-04-08 08:46:04.794092 INFO buffer_manager] Allocated weights buffer at (12439310336, 132120576) [2026-04-08 08:46:04.794094 INFO buffer_manager] Allocated weights buffer at (12571430912, 57344) [2026-04-08 08:46:04.794096 INFO buffer_manager] Allocated weights buffer at (12571488256, 132120576) [2026-04-08 08:46:04.794097 INFO buffer_manager] Allocated weights buffer at (12703608832, 57344) [2026-04-08 08:46:04.794099 INFO buffer_manager] Allocated weights buffer at (12703666176, 132120576) [2026-04-08 08:46:04.794100 INFO buffer_manager] Allocated weights buffer at (12835786752, 57344) [2026-04-08 08:46:04.794102 INFO buffer_manager] Allocated weights buffer at (12835844096, 0) [2026-04-08 08:46:04.794103 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=34, cache_slot=34) planned desc only [2026-04-08 08:46:04.830387 INFO buffer_manager] Allocated weights buffer at (12835844096, 0) [2026-04-08 08:46:04.830401 INFO buffer_manager] Allocated weights buffer at (12835844096, 132120576) [2026-04-08 08:46:04.830403 INFO buffer_manager] Allocated weights buffer at (12967964672, 57344) [2026-04-08 08:46:04.830405 INFO buffer_manager] Allocated weights buffer at (12968022016, 132120576) [2026-04-08 08:46:04.830406 INFO buffer_manager] Allocated weights buffer at (13100142592, 57344) [2026-04-08 08:46:04.830408 INFO buffer_manager] Allocated weights buffer at (13100199936, 132120576) [2026-04-08 08:46:04.830409 INFO buffer_manager] Allocated weights buffer at (13232320512, 57344) [2026-04-08 
08:46:04.830411 INFO buffer_manager] Allocated weights buffer at (13232377856, 0) [2026-04-08 08:46:04.830412 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=35, cache_slot=35) planned desc only [2026-04-08 08:46:04.866653 INFO buffer_manager] Allocated weights buffer at (13232377856, 0) [2026-04-08 08:46:04.866671 INFO buffer_manager] Allocated weights buffer at (13232377856, 132120576) [2026-04-08 08:46:04.866673 INFO buffer_manager] Allocated weights buffer at (13364498432, 57344) [2026-04-08 08:46:04.866675 INFO buffer_manager] Allocated weights buffer at (13364555776, 132120576) [2026-04-08 08:46:04.866677 INFO buffer_manager] Allocated weights buffer at (13496676352, 57344) [2026-04-08 08:46:04.866678 INFO buffer_manager] Allocated weights buffer at (13496733696, 132120576) [2026-04-08 08:46:04.866680 INFO buffer_manager] Allocated weights buffer at (13628854272, 57344) [2026-04-08 08:46:04.866681 INFO buffer_manager] Allocated weights buffer at (13628911616, 0) [2026-04-08 08:46:04.866683 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=36, cache_slot=36) planned desc only [2026-04-08 08:46:04.902961 INFO buffer_manager] Allocated weights buffer at (13628911616, 0) [2026-04-08 08:46:04.902976 INFO buffer_manager] Allocated weights buffer at (13628911616, 132120576) [2026-04-08 08:46:04.902978 INFO buffer_manager] Allocated weights buffer at (13761032192, 57344) [2026-04-08 08:46:04.902980 INFO buffer_manager] Allocated weights buffer at (13761089536, 132120576) [2026-04-08 08:46:04.902981 INFO buffer_manager] Allocated weights buffer at (13893210112, 57344) [2026-04-08 08:46:04.902983 INFO buffer_manager] Allocated weights buffer at (13893267456, 132120576) [2026-04-08 08:46:04.902984 INFO buffer_manager] Allocated weights buffer at (14025388032, 57344) [2026-04-08 08:46:04.902986 INFO buffer_manager] Allocated weights buffer at (14025445376, 0) [2026-04-08 08:46:04.902987 INFO fp8_moe_dpdk] fp8_moe_dpdk: 
init_layer_cached(layer_idx=37, cache_slot=37) planned desc only [2026-04-08 08:46:04.939178 INFO buffer_manager] Allocated weights buffer at (14025445376, 0) [2026-04-08 08:46:04.939192 INFO buffer_manager] Allocated weights buffer at (14025445376, 132120576) [2026-04-08 08:46:04.939197 INFO buffer_manager] Allocated weights buffer at (14157565952, 57344) [2026-04-08 08:46:04.939198 INFO buffer_manager] Allocated weights buffer at (14157623296, 132120576) [2026-04-08 08:46:04.939200 INFO buffer_manager] Allocated weights buffer at (14289743872, 57344) [2026-04-08 08:46:04.939201 INFO buffer_manager] Allocated weights buffer at (14289801216, 132120576) [2026-04-08 08:46:04.939203 INFO buffer_manager] Allocated weights buffer at (14421921792, 57344) [2026-04-08 08:46:04.939204 INFO buffer_manager] Allocated weights buffer at (14421979136, 0) [2026-04-08 08:46:04.939206 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=38, cache_slot=38) planned desc only [2026-04-08 08:46:04.975528 INFO buffer_manager] Allocated weights buffer at (14421979136, 0) [2026-04-08 08:46:04.975542 INFO buffer_manager] Allocated weights buffer at (14421979136, 132120576) [2026-04-08 08:46:04.975544 INFO buffer_manager] Allocated weights buffer at (14554099712, 57344) [2026-04-08 08:46:04.975546 INFO buffer_manager] Allocated weights buffer at (14554157056, 132120576) [2026-04-08 08:46:04.975547 INFO buffer_manager] Allocated weights buffer at (14686277632, 57344) [2026-04-08 08:46:04.975548 INFO buffer_manager] Allocated weights buffer at (14686334976, 132120576) [2026-04-08 08:46:04.975550 INFO buffer_manager] Allocated weights buffer at (14818455552, 57344) [2026-04-08 08:46:04.975552 INFO buffer_manager] Allocated weights buffer at (14818512896, 0) [2026-04-08 08:46:04.975553 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=39, cache_slot=39) planned desc only [2026-04-08 08:46:05.011756 INFO buffer_manager] Allocated weights buffer at (14818512896, 0) [2026-04-08 
08:46:05.011769 INFO buffer_manager] Allocated weights buffer at (14818512896, 132120576) [2026-04-08 08:46:05.011771 INFO buffer_manager] Allocated weights buffer at (14950633472, 57344) [2026-04-08 08:46:05.011773 INFO buffer_manager] Allocated weights buffer at (14950690816, 132120576) [2026-04-08 08:46:05.011774 INFO buffer_manager] Allocated weights buffer at (15082811392, 57344) [2026-04-08 08:46:05.011780 INFO buffer_manager] Allocated weights buffer at (15082868736, 132120576) [2026-04-08 08:46:05.011781 INFO buffer_manager] Allocated weights buffer at (15214989312, 57344) [2026-04-08 08:46:05.011783 INFO buffer_manager] Allocated weights buffer at (15215046656, 0) [2026-04-08 08:46:05.011784 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=40, cache_slot=40) planned desc only [2026-04-08 08:46:05.048030 INFO buffer_manager] Allocated weights buffer at (15215046656, 0) [2026-04-08 08:46:05.048043 INFO buffer_manager] Allocated weights buffer at (15215046656, 132120576) [2026-04-08 08:46:05.048045 INFO buffer_manager] Allocated weights buffer at (15347167232, 57344) [2026-04-08 08:46:05.048047 INFO buffer_manager] Allocated weights buffer at (15347224576, 132120576) [2026-04-08 08:46:05.048049 INFO buffer_manager] Allocated weights buffer at (15479345152, 57344) [2026-04-08 08:46:05.048050 INFO buffer_manager] Allocated weights buffer at (15479402496, 132120576) [2026-04-08 08:46:05.048052 INFO buffer_manager] Allocated weights buffer at (15611523072, 57344) [2026-04-08 08:46:05.048053 INFO buffer_manager] Allocated weights buffer at (15611580416, 0) [2026-04-08 08:46:05.048054 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=41, cache_slot=41) planned desc only [2026-04-08 08:46:05.084188 INFO buffer_manager] Allocated weights buffer at (15611580416, 0) [2026-04-08 08:46:05.084202 INFO buffer_manager] Allocated weights buffer at (15611580416, 132120576) [2026-04-08 08:46:05.084204 INFO buffer_manager] Allocated weights buffer at 
(15743700992, 57344) [2026-04-08 08:46:05.084205 INFO buffer_manager] Allocated weights buffer at (15743758336, 132120576) [2026-04-08 08:46:05.084207 INFO buffer_manager] Allocated weights buffer at (15875878912, 57344) [2026-04-08 08:46:05.084208 INFO buffer_manager] Allocated weights buffer at (15875936256, 132120576) [2026-04-08 08:46:05.084210 INFO buffer_manager] Allocated weights buffer at (16008056832, 57344) [2026-04-08 08:46:05.084214 INFO buffer_manager] Allocated weights buffer at (16008114176, 0) [2026-04-08 08:46:05.084216 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=42, cache_slot=42) planned desc only [2026-04-08 08:46:05.120355 INFO buffer_manager] Allocated weights buffer at (16008114176, 0) [2026-04-08 08:46:05.120368 INFO buffer_manager] Allocated weights buffer at (16008114176, 132120576) [2026-04-08 08:46:05.120370 INFO buffer_manager] Allocated weights buffer at (16140234752, 57344) [2026-04-08 08:46:05.120372 INFO buffer_manager] Allocated weights buffer at (16140292096, 132120576) [2026-04-08 08:46:05.120374 INFO buffer_manager] Allocated weights buffer at (16272412672, 57344) [2026-04-08 08:46:05.120375 INFO buffer_manager] Allocated weights buffer at (16272470016, 132120576) [2026-04-08 08:46:05.120377 INFO buffer_manager] Allocated weights buffer at (16404590592, 57344) [2026-04-08 08:46:05.120378 INFO buffer_manager] Allocated weights buffer at (16404647936, 0) [2026-04-08 08:46:05.120380 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=43, cache_slot=43) planned desc only [2026-04-08 08:46:05.156564 INFO buffer_manager] Allocated weights buffer at (16404647936, 0) [2026-04-08 08:46:05.156578 INFO buffer_manager] Allocated weights buffer at (16404647936, 132120576) [2026-04-08 08:46:05.156580 INFO buffer_manager] Allocated weights buffer at (16536768512, 57344) [2026-04-08 08:46:05.156582 INFO buffer_manager] Allocated weights buffer at (16536825856, 132120576) [2026-04-08 08:46:05.156583 INFO buffer_manager] 
Allocated weights buffer at (16668946432, 57344) [2026-04-08 08:46:05.156584 INFO buffer_manager] Allocated weights buffer at (16669003776, 132120576) [2026-04-08 08:46:05.156586 INFO buffer_manager] Allocated weights buffer at (16801124352, 57344) [2026-04-08 08:46:05.156588 INFO buffer_manager] Allocated weights buffer at (16801181696, 0) [2026-04-08 08:46:05.156589 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=44, cache_slot=44) planned desc only [2026-04-08 08:46:05.192732 INFO buffer_manager] Allocated weights buffer at (16801181696, 0) [2026-04-08 08:46:05.192746 INFO buffer_manager] Allocated weights buffer at (16801181696, 132120576) [2026-04-08 08:46:05.192748 INFO buffer_manager] Allocated weights buffer at (16933302272, 57344) [2026-04-08 08:46:05.192749 INFO buffer_manager] Allocated weights buffer at (16933359616, 132120576) [2026-04-08 08:46:05.192751 INFO buffer_manager] Allocated weights buffer at (17065480192, 57344) [2026-04-08 08:46:05.192753 INFO buffer_manager] Allocated weights buffer at (17065537536, 132120576) [2026-04-08 08:46:05.192754 INFO buffer_manager] Allocated weights buffer at (17197658112, 57344) [2026-04-08 08:46:05.192756 INFO buffer_manager] Allocated weights buffer at (17197715456, 0) [2026-04-08 08:46:05.192757 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=45, cache_slot=45) planned desc only [2026-04-08 08:46:05.228874 INFO buffer_manager] Allocated weights buffer at (17197715456, 0) [2026-04-08 08:46:05.228888 INFO buffer_manager] Allocated weights buffer at (17197715456, 132120576) [2026-04-08 08:46:05.228890 INFO buffer_manager] Allocated weights buffer at (17329836032, 57344) [2026-04-08 08:46:05.228892 INFO buffer_manager] Allocated weights buffer at (17329893376, 132120576) [2026-04-08 08:46:05.228894 INFO buffer_manager] Allocated weights buffer at (17462013952, 57344) [2026-04-08 08:46:05.228895 INFO buffer_manager] Allocated weights buffer at (17462071296, 132120576) [2026-04-08 
08:46:05.228897 INFO buffer_manager] Allocated weights buffer at (17594191872, 57344) [2026-04-08 08:46:05.228898 INFO buffer_manager] Allocated weights buffer at (17594249216, 0) [2026-04-08 08:46:05.228900 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=46, cache_slot=46) planned desc only [2026-04-08 08:46:05.265059 INFO buffer_manager] Allocated weights buffer at (17594249216, 0) [2026-04-08 08:46:05.265072 INFO buffer_manager] Allocated weights buffer at (17594249216, 132120576) [2026-04-08 08:46:05.265077 INFO buffer_manager] Allocated weights buffer at (17726369792, 57344) [2026-04-08 08:46:05.265078 INFO buffer_manager] Allocated weights buffer at (17726427136, 132120576) [2026-04-08 08:46:05.265080 INFO buffer_manager] Allocated weights buffer at (17858547712, 57344) [2026-04-08 08:46:05.265081 INFO buffer_manager] Allocated weights buffer at (17858605056, 132120576) [2026-04-08 08:46:05.265083 INFO buffer_manager] Allocated weights buffer at (17990725632, 57344) [2026-04-08 08:46:05.265084 INFO buffer_manager] Allocated weights buffer at (17990782976, 0) [2026-04-08 08:46:05.265086 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=47, cache_slot=47) planned desc only [2026-04-08 08:46:05.301225 INFO buffer_manager] Allocated weights buffer at (17990782976, 0) [2026-04-08 08:46:05.301240 INFO buffer_manager] Allocated weights buffer at (17990782976, 132120576) [2026-04-08 08:46:05.301242 INFO buffer_manager] Allocated weights buffer at (18122903552, 57344) [2026-04-08 08:46:05.301244 INFO buffer_manager] Allocated weights buffer at (18122960896, 132120576) [2026-04-08 08:46:05.301245 INFO buffer_manager] Allocated weights buffer at (18255081472, 57344) [2026-04-08 08:46:05.301247 INFO buffer_manager] Allocated weights buffer at (18255138816, 132120576) [2026-04-08 08:46:05.301248 INFO buffer_manager] Allocated weights buffer at (18387259392, 57344) [2026-04-08 08:46:05.301250 INFO buffer_manager] Allocated weights buffer at 
(18387316736, 0) [2026-04-08 08:46:05.301251 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=48, cache_slot=48) planned desc only [2026-04-08 08:46:05.337361 INFO buffer_manager] Allocated weights buffer at (18387316736, 0) [2026-04-08 08:46:05.337375 INFO buffer_manager] Allocated weights buffer at (18387316736, 132120576) [2026-04-08 08:46:05.337377 INFO buffer_manager] Allocated weights buffer at (18519437312, 57344) [2026-04-08 08:46:05.337379 INFO buffer_manager] Allocated weights buffer at (18519494656, 132120576) [2026-04-08 08:46:05.337381 INFO buffer_manager] Allocated weights buffer at (18651615232, 57344) [2026-04-08 08:46:05.337382 INFO buffer_manager] Allocated weights buffer at (18651672576, 132120576) [2026-04-08 08:46:05.337384 INFO buffer_manager] Allocated weights buffer at (18783793152, 57344) [2026-04-08 08:46:05.337385 INFO buffer_manager] Allocated weights buffer at (18783850496, 0) [2026-04-08 08:46:05.337387 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=49, cache_slot=49) planned desc only [2026-04-08 08:46:05.373446 INFO buffer_manager] Allocated weights buffer at (18783850496, 0) [2026-04-08 08:46:05.373461 INFO buffer_manager] Allocated weights buffer at (18783850496, 132120576) [2026-04-08 08:46:05.373463 INFO buffer_manager] Allocated weights buffer at (18915971072, 57344) [2026-04-08 08:46:05.373464 INFO buffer_manager] Allocated weights buffer at (18916028416, 132120576) [2026-04-08 08:46:05.373466 INFO buffer_manager] Allocated weights buffer at (19048148992, 57344) [2026-04-08 08:46:05.373467 INFO buffer_manager] Allocated weights buffer at (19048206336, 132120576) [2026-04-08 08:46:05.373469 INFO buffer_manager] Allocated weights buffer at (19180326912, 57344) [2026-04-08 08:46:05.373470 INFO buffer_manager] Allocated weights buffer at (19180384256, 0) [2026-04-08 08:46:05.373472 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=50, cache_slot=50) planned desc only [2026-04-08 08:46:05.409767 
INFO buffer_manager] Allocated weights buffer at (19180384256, 0) [2026-04-08 08:46:05.409783 INFO buffer_manager] Allocated weights buffer at (19180384256, 132120576) [2026-04-08 08:46:05.409785 INFO buffer_manager] Allocated weights buffer at (19312504832, 57344) [2026-04-08 08:46:05.409787 INFO buffer_manager] Allocated weights buffer at (19312562176, 132120576) [2026-04-08 08:46:05.409789 INFO buffer_manager] Allocated weights buffer at (19444682752, 57344) [2026-04-08 08:46:05.409790 INFO buffer_manager] Allocated weights buffer at (19444740096, 132120576) [2026-04-08 08:46:05.409794 INFO buffer_manager] Allocated weights buffer at (19576860672, 57344) [2026-04-08 08:46:05.409796 INFO buffer_manager] Allocated weights buffer at (19576918016, 0) [2026-04-08 08:46:05.409798 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=51, cache_slot=51) planned desc only [2026-04-08 08:46:05.445938 INFO buffer_manager] Allocated weights buffer at (19576918016, 0) [2026-04-08 08:46:05.445951 INFO buffer_manager] Allocated weights buffer at (19576918016, 132120576) [2026-04-08 08:46:05.445953 INFO buffer_manager] Allocated weights buffer at (19709038592, 57344) [2026-04-08 08:46:05.445955 INFO buffer_manager] Allocated weights buffer at (19709095936, 132120576) [2026-04-08 08:46:05.445957 INFO buffer_manager] Allocated weights buffer at (19841216512, 57344) [2026-04-08 08:46:05.445958 INFO buffer_manager] Allocated weights buffer at (19841273856, 132120576) [2026-04-08 08:46:05.445960 INFO buffer_manager] Allocated weights buffer at (19973394432, 57344) [2026-04-08 08:46:05.445961 INFO buffer_manager] Allocated weights buffer at (19973451776, 0) [2026-04-08 08:46:05.445963 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=52, cache_slot=52) planned desc only [2026-04-08 08:46:05.482084 INFO buffer_manager] Allocated weights buffer at (19973451776, 0) [2026-04-08 08:46:05.482102 INFO buffer_manager] Allocated weights buffer at (19973451776, 132120576) 
[2026-04-08 08:46:05.482109 INFO buffer_manager] Allocated weights buffer at (20105572352, 57344) [2026-04-08 08:46:05.482110 INFO buffer_manager] Allocated weights buffer at (20105629696, 132120576) [2026-04-08 08:46:05.482112 INFO buffer_manager] Allocated weights buffer at (20237750272, 57344) [2026-04-08 08:46:05.482114 INFO buffer_manager] Allocated weights buffer at (20237807616, 132120576) [2026-04-08 08:46:05.482115 INFO buffer_manager] Allocated weights buffer at (20369928192, 57344) [2026-04-08 08:46:05.482116 INFO buffer_manager] Allocated weights buffer at (20369985536, 0) [2026-04-08 08:46:05.482118 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=53, cache_slot=53) planned desc only [2026-04-08 08:46:05.518343 INFO buffer_manager] Allocated weights buffer at (20369985536, 0) [2026-04-08 08:46:05.518356 INFO buffer_manager] Allocated weights buffer at (20369985536, 132120576) [2026-04-08 08:46:05.518358 INFO buffer_manager] Allocated weights buffer at (20502106112, 57344) [2026-04-08 08:46:05.518360 INFO buffer_manager] Allocated weights buffer at (20502163456, 132120576) [2026-04-08 08:46:05.518362 INFO buffer_manager] Allocated weights buffer at (20634284032, 57344) [2026-04-08 08:46:05.518363 INFO buffer_manager] Allocated weights buffer at (20634341376, 132120576) [2026-04-08 08:46:05.518365 INFO buffer_manager] Allocated weights buffer at (20766461952, 57344) [2026-04-08 08:46:05.518366 INFO buffer_manager] Allocated weights buffer at (20766519296, 0) [2026-04-08 08:46:05.518368 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=54, cache_slot=54) planned desc only [2026-04-08 08:46:05.554934 INFO buffer_manager] Allocated weights buffer at (20766519296, 0) [2026-04-08 08:46:05.554948 INFO buffer_manager] Allocated weights buffer at (20766519296, 132120576) [2026-04-08 08:46:05.554950 INFO buffer_manager] Allocated weights buffer at (20898639872, 57344) [2026-04-08 08:46:05.554952 INFO buffer_manager] Allocated weights buffer 
at (20898697216, 132120576) [2026-04-08 08:46:05.554953 INFO buffer_manager] Allocated weights buffer at (21030817792, 57344) [2026-04-08 08:46:05.554955 INFO buffer_manager] Allocated weights buffer at (21030875136, 132120576) [2026-04-08 08:46:05.554956 INFO buffer_manager] Allocated weights buffer at (21162995712, 57344) [2026-04-08 08:46:05.554958 INFO buffer_manager] Allocated weights buffer at (21163053056, 0) [2026-04-08 08:46:05.554959 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=55, cache_slot=55) planned desc only [2026-04-08 08:46:05.591308 INFO buffer_manager] Allocated weights buffer at (21163053056, 0) [2026-04-08 08:46:05.591326 INFO buffer_manager] Allocated weights buffer at (21163053056, 132120576) [2026-04-08 08:46:05.591328 INFO buffer_manager] Allocated weights buffer at (21295173632, 57344) [2026-04-08 08:46:05.591329 INFO buffer_manager] Allocated weights buffer at (21295230976, 132120576) [2026-04-08 08:46:05.591331 INFO buffer_manager] Allocated weights buffer at (21427351552, 57344) [2026-04-08 08:46:05.591332 INFO buffer_manager] Allocated weights buffer at (21427408896, 132120576) [2026-04-08 08:46:05.591334 INFO buffer_manager] Allocated weights buffer at (21559529472, 57344) [2026-04-08 08:46:05.591335 INFO buffer_manager] Allocated weights buffer at (21559586816, 0) [2026-04-08 08:46:05.591338 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=56, cache_slot=56) planned desc only [2026-04-08 08:46:05.627609 INFO buffer_manager] Allocated weights buffer at (21559586816, 0) [2026-04-08 08:46:05.627622 INFO buffer_manager] Allocated weights buffer at (21559586816, 132120576) [2026-04-08 08:46:05.627624 INFO buffer_manager] Allocated weights buffer at (21691707392, 57344) [2026-04-08 08:46:05.627626 INFO buffer_manager] Allocated weights buffer at (21691764736, 132120576) [2026-04-08 08:46:05.627627 INFO buffer_manager] Allocated weights buffer at (21823885312, 57344) [2026-04-08 08:46:05.627629 INFO 
buffer_manager] Allocated weights buffer at (21823942656, 132120576) [2026-04-08 08:46:05.627630 INFO buffer_manager] Allocated weights buffer at (21956063232, 57344) [2026-04-08 08:46:05.627632 INFO buffer_manager] Allocated weights buffer at (21956120576, 0) [2026-04-08 08:46:05.627633 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=57, cache_slot=57) planned desc only [2026-04-08 08:46:05.663889 INFO buffer_manager] Allocated weights buffer at (21956120576, 0) [2026-04-08 08:46:05.663906 INFO buffer_manager] Allocated weights buffer at (21956120576, 132120576) [2026-04-08 08:46:05.663908 INFO buffer_manager] Allocated weights buffer at (22088241152, 57344) [2026-04-08 08:46:05.663910 INFO buffer_manager] Allocated weights buffer at (22088298496, 132120576) [2026-04-08 08:46:05.663911 INFO buffer_manager] Allocated weights buffer at (22220419072, 57344) [2026-04-08 08:46:05.663913 INFO buffer_manager] Allocated weights buffer at (22220476416, 132120576) [2026-04-08 08:46:05.663915 INFO buffer_manager] Allocated weights buffer at (22352596992, 57344) [2026-04-08 08:46:05.663917 INFO buffer_manager] Allocated weights buffer at (22352654336, 0) [2026-04-08 08:46:05.663919 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=58, cache_slot=58) planned desc only [2026-04-08 08:46:05.700187 INFO buffer_manager] Allocated weights buffer at (22352654336, 0) [2026-04-08 08:46:05.700200 INFO buffer_manager] Allocated weights buffer at (22352654336, 132120576) [2026-04-08 08:46:05.700202 INFO buffer_manager] Allocated weights buffer at (22484774912, 57344) [2026-04-08 08:46:05.700204 INFO buffer_manager] Allocated weights buffer at (22484832256, 132120576) [2026-04-08 08:46:05.700205 INFO buffer_manager] Allocated weights buffer at (22616952832, 57344) [2026-04-08 08:46:05.700207 INFO buffer_manager] Allocated weights buffer at (22617010176, 132120576) [2026-04-08 08:46:05.700208 INFO buffer_manager] Allocated weights buffer at (22749130752, 57344) 
[2026-04-08 08:46:05.700210 INFO buffer_manager] Allocated weights buffer at (22749188096, 0) [2026-04-08 08:46:05.700211 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=59, cache_slot=59) planned desc only [2026-04-08 08:46:05.736548 INFO buffer_manager] Allocated weights buffer at (22749188096, 0) [2026-04-08 08:46:05.736561 INFO buffer_manager] Allocated weights buffer at (22749188096, 132120576) [2026-04-08 08:46:05.736564 INFO buffer_manager] Allocated weights buffer at (22881308672, 57344) [2026-04-08 08:46:05.736565 INFO buffer_manager] Allocated weights buffer at (22881366016, 132120576) [2026-04-08 08:46:05.736567 INFO buffer_manager] Allocated weights buffer at (23013486592, 57344) [2026-04-08 08:46:05.736568 INFO buffer_manager] Allocated weights buffer at (23013543936, 132120576) [2026-04-08 08:46:05.736572 INFO buffer_manager] Allocated weights buffer at (23145664512, 57344) [2026-04-08 08:46:05.736574 INFO buffer_manager] Allocated weights buffer at (23145721856, 0) [2026-04-08 08:46:05.736575 INFO fp8_moe_dpdk] fp8_moe_dpdk: init_layer_cached(layer_idx=60, cache_slot=60) planned desc only [2026-04-08 08:46:06.099621 INFO buffer_manager] Allocated weights buffer at (23145721856, 0) [2026-04-08 08:46:06.099640 INFO buffer_manager] Allocated weights buffer at (23145721856, 132120576) [2026-04-08 08:46:06.099642 INFO buffer_manager] Allocated weights buffer at (23277842432, 57344) [2026-04-08 08:46:06.099644 INFO buffer_manager] Allocated weights buffer at (23277899776, 132120576) [2026-04-08 08:46:06.099645 INFO buffer_manager] Allocated weights buffer at (23410020352, 57344) [2026-04-08 08:46:06.099646 INFO buffer_manager] Allocated weights buffer at (23410077696, 132120576) [2026-04-08 08:46:06.099648 INFO buffer_manager] Allocated weights buffer at (23542198272, 57344) [2026-04-08 08:46:06.099650 INFO buffer_manager] Allocated weights buffer at (23542255616, 0) [2026-04-08 08:46:06.099651 INFO fp8_moe_dpdk] fp8_moe_dpdk: 
init_layer_cached(layer_idx=61, cache_slot=61) planned desc only [2026-04-08 08:46:21.320228 INFO fp8_dpdk_common] fp9 fast path forced on by default in the current kernel build [2026-04-08 08:46:21.590848 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=256, expert_tiles=756, avg_tile_batch=3.47, prepare=3.517263ms, send=20.265541ms, judge_wait=202.251756ms, fetch=26.534514ms, reduce=139ns; duck time-ns stats: p50=192.97265ms, p90=193.133211ms, max=193.391137ms; kernel_model: matmul=7.222591 GFLOP (37.347 GFLOP/s @ duck_max), param_stream=1.040450G (5.380 Gparam/s @ duck_max), weight_stream=1116.766 MiB (6.055 GB/s @ duck_max) [2026-04-08 08:46:21.863668 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=256, expert_tiles=748, avg_tile_batch=3.51, prepare=2.953825ms, send=17.213491ms, judge_wait=206.240653ms, fetch=26.857689ms, reduce=25ns; duck time-ns stats: p50=190.414788ms, p90=190.906771ms, max=191.156211ms; kernel_model: matmul=7.222591 GFLOP (37.784 GFLOP/s @ duck_max), param_stream=1.029439G (5.385 Gparam/s @ duck_max), weight_stream=1104.948 MiB (6.061 GB/s @ duck_max) [2026-04-08 08:46:22.139163 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=256, expert_tiles=746, avg_tile_batch=3.52, prepare=2.85797ms, send=17.232426ms, judge_wait=208.48949ms, fetch=26.977132ms, reduce=163ns; duck time-ns stats: p50=190.806335ms, p90=191.494448ms, max=192.02918ms; kernel_model: matmul=7.222591 GFLOP (37.612 GFLOP/s @ duck_max), param_stream=1.026687G (5.347 Gparam/s @ duck_max), weight_stream=1101.994 MiB (6.017 GB/s @ duck_max) [2026-04-08 08:46:22.416336 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=255, expert_tiles=752, avg_tile_batch=3.49, prepare=661.67µs, send=18.464896ms, judge_wait=212.518963ms, fetch=26.995607ms, reduce=23ns; duck time-ns stats: 
p50=194.662001ms, p90=194.911743ms, max=195.095826ms; kernel_model: matmul=7.222591 GFLOP (37.021 GFLOP/s @ duck_max), param_stream=1.034945G (5.305 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.970 GB/s @ duck_max) [2026-04-08 08:46:22.559072 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=250, expert_tiles=437, avg_tile_batch=3.09, prepare=375.622µs, send=8.900316ms, judge_wait=111.754115ms, fetch=13.172422ms, reduce=25ns; duck time-ns stats: p50=103.080621ms, p90=103.18837ms, max=103.311682ms; kernel_model: matmul=3.721396 GFLOP (36.021 GFLOP/s @ duck_max), param_stream=0.601424G (5.821 Gparam/s @ duck_max), weight_stream=645.538 MiB (6.552 GB/s @ duck_max) [2026-04-08 08:46:22.858807 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=254, expert_tiles=758, avg_tile_batch=3.46, prepare=730.228µs, send=18.541302ms, judge_wait=214.421601ms, fetch=26.937715ms, reduce=167ns; duck time-ns stats: p50=195.352485ms, p90=195.625464ms, max=195.729103ms; kernel_model: matmul=7.222591 GFLOP (36.901 GFLOP/s @ duck_max), param_stream=1.043202G (5.330 Gparam/s @ duck_max), weight_stream=1119.720 MiB (5.999 GB/s @ duck_max) [2026-04-08 08:46:23.125446 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=255, expert_tiles=734, avg_tile_batch=3.57, prepare=677.156µs, send=18.364102ms, judge_wait=212.342683ms, fetch=20.619862ms, reduce=19ns; duck time-ns stats: p50=192.369374ms, p90=192.686933ms, max=192.941376ms; kernel_model: matmul=7.222591 GFLOP (37.434 GFLOP/s @ duck_max), param_stream=1.010172G (5.236 Gparam/s @ duck_max), weight_stream=1084.267 MiB (5.893 GB/s @ duck_max) [2026-04-08 08:46:23.396413 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=256, expert_tiles=750, avg_tile_batch=3.50, prepare=665.541µs, send=17.226072ms, judge_wait=216.475454ms, fetch=21.658983ms, 
reduce=138ns; duck time-ns stats: p50=196.302874ms, p90=196.590411ms, max=196.674114ms; kernel_model: matmul=7.222591 GFLOP (36.724 GFLOP/s @ duck_max), param_stream=1.032192G (5.248 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.907 GB/s @ duck_max) [2026-04-08 08:46:23.666539 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=254, expert_tiles=757, avg_tile_batch=3.47, prepare=665.782µs, send=17.234581ms, judge_wait=216.875759ms, fetch=20.648695ms, reduce=138ns; duck time-ns stats: p50=195.89196ms, p90=196.104397ms, max=196.188453ms; kernel_model: matmul=7.222591 GFLOP (36.815 GFLOP/s @ duck_max), param_stream=1.041826G (5.310 Gparam/s @ duck_max), weight_stream=1118.243 MiB (5.977 GB/s @ duck_max) [2026-04-08 08:46:23.812110 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=247, expert_tiles=445, avg_tile_batch=3.04, prepare=415.139µs, send=10.126929ms, judge_wait=115.917886ms, fetch=11.582487ms, reduce=25ns; duck time-ns stats: p50=106.347986ms, p90=106.468445ms, max=106.56194ms; kernel_model: matmul=3.721396 GFLOP (34.922 GFLOP/s @ duck_max), param_stream=0.612434G (5.747 Gparam/s @ duck_max), weight_stream=657.355 MiB (6.468 GB/s @ duck_max) [2026-04-08 08:46:24.104419 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=253, expert_tiles=751, avg_tile_batch=3.49, prepare=711.525µs, send=19.002211ms, judge_wait=212.636085ms, fetch=20.640327ms, reduce=20ns; duck time-ns stats: p50=192.255057ms, p90=192.555014ms, max=192.795944ms; kernel_model: matmul=7.222591 GFLOP (37.462 GFLOP/s @ duck_max), param_stream=1.033568G (5.361 Gparam/s @ duck_max), weight_stream=1109.380 MiB (6.034 GB/s @ duck_max) [2026-04-08 08:46:24.378867 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=254, expert_tiles=753, avg_tile_batch=3.48, prepare=662.304µs, send=17.239152ms, 
judge_wait=220.166771ms, fetch=21.592005ms, reduce=20ns; duck time-ns stats: p50=196.695794ms, p90=196.936086ms, max=197.169982ms; kernel_model: matmul=7.222591 GFLOP (36.631 GFLOP/s @ duck_max), param_stream=1.036321G (5.256 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.916 GB/s @ duck_max) [2026-04-08 08:46:24.648922 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=255, expert_tiles=756, avg_tile_batch=3.47, prepare=661.867µs, send=17.236139ms, judge_wait=215.811209ms, fetch=21.635923ms, reduce=136ns; duck time-ns stats: p50=192.786997ms, p90=193.178526ms, max=193.443224ms; kernel_model: matmul=7.222591 GFLOP (37.337 GFLOP/s @ duck_max), param_stream=1.040450G (5.379 Gparam/s @ duck_max), weight_stream=1116.766 MiB (6.054 GB/s @ duck_max) [2026-04-08 08:46:24.919532 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=247, expert_tiles=749, avg_tile_batch=3.50, prepare=666.545µs, send=17.236345ms, judge_wait=216.334167ms, fetch=21.648279ms, reduce=136ns; duck time-ns stats: p50=193.795568ms, p90=194.035002ms, max=194.207447ms; kernel_model: matmul=7.222591 GFLOP (37.190 GFLOP/s @ duck_max), param_stream=1.030816G (5.308 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.974 GB/s @ duck_max) [2026-04-08 08:46:25.062905 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=236, expert_tiles=429, avg_tile_batch=3.15, prepare=417.413µs, send=10.097409ms, judge_wait=114.513865ms, fetch=10.885071ms, reduce=20ns; duck time-ns stats: p50=103.377281ms, p90=103.553675ms, max=103.681902ms; kernel_model: matmul=3.721396 GFLOP (35.892 GFLOP/s @ duck_max), param_stream=0.590414G (5.694 Gparam/s @ duck_max), weight_stream=633.720 MiB (6.409 GB/s @ duck_max) [2026-04-08 08:46:25.359648 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=253, expert_tiles=757, avg_tile_batch=3.47, 
prepare=707.849µs, send=18.674586ms, judge_wait=217.451889ms, fetch=20.663254ms, reduce=136ns; duck time-ns stats: p50=194.72192ms, p90=194.96603ms, max=195.065288ms; kernel_model: matmul=7.222591 GFLOP (37.027 GFLOP/s @ duck_max), param_stream=1.041826G (5.341 Gparam/s @ duck_max), weight_stream=1118.243 MiB (6.011 GB/s @ duck_max) [2026-04-08 08:46:25.629359 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=249, expert_tiles=750, avg_tile_batch=3.50, prepare=664.834µs, send=17.213401ms, judge_wait=216.550867ms, fetch=20.643689ms, reduce=19ns; duck time-ns stats: p50=192.607946ms, p90=192.924162ms, max=193.055675ms; kernel_model: matmul=7.222591 GFLOP (37.412 GFLOP/s @ duck_max), param_stream=1.032192G (5.347 Gparam/s @ duck_max), weight_stream=1107.903 MiB (6.018 GB/s @ duck_max) [2026-04-08 08:46:25.900702 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=252, expert_tiles=750, avg_tile_batch=3.50, prepare=660.287µs, send=17.222078ms, judge_wait=217.164175ms, fetch=21.633705ms, reduce=138ns; duck time-ns stats: p50=193.729987ms, p90=193.932047ms, max=194.230003ms; kernel_model: matmul=7.222591 GFLOP (37.186 GFLOP/s @ duck_max), param_stream=1.032192G (5.314 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.981 GB/s @ duck_max) [2026-04-08 08:46:26.168354 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=249, expert_tiles=741, avg_tile_batch=3.54, prepare=659.895µs, send=17.221172ms, judge_wait=214.201567ms, fetch=20.681679ms, reduce=135ns; duck time-ns stats: p50=190.526092ms, p90=190.720814ms, max=191.254119ms; kernel_model: matmul=7.222591 GFLOP (37.764 GFLOP/s @ duck_max), param_stream=1.019806G (5.332 Gparam/s @ duck_max), weight_stream=1094.608 MiB (6.001 GB/s @ duck_max) [2026-04-08 08:46:26.312980 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=245, 
expert_tiles=433, avg_tile_batch=3.12, prepare=418.627µs, send=8.900938ms, judge_wait=116.310429ms, fetch=11.561495ms, reduce=105ns; duck time-ns stats: p50=104.430677ms, p90=104.548761ms, max=104.653198ms; kernel_model: matmul=3.721396 GFLOP (35.559 GFLOP/s @ duck_max), param_stream=0.595919G (5.694 Gparam/s @ duck_max), weight_stream=639.629 MiB (6.409 GB/s @ duck_max) [2026-04-08 08:46:26.608954 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=252, expert_tiles=740, avg_tile_batch=3.55, prepare=702.712µs, send=19.248768ms, judge_wait=215.475624ms, fetch=21.591703ms, reduce=20ns; duck time-ns stats: p50=191.981235ms, p90=192.394758ms, max=192.635816ms; kernel_model: matmul=7.222591 GFLOP (37.494 GFLOP/s @ duck_max), param_stream=1.018429G (5.287 Gparam/s @ duck_max), weight_stream=1093.130 MiB (5.950 GB/s @ duck_max) [2026-04-08 08:46:26.882294 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=252, expert_tiles=754, avg_tile_batch=3.48, prepare=660.331µs, send=18.348097ms, judge_wait=218.096774ms, fetch=21.644763ms, reduce=136ns; duck time-ns stats: p50=194.613611ms, p90=194.848237ms, max=195.004501ms; kernel_model: matmul=7.222591 GFLOP (37.038 GFLOP/s @ duck_max), param_stream=1.037697G (5.321 Gparam/s @ duck_max), weight_stream=1113.811 MiB (5.989 GB/s @ duck_max) [2026-04-08 08:46:27.153145 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=254, expert_tiles=750, avg_tile_batch=3.50, prepare=659.751µs, send=18.328496ms, judge_wait=215.533836ms, fetch=21.666542ms, reduce=136ns; duck time-ns stats: p50=192.081422ms, p90=192.377779ms, max=192.570659ms; kernel_model: matmul=7.222591 GFLOP (37.506 GFLOP/s @ duck_max), param_stream=1.032192G (5.360 Gparam/s @ duck_max), weight_stream=1107.903 MiB (6.033 GB/s @ duck_max) [2026-04-08 08:46:27.423857 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, 
tasks=2624, unique_experts=252, expert_tiles=753, avg_tile_batch=3.48, prepare=662.976µs, send=17.22437ms, judge_wait=217.413893ms, fetch=20.654518ms, reduce=20ns; duck time-ns stats: p50=192.478717ms, p90=192.870613ms, max=193.094816ms; kernel_model: matmul=7.222591 GFLOP (37.404 GFLOP/s @ duck_max), param_stream=1.036321G (5.367 Gparam/s @ duck_max), weight_stream=1112.334 MiB (6.040 GB/s @ duck_max) [2026-04-08 08:46:27.569728 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=241, expert_tiles=440, avg_tile_batch=3.07, prepare=413.099µs, send=10.077868ms, judge_wait=116.3176ms, fetch=11.544252ms, reduce=19ns; duck time-ns stats: p50=103.262268ms, p90=103.378927ms, max=103.56709ms; kernel_model: matmul=3.721396 GFLOP (35.932 GFLOP/s @ duck_max), param_stream=0.605553G (5.847 Gparam/s @ duck_max), weight_stream=649.969 MiB (6.581 GB/s @ duck_max) [2026-04-08 08:46:27.864406 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=247, expert_tiles=748, avg_tile_batch=3.51, prepare=699.916µs, send=19.18245ms, judge_wait=215.196371ms, fetch=20.641591ms, reduce=19ns; duck time-ns stats: p50=191.621414ms, p90=192.158921ms, max=192.341623ms; kernel_model: matmul=7.222591 GFLOP (37.551 GFLOP/s @ duck_max), param_stream=1.029439G (5.352 Gparam/s @ duck_max), weight_stream=1104.948 MiB (6.024 GB/s @ duck_max) [2026-04-08 08:46:28.134755 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=248, expert_tiles=754, avg_tile_batch=3.48, prepare=665.827µs, send=17.225749ms, judge_wait=217.174115ms, fetch=20.654188ms, reduce=20ns; duck time-ns stats: p50=192.13306ms, p90=192.392369ms, max=192.421993ms; kernel_model: matmul=7.222591 GFLOP (37.535 GFLOP/s @ duck_max), param_stream=1.037697G (5.393 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.070 GB/s @ duck_max) [2026-04-08 08:46:28.407328 INFO fp8_moe_dpdk] MoE prefill forward (Rust): 
batch_size=328, top_k=8, tasks=2624, unique_experts=248, expert_tiles=757, avg_tile_batch=3.47, prepare=659.177µs, send=17.226293ms, judge_wait=218.442679ms, fetch=21.641988ms, reduce=20ns; duck time-ns stats: p50=193.262604ms, p90=193.738147ms, max=193.805902ms; kernel_model: matmul=7.222591 GFLOP (37.267 GFLOP/s @ duck_max), param_stream=1.041826G (5.376 Gparam/s @ duck_max), weight_stream=1118.243 MiB (6.050 GB/s @ duck_max) [2026-04-08 08:46:28.681501 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=246, expert_tiles=751, avg_tile_batch=3.49, prepare=663.164µs, send=18.333627ms, judge_wait=218.9034ms, fetch=21.672694ms, reduce=136ns; duck time-ns stats: p50=193.358945ms, p90=193.584054ms, max=193.813786ms; kernel_model: matmul=7.222591 GFLOP (37.266 GFLOP/s @ duck_max), param_stream=1.033568G (5.333 Gparam/s @ duck_max), weight_stream=1109.380 MiB (6.002 GB/s @ duck_max) [2026-04-08 08:46:28.827830 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=234, expert_tiles=429, avg_tile_batch=3.15, prepare=414.576µs, send=10.223444ms, judge_wait=116.638959ms, fetch=11.57228ms, reduce=20ns; duck time-ns stats: p50=104.524655ms, p90=104.799696ms, max=104.917844ms; kernel_model: matmul=3.721396 GFLOP (35.470 GFLOP/s @ duck_max), param_stream=0.590414G (5.627 Gparam/s @ duck_max), weight_stream=633.720 MiB (6.334 GB/s @ duck_max) [2026-04-08 08:46:29.127351 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=249, expert_tiles=757, avg_tile_batch=3.47, prepare=700.421µs, send=19.260297ms, judge_wait=218.891178ms, fetch=21.628149ms, reduce=141ns; duck time-ns stats: p50=193.69171ms, p90=194.055294ms, max=194.108601ms; kernel_model: matmul=7.222591 GFLOP (37.209 GFLOP/s @ duck_max), param_stream=1.041826G (5.367 Gparam/s @ duck_max), weight_stream=1118.243 MiB (6.041 GB/s @ duck_max) [2026-04-08 08:46:29.397030 INFO fp8_moe_dpdk] 
MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=243, expert_tiles=746, avg_tile_batch=3.52, prepare=665.699µs, send=17.225702ms, judge_wait=216.464863ms, fetch=20.686269ms, reduce=20ns; duck time-ns stats: p50=191.059412ms, p90=191.540327ms, max=191.854304ms; kernel_model: matmul=7.222591 GFLOP (37.646 GFLOP/s @ duck_max), param_stream=1.026687G (5.351 Gparam/s @ duck_max), weight_stream=1101.994 MiB (6.023 GB/s @ duck_max) [2026-04-08 08:46:29.669878 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=251, expert_tiles=753, avg_tile_batch=3.48, prepare=662.535µs, send=17.226404ms, judge_wait=218.680484ms, fetch=21.658598ms, reduce=21ns; duck time-ns stats: p50=192.043412ms, p90=192.438871ms, max=192.954774ms; kernel_model: matmul=7.222591 GFLOP (37.432 GFLOP/s @ duck_max), param_stream=1.036321G (5.371 Gparam/s @ duck_max), weight_stream=1112.334 MiB (6.045 GB/s @ duck_max) [2026-04-08 08:46:29.942002 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=250, expert_tiles=749, avg_tile_batch=3.50, prepare=661.168µs, send=17.222436ms, judge_wait=217.818252ms, fetch=21.67545ms, reduce=136ns; duck time-ns stats: p50=192.474917ms, p90=192.67372ms, max=192.854956ms; kernel_model: matmul=7.222591 GFLOP (37.451 GFLOP/s @ duck_max), param_stream=1.030816G (5.345 Gparam/s @ duck_max), weight_stream=1106.425 MiB (6.016 GB/s @ duck_max) [2026-04-08 08:46:30.088140 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=233, expert_tiles=437, avg_tile_batch=3.09, prepare=413.961µs, send=10.06647ms, judge_wait=116.594167ms, fetch=11.570043ms, reduce=136ns; duck time-ns stats: p50=103.674392ms, p90=103.773924ms, max=103.848468ms; kernel_model: matmul=3.721396 GFLOP (35.835 GFLOP/s @ duck_max), param_stream=0.601424G (5.791 Gparam/s @ duck_max), weight_stream=645.538 MiB (6.518 GB/s @ duck_max) [2026-04-08 
08:46:30.386218 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=249, expert_tiles=757, avg_tile_batch=3.47, prepare=700.81µs, send=18.88909ms, judge_wait=218.56853ms, fetch=20.639109ms, reduce=20ns; duck time-ns stats: p50=194.308996ms, p90=194.527306ms, max=194.645103ms; kernel_model: matmul=7.222591 GFLOP (37.106 GFLOP/s @ duck_max), param_stream=1.041826G (5.352 Gparam/s @ duck_max), weight_stream=1118.243 MiB (6.024 GB/s @ duck_max) [2026-04-08 08:46:30.669339 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=249, expert_tiles=759, avg_tile_batch=3.46, prepare=660.461µs, send=17.214638ms, judge_wait=228.965716ms, fetch=21.594147ms, reduce=134ns; duck time-ns stats: p50=205.244724ms, p90=205.823805ms, max=206.062687ms; kernel_model: matmul=7.222591 GFLOP (35.050 GFLOP/s @ duck_max), param_stream=1.044578G (5.069 Gparam/s @ duck_max), weight_stream=1121.197 MiB (5.705 GB/s @ duck_max) [2026-04-08 08:46:30.943657 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=246, expert_tiles=747, avg_tile_batch=3.51, prepare=658.953µs, send=18.31313ms, judge_wait=218.887239ms, fetch=21.666134ms, reduce=135ns; duck time-ns stats: p50=193.679032ms, p90=193.956849ms, max=194.078656ms; kernel_model: matmul=7.222591 GFLOP (37.215 GFLOP/s @ duck_max), param_stream=1.028063G (5.297 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.962 GB/s @ duck_max) [2026-04-08 08:46:31.216811 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=245, expert_tiles=758, avg_tile_batch=3.46, prepare=658.291µs, send=17.227292ms, judge_wait=218.870567ms, fetch=21.642297ms, reduce=144ns; duck time-ns stats: p50=193.416973ms, p90=193.776394ms, max=193.925078ms; kernel_model: matmul=7.222591 GFLOP (37.244 GFLOP/s @ duck_max), param_stream=1.043202G (5.379 Gparam/s @ duck_max), weight_stream=1119.720 MiB (6.054 
GB/s @ duck_max) [2026-04-08 08:46:31.366920 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=231, expert_tiles=442, avg_tile_batch=3.06, prepare=414.297µs, send=10.11059ms, judge_wait=121.408292ms, fetch=10.56684ms, reduce=141ns; duck time-ns stats: p50=109.00354ms, p90=109.123958ms, max=109.157888ms; kernel_model: matmul=3.721396 GFLOP (34.092 GFLOP/s @ duck_max), param_stream=0.608305G (5.573 Gparam/s @ duck_max), weight_stream=652.924 MiB (6.272 GB/s @ duck_max) [2026-04-08 08:46:31.668100 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=250, expert_tiles=755, avg_tile_batch=3.48, prepare=695.575µs, send=19.061488ms, judge_wait=220.684618ms, fetch=21.589803ms, reduce=138ns; duck time-ns stats: p50=196.463025ms, p90=196.790015ms, max=196.892775ms; kernel_model: matmul=7.222591 GFLOP (36.683 GFLOP/s @ duck_max), param_stream=1.039073G (5.277 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.940 GB/s @ duck_max) [2026-04-08 08:46:31.943034 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=248, expert_tiles=756, avg_tile_batch=3.47, prepare=664.563µs, send=17.223934ms, judge_wait=220.539152ms, fetch=21.65745ms, reduce=161ns; duck time-ns stats: p50=194.265375ms, p90=194.60457ms, max=194.698131ms; kernel_model: matmul=7.222591 GFLOP (37.096 GFLOP/s @ duck_max), param_stream=1.040450G (5.344 Gparam/s @ duck_max), weight_stream=1116.766 MiB (6.015 GB/s @ duck_max) [2026-04-08 08:46:32.215883 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=253, expert_tiles=754, avg_tile_batch=3.48, prepare=663.759µs, send=17.226512ms, judge_wait=218.532058ms, fetch=21.640448ms, reduce=135ns; duck time-ns stats: p50=191.763143ms, p90=192.189224ms, max=192.427989ms; kernel_model: matmul=7.222591 GFLOP (37.534 GFLOP/s @ duck_max), param_stream=1.037697G (5.393 Gparam/s @ duck_max), 
weight_stream=1113.811 MiB (6.069 GB/s @ duck_max) [2026-04-08 08:46:32.490303 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=243, expert_tiles=749, avg_tile_batch=3.50, prepare=664.012µs, send=17.226854ms, judge_wait=221.091791ms, fetch=20.65038ms, reduce=136ns; duck time-ns stats: p50=195.328262ms, p90=195.571059ms, max=195.759329ms; kernel_model: matmul=7.222591 GFLOP (36.895 GFLOP/s @ duck_max), param_stream=1.030816G (5.266 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.927 GB/s @ duck_max) [2026-04-08 08:46:32.636005 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=235, expert_tiles=436, avg_tile_batch=3.10, prepare=416.425µs, send=10.054043ms, judge_wait=117.064182ms, fetch=10.628446ms, reduce=20ns; duck time-ns stats: p50=104.654665ms, p90=104.86453ms, max=104.938312ms; kernel_model: matmul=3.721396 GFLOP (35.463 GFLOP/s @ duck_max), param_stream=0.600048G (5.718 Gparam/s @ duck_max), weight_stream=644.061 MiB (6.436 GB/s @ duck_max) [2026-04-08 08:46:32.934693 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=247, expert_tiles=756, avg_tile_batch=3.47, prepare=696.994µs, send=19.023205ms, judge_wait=219.164686ms, fetch=20.646184ms, reduce=138ns; duck time-ns stats: p50=194.070204ms, p90=194.321567ms, max=194.379516ms; kernel_model: matmul=7.222591 GFLOP (37.157 GFLOP/s @ duck_max), param_stream=1.040450G (5.353 Gparam/s @ duck_max), weight_stream=1116.766 MiB (6.024 GB/s @ duck_max) [2026-04-08 08:46:33.211619 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=250, expert_tiles=760, avg_tile_batch=3.45, prepare=652.95µs, send=17.22828ms, judge_wait=222.738504ms, fetch=21.670855ms, reduce=25ns; duck time-ns stats: p50=197.103744ms, p90=197.427284ms, max=197.666827ms; kernel_model: matmul=7.222591 GFLOP (36.539 GFLOP/s @ duck_max), param_stream=1.045955G 
(5.292 Gparam/s @ duck_max), weight_stream=1122.675 MiB (5.956 GB/s @ duck_max) [2026-04-08 08:46:33.487599 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=250, expert_tiles=756, avg_tile_batch=3.47, prepare=663.29µs, send=17.224522ms, judge_wait=221.726117ms, fetch=21.668931ms, reduce=19ns; duck time-ns stats: p50=195.473091ms, p90=195.708487ms, max=195.973586ms; kernel_model: matmul=7.222591 GFLOP (36.855 GFLOP/s @ duck_max), param_stream=1.040450G (5.309 Gparam/s @ duck_max), weight_stream=1116.766 MiB (5.975 GB/s @ duck_max) [2026-04-08 08:46:33.763867 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=247, expert_tiles=751, avg_tile_batch=3.49, prepare=663.679µs, send=17.223235ms, judge_wait=222.995945ms, fetch=20.664281ms, reduce=20ns; duck time-ns stats: p50=197.965761ms, p90=198.188199ms, max=198.314677ms; kernel_model: matmul=7.222591 GFLOP (36.420 GFLOP/s @ duck_max), param_stream=1.033568G (5.212 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.866 GB/s @ duck_max) [2026-04-08 08:46:33.909954 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=233, expert_tiles=434, avg_tile_batch=3.12, prepare=416.444µs, send=10.121419ms, judge_wait=117.322848ms, fetch=10.75693ms, reduce=135ns; duck time-ns stats: p50=104.19747ms, p90=104.300572ms, max=104.3319ms; kernel_model: matmul=3.721396 GFLOP (35.669 GFLOP/s @ duck_max), param_stream=0.597295G (5.725 Gparam/s @ duck_max), weight_stream=641.106 MiB (6.443 GB/s @ duck_max) [2026-04-08 08:46:34.209154 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=246, expert_tiles=756, avg_tile_batch=3.47, prepare=702.465µs, send=19.023969ms, judge_wait=218.899481ms, fetch=21.601366ms, reduce=138ns; duck time-ns stats: p50=193.21305ms, p90=193.395873ms, max=193.458893ms; kernel_model: matmul=7.222591 GFLOP (37.334 GFLOP/s @ duck_max), 
param_stream=1.040450G (5.378 Gparam/s @ duck_max), weight_stream=1116.766 MiB (6.053 GB/s @ duck_max) [2026-04-08 08:46:34.483821 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=246, expert_tiles=752, avg_tile_batch=3.49, prepare=663.857µs, send=18.337297ms, judge_wait=219.409865ms, fetch=21.661496ms, reduce=137ns; duck time-ns stats: p50=192.898161ms, p90=193.327229ms, max=193.664224ms; kernel_model: matmul=7.222591 GFLOP (37.294 GFLOP/s @ duck_max), param_stream=1.034945G (5.344 Gparam/s @ duck_max), weight_stream=1110.857 MiB (6.015 GB/s @ duck_max) [2026-04-08 08:46:34.762021 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=250, expert_tiles=757, avg_tile_batch=3.47, prepare=657.853µs, send=18.329588ms, judge_wait=222.159724ms, fetch=22.360131ms, reduce=135ns; duck time-ns stats: p50=195.047571ms, p90=195.330968ms, max=195.343153ms; kernel_model: matmul=7.222591 GFLOP (36.974 GFLOP/s @ duck_max), param_stream=1.041826G (5.333 Gparam/s @ duck_max), weight_stream=1118.243 MiB (6.003 GB/s @ duck_max) [2026-04-08 08:46:35.037159 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=243, expert_tiles=746, avg_tile_batch=3.52, prepare=665.14µs, send=18.506313ms, judge_wait=219.570256ms, fetch=21.64791ms, reduce=135ns; duck time-ns stats: p50=194.350139ms, p90=194.568734ms, max=194.854033ms; kernel_model: matmul=7.222591 GFLOP (37.067 GFLOP/s @ duck_max), param_stream=1.026687G (5.269 Gparam/s @ duck_max), weight_stream=1101.994 MiB (5.930 GB/s @ duck_max) [2026-04-08 08:46:35.184331 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=223, expert_tiles=442, avg_tile_batch=3.06, prepare=411.514µs, send=10.110167ms, judge_wait=118.639891ms, fetch=10.553364ms, reduce=20ns; duck time-ns stats: p50=106.252207ms, p90=106.334796ms, max=106.424927ms; kernel_model: matmul=3.721396 GFLOP 
(34.967 GFLOP/s @ duck_max), param_stream=0.608305G (5.716 Gparam/s @ duck_max), weight_stream=652.924 MiB (6.433 GB/s @ duck_max) [2026-04-08 08:46:35.479008 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=239, expert_tiles=748, avg_tile_batch=3.51, prepare=698.224µs, send=19.153485ms, judge_wait=215.038569ms, fetch=20.658064ms, reduce=138ns; duck time-ns stats: p50=190.170213ms, p90=190.332128ms, max=190.421265ms; kernel_model: matmul=7.222591 GFLOP (37.930 GFLOP/s @ duck_max), param_stream=1.029439G (5.406 Gparam/s @ duck_max), weight_stream=1104.948 MiB (6.085 GB/s @ duck_max) [2026-04-08 08:46:35.754987 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=750, avg_tile_batch=3.50, prepare=662.33µs, send=17.22892ms, judge_wait=218.8964ms, fetch=24.597167ms, reduce=135ns; duck time-ns stats: p50=191.395854ms, p90=191.633286ms, max=192.039858ms; kernel_model: matmul=7.222591 GFLOP (37.610 GFLOP/s @ duck_max), param_stream=1.032192G (5.375 Gparam/s @ duck_max), weight_stream=1107.903 MiB (6.049 GB/s @ duck_max) [2026-04-08 08:46:36.032238 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=242, expert_tiles=740, avg_tile_batch=3.55, prepare=659.183µs, send=18.450432ms, judge_wait=221.13693ms, fetch=22.362555ms, reduce=134ns; duck time-ns stats: p50=194.437336ms, p90=194.799172ms, max=195.019344ms; kernel_model: matmul=7.222591 GFLOP (37.035 GFLOP/s @ duck_max), param_stream=1.018429G (5.222 Gparam/s @ duck_max), weight_stream=1093.130 MiB (5.878 GB/s @ duck_max) [2026-04-08 08:46:36.308599 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=238, expert_tiles=755, avg_tile_batch=3.48, prepare=661.056µs, send=18.554574ms, judge_wait=219.090131ms, fetch=23.378076ms, reduce=142ns; duck time-ns stats: p50=191.867003ms, p90=192.162363ms, max=192.391737ms; 
kernel_model: matmul=7.222591 GFLOP (37.541 GFLOP/s @ duck_max), param_stream=1.039073G (5.401 Gparam/s @ duck_max), weight_stream=1115.289 MiB (6.079 GB/s @ duck_max) [2026-04-08 08:46:36.457520 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=218, expert_tiles=435, avg_tile_batch=3.11, prepare=419.146µs, send=10.185473ms, judge_wait=119.913314ms, fetch=10.927199ms, reduce=20ns; duck time-ns stats: p50=106.902161ms, p90=107.042165ms, max=107.103703ms; kernel_model: matmul=3.721396 GFLOP (34.746 GFLOP/s @ duck_max), param_stream=0.598671G (5.590 Gparam/s @ duck_max), weight_stream=642.583 MiB (6.291 GB/s @ duck_max) [2026-04-08 08:46:36.761800 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=238, expert_tiles=749, avg_tile_batch=3.50, prepare=695.455µs, send=18.824687ms, judge_wait=224.126205ms, fetch=21.635615ms, reduce=27ns; duck time-ns stats: p50=198.225184ms, p90=198.587335ms, max=198.747089ms; kernel_model: matmul=7.222591 GFLOP (36.341 GFLOP/s @ duck_max), param_stream=1.030816G (5.187 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.837 GB/s @ duck_max) [2026-04-08 08:46:37.055352 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=221, expert_tiles=749, avg_tile_batch=3.50, prepare=659.439µs, send=18.317635ms, judge_wait=239.184952ms, fetch=20.641408ms, reduce=137ns; duck time-ns stats: p50=212.457697ms, p90=212.867322ms, max=213.230406ms; kernel_model: matmul=7.222591 GFLOP (33.872 GFLOP/s @ duck_max), param_stream=1.030816G (4.834 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.441 GB/s @ duck_max) [2026-04-08 08:46:37.329165 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=241, expert_tiles=748, avg_tile_batch=3.51, prepare=664.612µs, send=17.224218ms, judge_wait=219.585584ms, fetch=21.68533ms, reduce=19ns; duck time-ns stats: p50=193.091942ms, 
p90=193.415806ms, max=193.57845ms; kernel_model: matmul=7.222591 GFLOP (37.311 GFLOP/s @ duck_max), param_stream=1.029439G (5.318 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.985 GB/s @ duck_max) [2026-04-08 08:46:37.606343 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=752, avg_tile_batch=3.49, prepare=664.258µs, send=17.213768ms, judge_wait=220.981864ms, fetch=23.496156ms, reduce=135ns; duck time-ns stats: p50=193.809362ms, p90=194.223611ms, max=194.811039ms; kernel_model: matmul=7.222591 GFLOP (37.075 GFLOP/s @ duck_max), param_stream=1.034945G (5.313 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.979 GB/s @ duck_max) [2026-04-08 08:46:37.751446 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=200, expert_tiles=425, avg_tile_batch=3.18, prepare=414.852µs, send=10.15548ms, judge_wait=116.304643ms, fetch=10.56614ms, reduce=138ns; duck time-ns stats: p50=102.876499ms, p90=102.962081ms, max=103.001496ms; kernel_model: matmul=3.721396 GFLOP (36.130 GFLOP/s @ duck_max), param_stream=0.584909G (5.679 Gparam/s @ duck_max), weight_stream=627.811 MiB (6.391 GB/s @ duck_max) [2026-04-08 08:46:38.055785 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=237, expert_tiles=757, avg_tile_batch=3.47, prepare=696.012µs, send=18.914816ms, judge_wait=223.841833ms, fetch=21.650381ms, reduce=135ns; duck time-ns stats: p50=198.110349ms, p90=198.312362ms, max=198.464996ms; kernel_model: matmul=7.222591 GFLOP (36.392 GFLOP/s @ duck_max), param_stream=1.041826G (5.249 Gparam/s @ duck_max), weight_stream=1118.243 MiB (5.908 GB/s @ duck_max) [2026-04-08 08:46:38.332207 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=752, avg_tile_batch=3.49, prepare=662.853µs, send=17.234995ms, judge_wait=223.15235ms, fetch=20.643844ms, reduce=136ns; duck 
time-ns stats: p50=196.116544ms, p90=196.378449ms, max=196.593741ms; kernel_model: matmul=7.222591 GFLOP (36.739 GFLOP/s @ duck_max), param_stream=1.034945G (5.264 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.925 GB/s @ duck_max) [2026-04-08 08:46:38.605529 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=238, expert_tiles=754, avg_tile_batch=3.48, prepare=661.168µs, send=17.226312ms, judge_wait=220.084076ms, fetch=20.648346ms, reduce=19ns; duck time-ns stats: p50=193.944303ms, p90=194.275082ms, max=194.349813ms; kernel_model: matmul=7.222591 GFLOP (37.163 GFLOP/s @ duck_max), param_stream=1.037697G (5.339 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.009 GB/s @ duck_max) [2026-04-08 08:46:38.890347 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=746, avg_tile_batch=3.52, prepare=664.965µs, send=17.226263ms, judge_wait=231.440657ms, fetch=20.667999ms, reduce=136ns; duck time-ns stats: p50=205.353728ms, p90=205.675117ms, max=206.160276ms; kernel_model: matmul=7.222591 GFLOP (35.034 GFLOP/s @ duck_max), param_stream=1.026687G (4.980 Gparam/s @ duck_max), weight_stream=1101.994 MiB (5.605 GB/s @ duck_max) [2026-04-08 08:46:39.036974 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=216, expert_tiles=438, avg_tile_batch=3.09, prepare=417.522µs, send=9.998641ms, judge_wait=117.960496ms, fetch=10.663945ms, reduce=136ns; duck time-ns stats: p50=104.526778ms, p90=104.753161ms, max=104.840889ms; kernel_model: matmul=3.721396 GFLOP (35.496 GFLOP/s @ duck_max), param_stream=0.602800G (5.750 Gparam/s @ duck_max), weight_stream=647.015 MiB (6.471 GB/s @ duck_max) [2026-04-08 08:46:39.337315 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=243, expert_tiles=756, avg_tile_batch=3.47, prepare=716.941µs, send=18.857099ms, judge_wait=220.903799ms, 
fetch=20.646205ms, reduce=540ns; duck time-ns stats: p50=196.250805ms, p90=196.615858ms, max=196.730136ms; kernel_model: matmul=7.222591 GFLOP (36.713 GFLOP/s @ duck_max), param_stream=1.040450G (5.289 Gparam/s @ duck_max), weight_stream=1116.766 MiB (5.952 GB/s @ duck_max) [2026-04-08 08:46:39.613892 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=240, expert_tiles=754, avg_tile_batch=3.48, prepare=665.555µs, send=17.227209ms, judge_wait=221.263607ms, fetch=22.698632ms, reduce=133ns; duck time-ns stats: p50=194.406804ms, p90=194.906905ms, max=195.164794ms; kernel_model: matmul=7.222591 GFLOP (37.008 GFLOP/s @ duck_max), param_stream=1.037697G (5.317 Gparam/s @ duck_max), weight_stream=1113.811 MiB (5.984 GB/s @ duck_max) [2026-04-08 08:46:39.889184 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=242, expert_tiles=750, avg_tile_batch=3.50, prepare=697.256µs, send=17.222758ms, judge_wait=219.692935ms, fetch=22.937601ms, reduce=15ns; duck time-ns stats: p50=192.511987ms, p90=192.674022ms, max=192.835647ms; kernel_model: matmul=7.222591 GFLOP (37.455 GFLOP/s @ duck_max), param_stream=1.032192G (5.353 Gparam/s @ duck_max), weight_stream=1107.903 MiB (6.024 GB/s @ duck_max) [2026-04-08 08:46:40.164596 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=245, expert_tiles=751, avg_tile_batch=3.49, prepare=662.482µs, send=18.571508ms, judge_wait=219.579308ms, fetch=21.664822ms, reduce=100ns; duck time-ns stats: p50=193.494544ms, p90=193.913791ms, max=194.285761ms; kernel_model: matmul=7.222591 GFLOP (37.175 GFLOP/s @ duck_max), param_stream=1.033568G (5.320 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.987 GB/s @ duck_max) [2026-04-08 08:46:40.308311 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=212, expert_tiles=430, avg_tile_batch=3.14, prepare=415.165µs, 
send=8.900394ms, judge_wait=116.014358ms, fetch=10.953604ms, reduce=15ns; duck time-ns stats: p50=103.904718ms, p90=104.185906ms, max=104.339269ms; kernel_model: matmul=3.721396 GFLOP (35.666 GFLOP/s @ duck_max), param_stream=0.591790G (5.672 Gparam/s @ duck_max), weight_stream=635.197 MiB (6.384 GB/s @ duck_max) [2026-04-08 08:46:40.607561 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=242, expert_tiles=756, avg_tile_batch=3.47, prepare=718.02µs, send=18.591824ms, judge_wait=219.045094ms, fetch=21.647733ms, reduce=133ns; duck time-ns stats: p50=193.301241ms, p90=193.457859ms, max=193.836056ms; kernel_model: matmul=7.222591 GFLOP (37.261 GFLOP/s @ duck_max), param_stream=1.040450G (5.368 Gparam/s @ duck_max), weight_stream=1116.766 MiB (6.041 GB/s @ duck_max) [2026-04-08 08:46:40.881446 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=240, expert_tiles=758, avg_tile_batch=3.46, prepare=664.381µs, send=17.2246ms, judge_wait=219.667461ms, fetch=21.645865ms, reduce=137ns; duck time-ns stats: p50=193.458525ms, p90=193.671137ms, max=193.994317ms; kernel_model: matmul=7.222591 GFLOP (37.231 GFLOP/s @ duck_max), param_stream=1.043202G (5.377 Gparam/s @ duck_max), weight_stream=1119.720 MiB (6.052 GB/s @ duck_max) [2026-04-08 08:46:41.155798 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=241, expert_tiles=753, avg_tile_batch=3.48, prepare=657.912µs, send=17.227537ms, judge_wait=220.16437ms, fetch=21.658515ms, reduce=20ns; duck time-ns stats: p50=193.763613ms, p90=193.995361ms, max=194.329202ms; kernel_model: matmul=7.222591 GFLOP (37.167 GFLOP/s @ duck_max), param_stream=1.036321G (5.333 Gparam/s @ duck_max), weight_stream=1112.334 MiB (6.002 GB/s @ duck_max) [2026-04-08 08:46:41.436873 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=239, expert_tiles=756, 
avg_tile_batch=3.47, prepare=661.165µs, send=17.225455ms, judge_wait=227.823202ms, fetch=20.654916ms, reduce=20ns; duck time-ns stats: p50=200.967145ms, p90=201.25999ms, max=201.405395ms; kernel_model: matmul=7.222591 GFLOP (35.861 GFLOP/s @ duck_max), param_stream=1.040450G (5.166 Gparam/s @ duck_max), weight_stream=1116.766 MiB (5.814 GB/s @ duck_max) [2026-04-08 08:46:41.581804 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=213, expert_tiles=421, avg_tile_batch=3.21, prepare=412.772µs, send=10.135057ms, judge_wait=115.378427ms, fetch=11.585264ms, reduce=20ns; duck time-ns stats: p50=102.406251ms, p90=102.623598ms, max=102.665125ms; kernel_model: matmul=3.721396 GFLOP (36.248 GFLOP/s @ duck_max), param_stream=0.579404G (5.644 Gparam/s @ duck_max), weight_stream=621.903 MiB (6.352 GB/s @ duck_max) [2026-04-08 08:46:41.883231 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=237, expert_tiles=751, avg_tile_batch=3.49, prepare=703.011µs, send=19.225611ms, judge_wait=220.568595ms, fetch=21.874763ms, reduce=19ns; duck time-ns stats: p50=194.704779ms, p90=194.9506ms, max=195.222264ms; kernel_model: matmul=7.222591 GFLOP (36.997 GFLOP/s @ duck_max), param_stream=1.033568G (5.294 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.959 GB/s @ duck_max) [2026-04-08 08:46:42.169503 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=745, avg_tile_batch=3.52, prepare=661.2µs, send=17.228528ms, judge_wait=233.092953ms, fetch=20.660758ms, reduce=19ns; duck time-ns stats: p50=210.189406ms, p90=210.472868ms, max=210.61819ms; kernel_model: matmul=7.222591 GFLOP (34.292 GFLOP/s @ duck_max), param_stream=1.025311G (4.868 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.479 GB/s @ duck_max) [2026-04-08 08:46:42.460422 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, 
unique_experts=240, expert_tiles=748, avg_tile_batch=3.51, prepare=664.043µs, send=17.225726ms, judge_wait=236.730355ms, fetch=21.638064ms, reduce=134ns; duck time-ns stats: p50=210.389395ms, p90=210.71505ms, max=211.203106ms; kernel_model: matmul=7.222591 GFLOP (34.197 GFLOP/s @ duck_max), param_stream=1.029439G (4.874 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.486 GB/s @ duck_max) [2026-04-08 08:46:42.746974 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=752, avg_tile_batch=3.49, prepare=666.261µs, send=17.226373ms, judge_wait=232.329322ms, fetch=21.634201ms, reduce=135ns; duck time-ns stats: p50=206.909767ms, p90=207.283718ms, max=207.557144ms; kernel_model: matmul=7.222591 GFLOP (34.798 GFLOP/s @ duck_max), param_stream=1.034945G (4.986 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.612 GB/s @ duck_max) [2026-04-08 08:46:42.895189 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=217, expert_tiles=433, avg_tile_batch=3.12, prepare=412.844µs, send=10.170766ms, judge_wait=118.483888ms, fetch=11.608094ms, reduce=143ns; duck time-ns stats: p50=105.706481ms, p90=105.825135ms, max=105.915318ms; kernel_model: matmul=3.721396 GFLOP (35.136 GFLOP/s @ duck_max), param_stream=0.595919G (5.626 Gparam/s @ duck_max), weight_stream=639.629 MiB (6.332 GB/s @ duck_max) [2026-04-08 08:46:43.195856 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=238, expert_tiles=747, avg_tile_batch=3.51, prepare=698.583µs, send=18.673794ms, judge_wait=219.976254ms, fetch=21.680902ms, reduce=138ns; duck time-ns stats: p50=194.367483ms, p90=194.615609ms, max=194.805246ms; kernel_model: matmul=7.222591 GFLOP (37.076 GFLOP/s @ duck_max), param_stream=1.028063G (5.277 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.940 GB/s @ duck_max) [2026-04-08 08:46:43.476415 INFO fp8_moe_dpdk] MoE prefill forward (Rust): 
batch_size=328, top_k=8, tasks=2624, unique_experts=235, expert_tiles=751, avg_tile_batch=3.49, prepare=660.36µs, send=17.224179ms, judge_wait=226.170887ms, fetch=21.686757ms, reduce=19ns; duck time-ns stats: p50=199.630337ms, p90=199.964972ms, max=200.11488ms; kernel_model: matmul=7.222591 GFLOP (36.092 GFLOP/s @ duck_max), param_stream=1.033568G (5.165 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.813 GB/s @ duck_max) [2026-04-08 08:46:43.752565 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=239, expert_tiles=748, avg_tile_batch=3.51, prepare=665.467µs, send=17.22663ms, judge_wait=221.944233ms, fetch=21.64654ms, reduce=19ns; duck time-ns stats: p50=196.637641ms, p90=196.836701ms, max=196.921117ms; kernel_model: matmul=7.222591 GFLOP (36.678 GFLOP/s @ duck_max), param_stream=1.029439G (5.228 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.884 GB/s @ duck_max) [2026-04-08 08:46:44.030067 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=239, expert_tiles=758, avg_tile_batch=3.46, prepare=671.362µs, send=17.225ms, judge_wait=223.297194ms, fetch=21.671552ms, reduce=20ns; duck time-ns stats: p50=197.353975ms, p90=197.66922ms, max=197.855328ms; kernel_model: matmul=7.222591 GFLOP (36.504 GFLOP/s @ duck_max), param_stream=1.043202G (5.273 Gparam/s @ duck_max), weight_stream=1119.720 MiB (5.934 GB/s @ duck_max) [2026-04-08 08:46:44.180055 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=214, expert_tiles=430, avg_tile_batch=3.14, prepare=415.08µs, send=10.207887ms, judge_wait=120.241051ms, fetch=11.587466ms, reduce=141ns; duck time-ns stats: p50=107.624202ms, p90=107.720465ms, max=107.872984ms; kernel_model: matmul=3.721396 GFLOP (34.498 GFLOP/s @ duck_max), param_stream=0.591790G (5.486 Gparam/s @ duck_max), weight_stream=635.197 MiB (6.174 GB/s @ duck_max) [2026-04-08 08:46:44.483902 INFO fp8_moe_dpdk] MoE 
prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=741, avg_tile_batch=3.54, prepare=700.634µs, send=19.078244ms, judge_wait=223.289202ms, fetch=21.652064ms, reduce=133ns; duck time-ns stats: p50=199.0817ms, p90=199.347553ms, max=199.657245ms; kernel_model: matmul=7.222591 GFLOP (36.175 GFLOP/s @ duck_max), param_stream=1.019806G (5.108 Gparam/s @ duck_max), weight_stream=1094.608 MiB (5.749 GB/s @ duck_max) [2026-04-08 08:46:44.760930 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=238, expert_tiles=747, avg_tile_batch=3.51, prepare=661.799µs, send=17.224964ms, judge_wait=223.834658ms, fetch=20.644669ms, reduce=21ns; duck time-ns stats: p50=197.47343ms, p90=197.725661ms, max=197.755445ms; kernel_model: matmul=7.222591 GFLOP (36.523 GFLOP/s @ duck_max), param_stream=1.028063G (5.199 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.851 GB/s @ duck_max) [2026-04-08 08:46:45.037544 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=241, expert_tiles=753, avg_tile_batch=3.48, prepare=678.43µs, send=17.224597ms, judge_wait=222.409405ms, fetch=21.647649ms, reduce=21ns; duck time-ns stats: p50=196.3711ms, p90=196.75685ms, max=197.408314ms; kernel_model: matmul=7.222591 GFLOP (36.587 GFLOP/s @ duck_max), param_stream=1.036321G (5.250 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.908 GB/s @ duck_max) [2026-04-08 08:46:45.317287 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=241, expert_tiles=746, avg_tile_batch=3.52, prepare=662.994µs, send=17.225529ms, judge_wait=225.227222ms, fetch=21.647324ms, reduce=136ns; duck time-ns stats: p50=198.547855ms, p90=198.811452ms, max=199.024546ms; kernel_model: matmul=7.222591 GFLOP (36.290 GFLOP/s @ duck_max), param_stream=1.026687G (5.159 Gparam/s @ duck_max), weight_stream=1101.994 MiB (5.806 GB/s @ duck_max) [2026-04-08 08:46:45.461388 
INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=217, expert_tiles=424, avg_tile_batch=3.19, prepare=414.21µs, send=8.899607ms, judge_wait=115.701515ms, fetch=11.593913ms, reduce=19ns; duck time-ns stats: p50=102.830455ms, p90=103.046814ms, max=103.186502ms; kernel_model: matmul=3.721396 GFLOP (36.065 GFLOP/s @ duck_max), param_stream=0.583533G (5.655 Gparam/s @ duck_max), weight_stream=626.334 MiB (6.365 GB/s @ duck_max) [2026-04-08 08:46:45.763659 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=752, avg_tile_batch=3.49, prepare=694.463µs, send=19.054324ms, judge_wait=222.674607ms, fetch=20.648114ms, reduce=19ns; duck time-ns stats: p50=196.763736ms, p90=197.031694ms, max=197.109323ms; kernel_model: matmul=7.222591 GFLOP (36.643 GFLOP/s @ duck_max), param_stream=1.034945G (5.251 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.910 GB/s @ duck_max) [2026-04-08 08:46:46.045839 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=228, expert_tiles=752, avg_tile_batch=3.49, prepare=663.345µs, send=17.212858ms, judge_wait=229.059103ms, fetch=20.65362ms, reduce=140ns; duck time-ns stats: p50=202.812511ms, p90=203.159106ms, max=203.548913ms; kernel_model: matmul=7.222591 GFLOP (35.483 GFLOP/s @ duck_max), param_stream=1.034945G (5.085 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.723 GB/s @ duck_max) [2026-04-08 08:46:46.316995 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=747, avg_tile_batch=3.51, prepare=659.969µs, send=18.320867ms, judge_wait=216.853212ms, fetch=20.663341ms, reduce=136ns; duck time-ns stats: p50=190.185145ms, p90=190.338944ms, max=190.417333ms; kernel_model: matmul=7.222591 GFLOP (37.930 GFLOP/s @ duck_max), param_stream=1.028063G (5.399 Gparam/s @ duck_max), weight_stream=1103.471 MiB (6.077 GB/s @ duck_max) 
[2026-04-08 08:46:46.593260 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=228, expert_tiles=747, avg_tile_batch=3.51, prepare=672.365µs, send=17.223697ms, judge_wait=220.69708ms, fetch=22.982755ms, reduce=20ns; duck time-ns stats: p50=193.751443ms, p90=194.106484ms, max=194.415665ms; kernel_model: matmul=7.222591 GFLOP (37.150 GFLOP/s @ duck_max), param_stream=1.028063G (5.288 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.952 GB/s @ duck_max) [2026-04-08 08:46:46.739670 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=204, expert_tiles=436, avg_tile_batch=3.10, prepare=415.292µs, send=10.149427ms, judge_wait=117.441587ms, fetch=10.910443ms, reduce=136ns; duck time-ns stats: p50=105.367863ms, p90=105.475169ms, max=105.581093ms; kernel_model: matmul=3.721396 GFLOP (35.247 GFLOP/s @ duck_max), param_stream=0.600048G (5.683 Gparam/s @ duck_max), weight_stream=644.061 MiB (6.396 GB/s @ duck_max) [2026-04-08 08:46:47.064435 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=207, expert_tiles=743, avg_tile_batch=3.53, prepare=695.915µs, send=18.905374ms, judge_wait=244.462732ms, fetch=21.642563ms, reduce=136ns; duck time-ns stats: p50=219.999737ms, p90=220.216325ms, max=220.403583ms; kernel_model: matmul=7.222591 GFLOP (32.770 GFLOP/s @ duck_max), param_stream=1.022558G (4.639 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.222 GB/s @ duck_max) [2026-04-08 08:46:47.343798 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=209, expert_tiles=747, avg_tile_batch=3.51, prepare=665.691µs, send=17.228609ms, judge_wait=222.233607ms, fetch=24.513447ms, reduce=20ns; duck time-ns stats: p50=194.651726ms, p90=195.012498ms, max=195.065327ms; kernel_model: matmul=7.222591 GFLOP (37.027 GFLOP/s @ duck_max), param_stream=1.028063G (5.270 Gparam/s @ duck_max), weight_stream=1103.471 
MiB (5.932 GB/s @ duck_max) [2026-04-08 08:46:47.618604 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=224, expert_tiles=744, avg_tile_batch=3.53, prepare=659.561µs, send=18.527396ms, judge_wait=220.15155ms, fetch=20.784669ms, reduce=20ns; duck time-ns stats: p50=194.238657ms, p90=194.475596ms, max=194.510906ms; kernel_model: matmul=7.222591 GFLOP (37.132 GFLOP/s @ duck_max), param_stream=1.023934G (5.264 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.925 GB/s @ duck_max) [2026-04-08 08:46:47.897271 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=751, avg_tile_batch=3.49, prepare=663.446µs, send=17.211303ms, judge_wait=224.359543ms, fetch=21.661895ms, reduce=135ns; duck time-ns stats: p50=197.496609ms, p90=197.995845ms, max=198.194586ms; kernel_model: matmul=7.222591 GFLOP (36.442 GFLOP/s @ duck_max), param_stream=1.033568G (5.215 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.869 GB/s @ duck_max) [2026-04-08 08:46:48.043209 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=192, expert_tiles=422, avg_tile_batch=3.20, prepare=411.954µs, send=10.065783ms, judge_wait=116.376584ms, fetch=11.59102ms, reduce=134ns; duck time-ns stats: p50=103.555113ms, p90=103.652505ms, max=103.740263ms; kernel_model: matmul=3.721396 GFLOP (35.872 GFLOP/s @ duck_max), param_stream=0.580780G (5.598 Gparam/s @ duck_max), weight_stream=623.380 MiB (6.301 GB/s @ duck_max) [2026-04-08 08:46:48.343605 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=220, expert_tiles=756, avg_tile_batch=3.47, prepare=691.07µs, send=19.009339ms, judge_wait=220.773811ms, fetch=20.635172ms, reduce=24ns; duck time-ns stats: p50=194.510211ms, p90=194.902385ms, max=195.121131ms; kernel_model: matmul=7.222591 GFLOP (37.016 GFLOP/s @ duck_max), param_stream=1.040450G (5.332 Gparam/s @ 
duck_max), weight_stream=1116.766 MiB (6.001 GB/s @ duck_max) [2026-04-08 08:46:48.637541 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=217, expert_tiles=751, avg_tile_batch=3.49, prepare=667.891µs, send=17.225674ms, judge_wait=240.685524ms, fetch=20.645048ms, reduce=133ns; duck time-ns stats: p50=216.479944ms, p90=216.953283ms, max=217.43854ms; kernel_model: matmul=7.222591 GFLOP (33.217 GFLOP/s @ duck_max), param_stream=1.033568G (4.753 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.350 GB/s @ duck_max) [2026-04-08 08:46:48.911585 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=746, avg_tile_batch=3.52, prepare=662.966µs, send=17.225714ms, judge_wait=219.844821ms, fetch=21.658878ms, reduce=136ns; duck time-ns stats: p50=193.327651ms, p90=193.648293ms, max=194.534066ms; kernel_model: matmul=7.222591 GFLOP (37.128 GFLOP/s @ duck_max), param_stream=1.026687G (5.278 Gparam/s @ duck_max), weight_stream=1101.994 MiB (5.940 GB/s @ duck_max) [2026-04-08 08:46:49.185807 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=216, expert_tiles=743, avg_tile_batch=3.53, prepare=661.046µs, send=17.226151ms, judge_wait=217.738867ms, fetch=23.931367ms, reduce=20ns; duck time-ns stats: p50=189.876732ms, p90=190.172036ms, max=190.335913ms; kernel_model: matmul=7.222591 GFLOP (37.947 GFLOP/s @ duck_max), param_stream=1.022558G (5.372 Gparam/s @ duck_max), weight_stream=1097.562 MiB (6.047 GB/s @ duck_max) [2026-04-08 08:46:49.334588 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=201, expert_tiles=427, avg_tile_batch=3.17, prepare=415.47µs, send=10.183037ms, judge_wait=119.111351ms, fetch=11.536351ms, reduce=20ns; duck time-ns stats: p50=105.037001ms, p90=105.298412ms, max=105.344144ms; kernel_model: matmul=3.721396 GFLOP (35.326 GFLOP/s @ duck_max), 
param_stream=0.587661G (5.578 Gparam/s @ duck_max), weight_stream=630.766 MiB (6.279 GB/s @ duck_max) [2026-04-08 08:46:49.632277 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=201, expert_tiles=738, avg_tile_batch=3.56, prepare=709.528µs, send=19.087215ms, judge_wait=218.164795ms, fetch=20.635615ms, reduce=135ns; duck time-ns stats: p50=191.828717ms, p90=192.237365ms, max=192.27947ms; kernel_model: matmul=7.222591 GFLOP (37.563 GFLOP/s @ duck_max), param_stream=1.015677G (5.282 Gparam/s @ duck_max), weight_stream=1090.176 MiB (5.945 GB/s @ duck_max) [2026-04-08 08:46:49.911275 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=210, expert_tiles=740, avg_tile_batch=3.55, prepare=661.548µs, send=17.212463ms, judge_wait=221.943753ms, fetch=24.435064ms, reduce=20ns; duck time-ns stats: p50=194.178586ms, p90=194.550789ms, max=194.653096ms; kernel_model: matmul=7.222591 GFLOP (37.105 GFLOP/s @ duck_max), param_stream=1.018429G (5.232 Gparam/s @ duck_max), weight_stream=1093.130 MiB (5.889 GB/s @ duck_max) [2026-04-08 08:46:50.233029 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=212, expert_tiles=746, avg_tile_batch=3.52, prepare=662.785µs, send=17.225457ms, judge_wait=267.46787ms, fetch=21.663089ms, reduce=20ns; duck time-ns stats: p50=241.838785ms, p90=242.363636ms, max=242.535719ms; kernel_model: matmul=7.222591 GFLOP (29.779 GFLOP/s @ duck_max), param_stream=1.026687G (4.233 Gparam/s @ duck_max), weight_stream=1101.994 MiB (4.764 GB/s @ duck_max) [2026-04-08 08:46:50.519546 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=215, expert_tiles=752, avg_tile_batch=3.49, prepare=685.832µs, send=17.224294ms, judge_wait=229.54773ms, fetch=24.314442ms, reduce=137ns; duck time-ns stats: p50=201.820276ms, p90=202.189211ms, max=202.386463ms; kernel_model: matmul=7.222591 GFLOP 
(35.687 GFLOP/s @ duck_max), param_stream=1.034945G (5.114 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.755 GB/s @ duck_max) [2026-04-08 08:46:50.664327 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=190, expert_tiles=425, avg_tile_batch=3.18, prepare=417.729µs, send=10.101498ms, judge_wait=116.162102ms, fetch=10.581842ms, reduce=130ns; duck time-ns stats: p50=103.759935ms, p90=103.900271ms, max=103.952369ms; kernel_model: matmul=3.721396 GFLOP (35.799 GFLOP/s @ duck_max), param_stream=0.584909G (5.627 Gparam/s @ duck_max), weight_stream=627.811 MiB (6.333 GB/s @ duck_max) [2026-04-08 08:46:50.963181 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=747, avg_tile_batch=3.51, prepare=700.314µs, send=18.59575ms, judge_wait=218.355941ms, fetch=21.651334ms, reduce=136ns; duck time-ns stats: p50=191.711644ms, p90=191.975175ms, max=192.117658ms; kernel_model: matmul=7.222591 GFLOP (37.595 GFLOP/s @ duck_max), param_stream=1.028063G (5.351 Gparam/s @ duck_max), weight_stream=1103.471 MiB (6.023 GB/s @ duck_max) [2026-04-08 08:46:51.236892 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=210, expert_tiles=743, avg_tile_batch=3.53, prepare=663.231µs, send=17.221885ms, judge_wait=218.127059ms, fetch=23.044698ms, reduce=149ns; duck time-ns stats: p50=189.973277ms, p90=190.199181ms, max=190.294275ms; kernel_model: matmul=7.222591 GFLOP (37.955 GFLOP/s @ duck_max), param_stream=1.022558G (5.374 Gparam/s @ duck_max), weight_stream=1097.562 MiB (6.048 GB/s @ duck_max) [2026-04-08 08:46:51.519843 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=749, avg_tile_batch=3.50, prepare=660.528µs, send=17.222267ms, judge_wait=228.780074ms, fetch=21.672044ms, reduce=20ns; duck time-ns stats: p50=202.662961ms, p90=202.994213ms, max=203.164962ms; 
kernel_model: matmul=7.222591 GFLOP (35.550 GFLOP/s @ duck_max), param_stream=1.030816G (5.074 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.710 GB/s @ duck_max) [2026-04-08 08:46:51.805879 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=221, expert_tiles=746, avg_tile_batch=3.52, prepare=662.146µs, send=18.319404ms, judge_wait=230.756901ms, fetch=21.645785ms, reduce=136ns; duck time-ns stats: p50=205.046988ms, p90=205.362156ms, max=205.626037ms; kernel_model: matmul=7.222591 GFLOP (35.125 GFLOP/s @ duck_max), param_stream=1.026687G (4.993 Gparam/s @ duck_max), weight_stream=1101.994 MiB (5.620 GB/s @ duck_max) [2026-04-08 08:46:51.951344 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=187, expert_tiles=420, avg_tile_batch=3.22, prepare=411.609µs, send=10.181073ms, judge_wait=116.499796ms, fetch=10.927414ms, reduce=22ns; duck time-ns stats: p50=103.907357ms, p90=104.203297ms, max=104.315417ms; kernel_model: matmul=3.721396 GFLOP (35.674 GFLOP/s @ duck_max), param_stream=0.578028G (5.541 Gparam/s @ duck_max), weight_stream=620.425 MiB (6.237 GB/s @ duck_max) [2026-04-08 08:46:52.245752 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=212, expert_tiles=741, avg_tile_batch=3.54, prepare=694.654µs, send=18.856605ms, judge_wait=214.221063ms, fetch=21.665371ms, reduce=132ns; duck time-ns stats: p50=189.726571ms, p90=189.928398ms, max=190.305205ms; kernel_model: matmul=7.222591 GFLOP (37.953 GFLOP/s @ duck_max), param_stream=1.019806G (5.359 Gparam/s @ duck_max), weight_stream=1094.608 MiB (6.031 GB/s @ duck_max) [2026-04-08 08:46:52.525989 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=221, expert_tiles=749, avg_tile_batch=3.50, prepare=660.459µs, send=18.339967ms, judge_wait=224.908533ms, fetch=21.697949ms, reduce=134ns; duck time-ns stats: p50=201.386871ms, 
p90=201.59181ms, max=201.855233ms; kernel_model: matmul=7.222591 GFLOP (35.781 GFLOP/s @ duck_max), param_stream=1.030816G (5.107 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.748 GB/s @ duck_max) [2026-04-08 08:46:52.799978 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=751, avg_tile_batch=3.49, prepare=656.86µs, send=17.225009ms, judge_wait=217.661614ms, fetch=23.760095ms, reduce=20ns; duck time-ns stats: p50=190.329416ms, p90=190.631561ms, max=190.804427ms; kernel_model: matmul=7.222591 GFLOP (37.853 GFLOP/s @ duck_max), param_stream=1.033568G (5.417 Gparam/s @ duck_max), weight_stream=1109.380 MiB (6.097 GB/s @ duck_max) [2026-04-08 08:46:53.079727 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=219, expert_tiles=750, avg_tile_batch=3.50, prepare=660.26µs, send=18.452898ms, judge_wait=222.583874ms, fetch=23.345788ms, reduce=135ns; duck time-ns stats: p50=195.44967ms, p90=195.762824ms, max=195.910324ms; kernel_model: matmul=7.222591 GFLOP (36.867 GFLOP/s @ duck_max), param_stream=1.032192G (5.269 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.930 GB/s @ duck_max) [2026-04-08 08:46:53.228661 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=200, expert_tiles=428, avg_tile_batch=3.16, prepare=413.36µs, send=10.173076ms, judge_wait=119.278263ms, fetch=11.573623ms, reduce=23ns; duck time-ns stats: p50=105.546483ms, p90=105.717252ms, max=105.750882ms; kernel_model: matmul=3.721396 GFLOP (35.190 GFLOP/s @ duck_max), param_stream=0.589038G (5.570 Gparam/s @ duck_max), weight_stream=632.243 MiB (6.269 GB/s @ duck_max) [2026-04-08 08:46:53.530122 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=220, expert_tiles=738, avg_tile_batch=3.56, prepare=699.461µs, send=19.132003ms, judge_wait=219.587121ms, fetch=22.938422ms, reduce=20ns; duck time-ns 
stats: p50=192.297562ms, p90=192.601257ms, max=192.702655ms; kernel_model: matmul=7.222591 GFLOP (37.480 GFLOP/s @ duck_max), param_stream=1.015677G (5.271 Gparam/s @ duck_max), weight_stream=1090.176 MiB (5.932 GB/s @ duck_max) [2026-04-08 08:46:53.804202 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=755, avg_tile_batch=3.48, prepare=668.734µs, send=17.223349ms, judge_wait=220.914061ms, fetch=20.631307ms, reduce=137ns; duck time-ns stats: p50=194.239542ms, p90=194.701295ms, max=194.894495ms; kernel_model: matmul=7.222591 GFLOP (37.059 GFLOP/s @ duck_max), param_stream=1.039073G (5.331 Gparam/s @ duck_max), weight_stream=1115.289 MiB (6.001 GB/s @ duck_max) [2026-04-08 08:46:54.081076 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=228, expert_tiles=752, avg_tile_batch=3.49, prepare=667.952µs, send=17.222901ms, judge_wait=223.71489ms, fetch=20.652731ms, reduce=102ns; duck time-ns stats: p50=196.61455ms, p90=196.933133ms, max=197.275876ms; kernel_model: matmul=7.222591 GFLOP (36.612 GFLOP/s @ duck_max), param_stream=1.034945G (5.246 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.905 GB/s @ duck_max) [2026-04-08 08:46:54.357511 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=749, avg_tile_batch=3.50, prepare=661.446µs, send=17.223768ms, judge_wait=221.485822ms, fetch=22.377128ms, reduce=101ns; duck time-ns stats: p50=194.359946ms, p90=195.071145ms, max=195.154192ms; kernel_model: matmul=7.222591 GFLOP (37.010 GFLOP/s @ duck_max), param_stream=1.030816G (5.282 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.945 GB/s @ duck_max) [2026-04-08 08:46:54.501614 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=191, expert_tiles=421, avg_tile_batch=3.21, prepare=415.74µs, send=10.166584ms, judge_wait=115.232588ms, 
fetch=10.871018ms, reduce=101ns; duck time-ns stats: p50=103.096201ms, p90=103.302921ms, max=103.614797ms; kernel_model: matmul=3.721396 GFLOP (35.916 GFLOP/s @ duck_max), param_stream=0.579404G (5.592 Gparam/s @ duck_max), weight_stream=621.903 MiB (6.294 GB/s @ duck_max) [2026-04-08 08:46:54.805101 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=220, expert_tiles=753, avg_tile_batch=3.48, prepare=703.244µs, send=18.920109ms, judge_wait=224.143115ms, fetch=20.66366ms, reduce=20ns; duck time-ns stats: p50=198.924101ms, p90=199.166928ms, max=199.370869ms; kernel_model: matmul=7.222591 GFLOP (36.227 GFLOP/s @ duck_max), param_stream=1.036321G (5.198 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.850 GB/s @ duck_max) [2026-04-08 08:46:55.084343 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=219, expert_tiles=744, avg_tile_batch=3.53, prepare=669.206µs, send=17.223324ms, judge_wait=225.025075ms, fetch=21.665544ms, reduce=136ns; duck time-ns stats: p50=200.567697ms, p90=200.823943ms, max=200.984637ms; kernel_model: matmul=7.222591 GFLOP (35.936 GFLOP/s @ duck_max), param_stream=1.023934G (5.095 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.734 GB/s @ duck_max) [2026-04-08 08:46:55.358234 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=758, avg_tile_batch=3.46, prepare=661.726µs, send=17.225233ms, judge_wait=220.626444ms, fetch=20.632161ms, reduce=104ns; duck time-ns stats: p50=195.969046ms, p90=196.738121ms, max=196.863798ms; kernel_model: matmul=7.222591 GFLOP (36.688 GFLOP/s @ duck_max), param_stream=1.043202G (5.299 Gparam/s @ duck_max), weight_stream=1119.720 MiB (5.964 GB/s @ duck_max) [2026-04-08 08:46:55.632719 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=238, expert_tiles=751, avg_tile_batch=3.49, prepare=667.53µs, 
send=17.21181ms, judge_wait=220.301903ms, fetch=21.613269ms, reduce=99ns; duck time-ns stats: p50=194.647155ms, p90=194.873578ms, max=194.990828ms; kernel_model: matmul=7.222591 GFLOP (37.041 GFLOP/s @ duck_max), param_stream=1.033568G (5.301 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.966 GB/s @ duck_max) [2026-04-08 08:46:55.779491 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=192, expert_tiles=424, avg_tile_batch=3.19, prepare=412.974µs, send=10.18782ms, judge_wait=117.205174ms, fetch=11.55397ms, reduce=16ns; duck time-ns stats: p50=104.259696ms, p90=104.366685ms, max=104.460732ms; kernel_model: matmul=3.721396 GFLOP (35.625 GFLOP/s @ duck_max), param_stream=0.583533G (5.586 Gparam/s @ duck_max), weight_stream=626.334 MiB (6.287 GB/s @ duck_max) [2026-04-08 08:46:56.081358 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=224, expert_tiles=741, avg_tile_batch=3.54, prepare=694.976µs, send=19.22636ms, judge_wait=221.248409ms, fetch=21.675166ms, reduce=539ns; duck time-ns stats: p50=195.862404ms, p90=196.08426ms, max=196.171866ms; kernel_model: matmul=7.222591 GFLOP (36.818 GFLOP/s @ duck_max), param_stream=1.019806G (5.199 Gparam/s @ duck_max), weight_stream=1094.608 MiB (5.851 GB/s @ duck_max) [2026-04-08 08:46:56.363272 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=756, avg_tile_batch=3.47, prepare=681.58µs, send=17.223406ms, judge_wait=227.756333ms, fetch=21.668918ms, reduce=19ns; duck time-ns stats: p50=201.282481ms, p90=201.687427ms, max=202.177291ms; kernel_model: matmul=7.222591 GFLOP (35.724 GFLOP/s @ duck_max), param_stream=1.040450G (5.146 Gparam/s @ duck_max), weight_stream=1116.766 MiB (5.792 GB/s @ duck_max) [2026-04-08 08:46:56.653616 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=749, 
avg_tile_batch=3.50, prepare=706.507µs, send=18.328506ms, judge_wait=235.968251ms, fetch=20.659931ms, reduce=135ns; duck time-ns stats: p50=210.293902ms, p90=210.42051ms, max=210.534089ms; kernel_model: matmul=7.222591 GFLOP (34.306 GFLOP/s @ duck_max), param_stream=1.030816G (4.896 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.511 GB/s @ duck_max) [2026-04-08 08:46:56.925993 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=748, avg_tile_batch=3.51, prepare=678.477µs, send=17.223843ms, judge_wait=218.080917ms, fetch=21.718307ms, reduce=137ns; duck time-ns stats: p50=193.451354ms, p90=193.812847ms, max=194.006734ms; kernel_model: matmul=7.222591 GFLOP (37.229 GFLOP/s @ duck_max), param_stream=1.029439G (5.306 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.972 GB/s @ duck_max) [2026-04-08 08:46:57.071650 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=191, expert_tiles=423, avg_tile_batch=3.20, prepare=437.331µs, send=10.104919ms, judge_wait=116.902897ms, fetch=10.728652ms, reduce=25ns; duck time-ns stats: p50=103.620441ms, p90=103.743823ms, max=103.893508ms; kernel_model: matmul=3.721396 GFLOP (35.819 GFLOP/s @ duck_max), param_stream=0.582156G (5.603 Gparam/s @ duck_max), weight_stream=624.857 MiB (6.307 GB/s @ duck_max) [2026-04-08 08:46:57.372178 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=753, avg_tile_batch=3.48, prepare=705.029µs, send=19.050664ms, judge_wait=220.08988ms, fetch=21.636075ms, reduce=133ns; duck time-ns stats: p50=194.825293ms, p90=195.07478ms, max=195.255594ms; kernel_model: matmul=7.222591 GFLOP (36.990 GFLOP/s @ duck_max), param_stream=1.036321G (5.308 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.974 GB/s @ duck_max) [2026-04-08 08:46:57.650069 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, 
unique_experts=224, expert_tiles=744, avg_tile_batch=3.53, prepare=660.272µs, send=17.225276ms, judge_wait=221.132262ms, fetch=24.22866ms, reduce=20ns; duck time-ns stats: p50=193.598508ms, p90=194.055041ms, max=194.344928ms; kernel_model: matmul=7.222591 GFLOP (37.164 GFLOP/s @ duck_max), param_stream=1.023934G (5.269 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.930 GB/s @ duck_max) [2026-04-08 08:46:57.927918 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=222, expert_tiles=753, avg_tile_batch=3.48, prepare=660.93µs, send=18.341749ms, judge_wait=220.742334ms, fetch=23.366163ms, reduce=20ns; duck time-ns stats: p50=192.47225ms, p90=192.799282ms, max=193.068706ms; kernel_model: matmul=7.222591 GFLOP (37.409 GFLOP/s @ duck_max), param_stream=1.036321G (5.368 Gparam/s @ duck_max), weight_stream=1112.334 MiB (6.041 GB/s @ duck_max) [2026-04-08 08:46:58.207380 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=754, avg_tile_batch=3.48, prepare=659.094µs, send=17.225686ms, judge_wait=222.413774ms, fetch=24.405385ms, reduce=20ns; duck time-ns stats: p50=194.920965ms, p90=195.199294ms, max=195.446178ms; kernel_model: matmul=7.222591 GFLOP (36.954 GFLOP/s @ duck_max), param_stream=1.037697G (5.309 Gparam/s @ duck_max), weight_stream=1113.811 MiB (5.976 GB/s @ duck_max) [2026-04-08 08:46:58.355650 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=205, expert_tiles=434, avg_tile_batch=3.12, prepare=417.535µs, send=10.099182ms, judge_wait=119.517377ms, fetch=10.843525ms, reduce=19ns; duck time-ns stats: p50=107.336075ms, p90=107.477328ms, max=107.626783ms; kernel_model: matmul=3.721396 GFLOP (34.577 GFLOP/s @ duck_max), param_stream=0.597295G (5.550 Gparam/s @ duck_max), weight_stream=641.106 MiB (6.246 GB/s @ duck_max) [2026-04-08 08:46:58.656790 INFO fp8_moe_dpdk] MoE prefill forward (Rust): 
batch_size=328, top_k=8, tasks=2624, unique_experts=220, expert_tiles=752, avg_tile_batch=3.49, prepare=702.73µs, send=18.982147ms, judge_wait=220.753855ms, fetch=21.659811ms, reduce=21ns; duck time-ns stats: p50=194.69908ms, p90=195.251111ms, max=196.083206ms; kernel_model: matmul=7.222591 GFLOP (36.834 GFLOP/s @ duck_max), param_stream=1.034945G (5.278 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.940 GB/s @ duck_max) [2026-04-08 08:46:58.941708 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=221, expert_tiles=746, avg_tile_batch=3.52, prepare=666.423µs, send=17.230802ms, judge_wait=230.746682ms, fetch=21.598208ms, reduce=139ns; duck time-ns stats: p50=204.412102ms, p90=204.713234ms, max=204.874932ms; kernel_model: matmul=7.222591 GFLOP (35.254 GFLOP/s @ duck_max), param_stream=1.026687G (5.011 Gparam/s @ duck_max), weight_stream=1101.994 MiB (5.640 GB/s @ duck_max) [2026-04-08 08:46:59.219861 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=228, expert_tiles=751, avg_tile_batch=3.49, prepare=659.968µs, send=17.215058ms, judge_wait=220.815553ms, fetch=24.814796ms, reduce=104ns; duck time-ns stats: p50=193.411634ms, p90=193.699846ms, max=193.925405ms; kernel_model: matmul=7.222591 GFLOP (37.244 GFLOP/s @ duck_max), param_stream=1.033568G (5.330 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.999 GB/s @ duck_max) [2026-04-08 08:46:59.501452 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=746, avg_tile_batch=3.52, prepare=669.33µs, send=17.225322ms, judge_wait=228.323611ms, fetch=20.627671ms, reduce=21ns; duck time-ns stats: p50=201.704889ms, p90=202.17517ms, max=202.863867ms; kernel_model: matmul=7.222591 GFLOP (35.603 GFLOP/s @ duck_max), param_stream=1.026687G (5.061 Gparam/s @ duck_max), weight_stream=1101.994 MiB (5.696 GB/s @ duck_max) [2026-04-08 08:46:59.662889 INFO fp8_moe_dpdk] 
MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=207, expert_tiles=426, avg_tile_batch=3.17, prepare=418.704µs, send=10.093444ms, judge_wait=131.888876ms, fetch=11.596583ms, reduce=20ns; duck time-ns stats: p50=120.926772ms, p90=121.130069ms, max=121.26967ms; kernel_model: matmul=3.721396 GFLOP (30.687 GFLOP/s @ duck_max), param_stream=0.586285G (4.835 Gparam/s @ duck_max), weight_stream=629.289 MiB (5.441 GB/s @ duck_max) [2026-04-08 08:46:59.961822 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=222, expert_tiles=748, avg_tile_batch=3.51, prepare=697.807µs, send=19.270782ms, judge_wait=219.277383ms, fetch=20.659774ms, reduce=20ns; duck time-ns stats: p50=193.168826ms, p90=193.563478ms, max=193.740523ms; kernel_model: matmul=7.222591 GFLOP (37.280 GFLOP/s @ duck_max), param_stream=1.029439G (5.313 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.980 GB/s @ duck_max) [2026-04-08 08:47:00.240467 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=238, expert_tiles=758, avg_tile_batch=3.46, prepare=661.705µs, send=17.235328ms, judge_wait=221.558524ms, fetch=24.488547ms, reduce=19ns; duck time-ns stats: p50=193.747426ms, p90=194.138282ms, max=194.475258ms; kernel_model: matmul=7.222591 GFLOP (37.139 GFLOP/s @ duck_max), param_stream=1.043202G (5.364 Gparam/s @ duck_max), weight_stream=1119.720 MiB (6.037 GB/s @ duck_max) [2026-04-08 08:47:00.517453 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=747, avg_tile_batch=3.51, prepare=663.461µs, send=18.538435ms, judge_wait=218.602806ms, fetch=24.525975ms, reduce=19ns; duck time-ns stats: p50=190.95805ms, p90=191.246883ms, max=191.29424ms; kernel_model: matmul=7.222591 GFLOP (37.756 GFLOP/s @ duck_max), param_stream=1.028063G (5.374 Gparam/s @ duck_max), weight_stream=1103.471 MiB (6.049 GB/s @ duck_max) [2026-04-08 
08:47:00.794827 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=749, avg_tile_batch=3.50, prepare=669.929µs, send=17.224523ms, judge_wait=221.033113ms, fetch=23.799654ms, reduce=20ns; duck time-ns stats: p50=192.896917ms, p90=193.263123ms, max=193.568742ms; kernel_model: matmul=7.222591 GFLOP (37.313 GFLOP/s @ duck_max), param_stream=1.030816G (5.325 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.994 GB/s @ duck_max) [2026-04-08 08:47:00.945089 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=203, expert_tiles=430, avg_tile_batch=3.14, prepare=411.215µs, send=10.204732ms, judge_wait=121.33401ms, fetch=10.875633ms, reduce=21ns; duck time-ns stats: p50=108.231805ms, p90=108.366456ms, max=108.449748ms; kernel_model: matmul=3.721396 GFLOP (34.314 GFLOP/s @ duck_max), param_stream=0.591790G (5.457 Gparam/s @ duck_max), weight_stream=635.197 MiB (6.142 GB/s @ duck_max) [2026-04-08 08:47:01.248006 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=752, avg_tile_batch=3.49, prepare=703.253µs, send=18.832104ms, judge_wait=223.60298ms, fetch=20.643368ms, reduce=132ns; duck time-ns stats: p50=199.028516ms, p90=199.535003ms, max=199.681443ms; kernel_model: matmul=7.222591 GFLOP (36.171 GFLOP/s @ duck_max), param_stream=1.034945G (5.183 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.833 GB/s @ duck_max) [2026-04-08 08:47:01.521520 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=215, expert_tiles=749, avg_tile_batch=3.50, prepare=659.999µs, send=17.223907ms, judge_wait=219.287278ms, fetch=21.618963ms, reduce=142ns; duck time-ns stats: p50=192.70874ms, p90=193.223311ms, max=193.896722ms; kernel_model: matmul=7.222591 GFLOP (37.250 GFLOP/s @ duck_max), param_stream=1.030816G (5.316 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.983 
GB/s @ duck_max) [2026-04-08 08:47:01.799687 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=221, expert_tiles=752, avg_tile_batch=3.49, prepare=669.399µs, send=18.443899ms, judge_wait=220.560275ms, fetch=23.802502ms, reduce=140ns; duck time-ns stats: p50=193.248565ms, p90=193.552945ms, max=193.685778ms; kernel_model: matmul=7.222591 GFLOP (37.290 GFLOP/s @ duck_max), param_stream=1.034945G (5.343 Gparam/s @ duck_max), weight_stream=1110.857 MiB (6.014 GB/s @ duck_max) [2026-04-08 08:47:02.072622 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=220, expert_tiles=744, avg_tile_batch=3.53, prepare=658.987µs, send=18.580387ms, judge_wait=218.369857ms, fetch=20.649797ms, reduce=20ns; duck time-ns stats: p50=192.220024ms, p90=192.634413ms, max=192.81418ms; kernel_model: matmul=7.222591 GFLOP (37.459 GFLOP/s @ duck_max), param_stream=1.023934G (5.310 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.977 GB/s @ duck_max) [2026-04-08 08:47:02.218786 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=201, expert_tiles=439, avg_tile_batch=3.08, prepare=416.184µs, send=10.184951ms, judge_wait=117.18339ms, fetch=10.908899ms, reduce=20ns; duck time-ns stats: p50=104.671714ms, p90=104.817426ms, max=104.867303ms; kernel_model: matmul=3.721396 GFLOP (35.487 GFLOP/s @ duck_max), param_stream=0.604176G (5.761 Gparam/s @ duck_max), weight_stream=648.492 MiB (6.484 GB/s @ duck_max) [2026-04-08 08:47:02.518850 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=224, expert_tiles=742, avg_tile_batch=3.54, prepare=698.588µs, send=18.84793ms, judge_wait=219.803989ms, fetch=21.652972ms, reduce=20ns; duck time-ns stats: p50=193.637492ms, p90=193.904194ms, max=194.016549ms; kernel_model: matmul=7.222591 GFLOP (37.227 GFLOP/s @ duck_max), param_stream=1.021182G (5.263 Gparam/s @ duck_max), 
weight_stream=1096.085 MiB (5.924 GB/s @ duck_max) [2026-04-08 08:47:02.793142 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=222, expert_tiles=747, avg_tile_batch=3.51, prepare=661.515µs, send=17.214705ms, judge_wait=217.752705ms, fetch=23.963219ms, reduce=19ns; duck time-ns stats: p50=190.357407ms, p90=190.781192ms, max=190.968361ms; kernel_model: matmul=7.222591 GFLOP (37.821 GFLOP/s @ duck_max), param_stream=1.028063G (5.383 Gparam/s @ duck_max), weight_stream=1103.471 MiB (6.059 GB/s @ duck_max) [2026-04-08 08:47:03.074970 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=749, avg_tile_batch=3.50, prepare=660.068µs, send=18.48027ms, judge_wait=223.409925ms, fetch=24.580505ms, reduce=19ns; duck time-ns stats: p50=195.712914ms, p90=196.063651ms, max=196.235416ms; kernel_model: matmul=7.222591 GFLOP (36.806 GFLOP/s @ duck_max), param_stream=1.030816G (5.253 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.912 GB/s @ duck_max) [2026-04-08 08:47:03.355141 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=222, expert_tiles=749, avg_tile_batch=3.50, prepare=662.586µs, send=18.525423ms, judge_wait=223.386184ms, fetch=22.917705ms, reduce=20ns; duck time-ns stats: p50=196.312429ms, p90=196.906952ms, max=197.036793ms; kernel_model: matmul=7.222591 GFLOP (36.656 GFLOP/s @ duck_max), param_stream=1.030816G (5.232 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.888 GB/s @ duck_max) [2026-04-08 08:47:03.502424 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=199, expert_tiles=426, avg_tile_batch=3.17, prepare=420.065µs, send=10.171287ms, judge_wait=117.611528ms, fetch=11.590393ms, reduce=142ns; duck time-ns stats: p50=106.012047ms, p90=106.149279ms, max=106.200865ms; kernel_model: matmul=3.721396 GFLOP (35.041 GFLOP/s @ duck_max), param_stream=0.586285G 
(5.521 Gparam/s @ duck_max), weight_stream=629.289 MiB (6.213 GB/s @ duck_max) [2026-04-08 08:47:03.805217 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=751, avg_tile_batch=3.49, prepare=699.425µs, send=19.127242ms, judge_wait=223.243803ms, fetch=20.625296ms, reduce=130ns; duck time-ns stats: p50=199.475646ms, p90=199.647734ms, max=199.907568ms; kernel_model: matmul=7.222591 GFLOP (36.130 GFLOP/s @ duck_max), param_stream=1.033568G (5.170 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.819 GB/s @ duck_max) [2026-04-08 08:47:04.080865 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=745, avg_tile_batch=3.52, prepare=668.92µs, send=17.225743ms, judge_wait=221.445023ms, fetch=21.671786ms, reduce=138ns; duck time-ns stats: p50=194.927454ms, p90=195.217447ms, max=195.287026ms; kernel_model: matmul=7.222591 GFLOP (36.984 GFLOP/s @ duck_max), param_stream=1.025311G (5.250 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.909 GB/s @ duck_max) [2026-04-08 08:47:04.353024 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=240, expert_tiles=754, avg_tile_batch=3.48, prepare=671.332µs, send=17.224083ms, judge_wait=217.97027ms, fetch=21.624796ms, reduce=135ns; duck time-ns stats: p50=191.148308ms, p90=191.463957ms, max=191.59127ms; kernel_model: matmul=7.222591 GFLOP (37.698 GFLOP/s @ duck_max), param_stream=1.037697G (5.416 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.096 GB/s @ duck_max) [2026-04-08 08:47:04.630277 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=746, avg_tile_batch=3.52, prepare=667.151µs, send=17.222725ms, judge_wait=222.385637ms, fetch=22.340389ms, reduce=140ns; duck time-ns stats: p50=195.240232ms, p90=195.514425ms, max=195.781377ms; kernel_model: matmul=7.222591 GFLOP (36.891 GFLOP/s @ 
duck_max), param_stream=1.026687G (5.244 Gparam/s @ duck_max), weight_stream=1101.994 MiB (5.902 GB/s @ duck_max) [2026-04-08 08:47:04.779414 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=203, expert_tiles=428, avg_tile_batch=3.16, prepare=411.767µs, send=10.197423ms, judge_wait=119.496374ms, fetch=11.583502ms, reduce=20ns; duck time-ns stats: p50=108.647638ms, p90=108.779968ms, max=108.888831ms; kernel_model: matmul=3.721396 GFLOP (34.176 GFLOP/s @ duck_max), param_stream=0.589038G (5.410 Gparam/s @ duck_max), weight_stream=632.243 MiB (6.088 GB/s @ duck_max) [2026-04-08 08:47:05.083333 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=235, expert_tiles=758, avg_tile_batch=3.46, prepare=707.155µs, send=19.179359ms, judge_wait=224.312599ms, fetch=20.639696ms, reduce=139ns; duck time-ns stats: p50=198.517411ms, p90=198.686588ms, max=198.691126ms; kernel_model: matmul=7.222591 GFLOP (36.351 GFLOP/s @ duck_max), param_stream=1.043202G (5.250 Gparam/s @ duck_max), weight_stream=1119.720 MiB (5.909 GB/s @ duck_max) [2026-04-08 08:47:05.362289 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=745, avg_tile_batch=3.52, prepare=674.694µs, send=17.226902ms, judge_wait=225.667414ms, fetch=20.647471ms, reduce=136ns; duck time-ns stats: p50=199.116056ms, p90=199.423823ms, max=199.499284ms; kernel_model: matmul=7.222591 GFLOP (36.204 GFLOP/s @ duck_max), param_stream=1.025311G (5.139 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.784 GB/s @ duck_max) [2026-04-08 08:47:05.635943 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=746, avg_tile_batch=3.52, prepare=664.019µs, send=17.225352ms, judge_wait=218.221688ms, fetch=22.88909ms, reduce=20ns; duck time-ns stats: p50=189.696637ms, p90=190.304176ms, max=190.614006ms; kernel_model: 
matmul=7.222591 GFLOP (37.891 GFLOP/s @ duck_max), param_stream=1.026687G (5.386 Gparam/s @ duck_max), weight_stream=1101.994 MiB (6.062 GB/s @ duck_max) [2026-04-08 08:47:05.913127 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=744, avg_tile_batch=3.53, prepare=662.145µs, send=18.438871ms, judge_wait=220.339789ms, fetch=23.027822ms, reduce=163ns; duck time-ns stats: p50=192.070626ms, p90=192.365682ms, max=192.572041ms; kernel_model: matmul=7.222591 GFLOP (37.506 GFLOP/s @ duck_max), param_stream=1.023934G (5.317 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.984 GB/s @ duck_max) [2026-04-08 08:47:06.061204 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=200, expert_tiles=434, avg_tile_batch=3.12, prepare=414.348µs, send=10.121498ms, judge_wait=118.547568ms, fetch=11.551499ms, reduce=135ns; duck time-ns stats: p50=104.619937ms, p90=104.86773ms, max=104.931954ms; kernel_model: matmul=3.721396 GFLOP (35.465 GFLOP/s @ duck_max), param_stream=0.597295G (5.692 Gparam/s @ duck_max), weight_stream=641.106 MiB (6.407 GB/s @ duck_max) [2026-04-08 08:47:06.371950 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=228, expert_tiles=747, avg_tile_batch=3.51, prepare=702.838µs, send=19.090754ms, judge_wait=228.943657ms, fetch=22.815213ms, reduce=138ns; duck time-ns stats: p50=201.799215ms, p90=202.076145ms, max=202.129853ms; kernel_model: matmul=7.222591 GFLOP (35.732 GFLOP/s @ duck_max), param_stream=1.028063G (5.086 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.724 GB/s @ duck_max) [2026-04-08 08:47:06.645537 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=224, expert_tiles=743, avg_tile_batch=3.53, prepare=664.043µs, send=17.225673ms, judge_wait=218.103511ms, fetch=22.940537ms, reduce=19ns; duck time-ns stats: p50=191.174524ms, p90=191.51507ms, 
max=191.620449ms; kernel_model: matmul=7.222591 GFLOP (37.692 GFLOP/s @ duck_max), param_stream=1.022558G (5.336 Gparam/s @ duck_max), weight_stream=1097.562 MiB (6.006 GB/s @ duck_max) [2026-04-08 08:47:06.925023 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=754, avg_tile_batch=3.48, prepare=664.766µs, send=18.568393ms, judge_wait=220.800182ms, fetch=24.813618ms, reduce=21ns; duck time-ns stats: p50=193.154994ms, p90=193.324901ms, max=193.499677ms; kernel_model: matmul=7.222591 GFLOP (37.326 GFLOP/s @ duck_max), param_stream=1.037697G (5.363 Gparam/s @ duck_max), weight_stream=1113.811 MiB (6.036 GB/s @ duck_max) [2026-04-08 08:47:07.203590 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=745, avg_tile_batch=3.52, prepare=666.556µs, send=18.396947ms, judge_wait=221.219267ms, fetch=23.674905ms, reduce=20ns; duck time-ns stats: p50=193.777641ms, p90=194.240573ms, max=194.487913ms; kernel_model: matmul=7.222591 GFLOP (37.136 GFLOP/s @ duck_max), param_stream=1.025311G (5.272 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.933 GB/s @ duck_max) [2026-04-08 08:47:07.347226 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=198, expert_tiles=418, avg_tile_batch=3.23, prepare=415.862µs, send=10.24456ms, judge_wait=114.574739ms, fetch=10.913901ms, reduce=133ns; duck time-ns stats: p50=101.644046ms, p90=101.729503ms, max=101.967483ms; kernel_model: matmul=3.721396 GFLOP (36.496 GFLOP/s @ duck_max), param_stream=0.575275G (5.642 Gparam/s @ duck_max), weight_stream=617.471 MiB (6.350 GB/s @ duck_max) [2026-04-08 08:47:07.656622 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=751, avg_tile_batch=3.49, prepare=699.992µs, send=18.661876ms, judge_wait=229.000156ms, fetch=21.640173ms, reduce=136ns; duck time-ns stats: 
p50=203.721614ms, p90=204.080844ms, max=204.179488ms; kernel_model: matmul=7.222591 GFLOP (35.374 GFLOP/s @ duck_max), param_stream=1.033568G (5.062 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.697 GB/s @ duck_max) [2026-04-08 08:47:07.932849 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=760, avg_tile_batch=3.45, prepare=666.008µs, send=17.214975ms, judge_wait=221.076352ms, fetch=22.300296ms, reduce=19ns; duck time-ns stats: p50=194.077703ms, p90=194.307215ms, max=194.684064ms; kernel_model: matmul=7.222591 GFLOP (37.099 GFLOP/s @ duck_max), param_stream=1.045955G (5.373 Gparam/s @ duck_max), weight_stream=1122.675 MiB (6.047 GB/s @ duck_max) [2026-04-08 08:47:08.211284 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=752, avg_tile_batch=3.49, prepare=661.534µs, send=17.224035ms, judge_wait=220.384871ms, fetch=25.324499ms, reduce=132ns; duck time-ns stats: p50=192.93463ms, p90=193.282353ms, max=193.444315ms; kernel_model: matmul=7.222591 GFLOP (37.337 GFLOP/s @ duck_max), param_stream=1.034945G (5.350 Gparam/s @ duck_max), weight_stream=1110.857 MiB (6.021 GB/s @ duck_max) [2026-04-08 08:47:08.484619 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=237, expert_tiles=750, avg_tile_batch=3.50, prepare=659.7µs, send=18.456066ms, judge_wait=218.814023ms, fetch=20.654856ms, reduce=104ns; duck time-ns stats: p50=193.950604ms, p90=194.217908ms, max=194.376651ms; kernel_model: matmul=7.222591 GFLOP (37.158 GFLOP/s @ duck_max), param_stream=1.032192G (5.310 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.977 GB/s @ duck_max) [2026-04-08 08:47:08.631435 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=203, expert_tiles=437, avg_tile_batch=3.09, prepare=417.205µs, send=10.075706ms, judge_wait=117.235998ms, fetch=11.552903ms, 
reduce=14ns; duck time-ns stats: p50=104.63991ms, p90=104.766321ms, max=104.821865ms; kernel_model: matmul=3.721396 GFLOP (35.502 GFLOP/s @ duck_max), param_stream=0.601424G (5.738 Gparam/s @ duck_max), weight_stream=645.538 MiB (6.458 GB/s @ duck_max) [2026-04-08 08:47:08.936010 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=752, avg_tile_batch=3.49, prepare=690.317µs, send=19.116963ms, judge_wait=223.934972ms, fetch=21.629137ms, reduce=135ns; duck time-ns stats: p50=197.409132ms, p90=197.77283ms, max=198.074352ms; kernel_model: matmul=7.222591 GFLOP (36.464 GFLOP/s @ duck_max), param_stream=1.034945G (5.225 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.881 GB/s @ duck_max) [2026-04-08 08:47:09.248564 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=749, avg_tile_batch=3.50, prepare=661.078µs, send=17.225095ms, judge_wait=258.27826ms, fetch=21.627425ms, reduce=15ns; duck time-ns stats: p50=233.89638ms, p90=234.280455ms, max=234.434674ms; kernel_model: matmul=7.222591 GFLOP (30.809 GFLOP/s @ duck_max), param_stream=1.030816G (4.397 Gparam/s @ duck_max), weight_stream=1106.425 MiB (4.949 GB/s @ duck_max) [2026-04-08 08:47:09.525330 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=750, avg_tile_batch=3.50, prepare=663.607µs, send=17.224491ms, judge_wait=222.579034ms, fetch=21.600263ms, reduce=134ns; duck time-ns stats: p50=196.50601ms, p90=196.841921ms, max=197.061025ms; kernel_model: matmul=7.222591 GFLOP (36.652 GFLOP/s @ duck_max), param_stream=1.032192G (5.238 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.895 GB/s @ duck_max) [2026-04-08 08:47:09.796649 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=235, expert_tiles=748, avg_tile_batch=3.51, prepare=675.284µs, send=17.225905ms, 
judge_wait=217.181463ms, fetch=21.627235ms, reduce=19ns; duck time-ns stats: p50=190.991408ms, p90=191.365327ms, max=191.559051ms; kernel_model: matmul=7.222591 GFLOP (37.704 GFLOP/s @ duck_max), param_stream=1.029439G (5.374 Gparam/s @ duck_max), weight_stream=1104.948 MiB (6.048 GB/s @ duck_max) [2026-04-08 08:47:09.941280 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=203, expert_tiles=426, avg_tile_batch=3.17, prepare=414.845µs, send=10.222406ms, judge_wait=115.566817ms, fetch=10.965175ms, reduce=101ns; duck time-ns stats: p50=102.601375ms, p90=102.718879ms, max=102.767856ms; kernel_model: matmul=3.721396 GFLOP (36.212 GFLOP/s @ duck_max), param_stream=0.586285G (5.705 Gparam/s @ duck_max), weight_stream=629.289 MiB (6.421 GB/s @ duck_max) [2026-04-08 08:47:10.252849 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=749, avg_tile_batch=3.50, prepare=695.919µs, send=18.605671ms, judge_wait=231.282642ms, fetch=21.634497ms, reduce=20ns; duck time-ns stats: p50=206.80411ms, p90=206.975107ms, max=207.131397ms; kernel_model: matmul=7.222591 GFLOP (34.870 GFLOP/s @ duck_max), param_stream=1.030816G (4.977 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.601 GB/s @ duck_max) [2026-04-08 08:47:10.525541 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=224, expert_tiles=748, avg_tile_batch=3.51, prepare=680.275µs, send=17.22599ms, judge_wait=218.225312ms, fetch=21.646467ms, reduce=136ns; duck time-ns stats: p50=192.883189ms, p90=193.316276ms, max=193.887721ms; kernel_model: matmul=7.222591 GFLOP (37.251 GFLOP/s @ duck_max), param_stream=1.029439G (5.309 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.976 GB/s @ duck_max) [2026-04-08 08:47:10.807992 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=222, expert_tiles=753, avg_tile_batch=3.48, 
prepare=661.75µs, send=17.209172ms, judge_wait=228.053928ms, fetch=21.678556ms, reduce=107ns; duck time-ns stats: p50=201.671008ms, p90=201.92263ms, max=202.417929ms; kernel_model: matmul=7.222591 GFLOP (35.682 GFLOP/s @ duck_max), param_stream=1.036321G (5.120 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.762 GB/s @ duck_max) [2026-04-08 08:47:11.086067 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=756, avg_tile_batch=3.47, prepare=664.284µs, send=17.224752ms, judge_wait=222.431323ms, fetch=22.948645ms, reduce=132ns; duck time-ns stats: p50=195.641105ms, p90=196.103445ms, max=196.634263ms; kernel_model: matmul=7.222591 GFLOP (36.731 GFLOP/s @ duck_max), param_stream=1.040450G (5.291 Gparam/s @ duck_max), weight_stream=1116.766 MiB (5.955 GB/s @ duck_max) [2026-04-08 08:47:11.234565 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=191, expert_tiles=429, avg_tile_batch=3.15, prepare=413.726µs, send=10.007061ms, judge_wait=119.615627ms, fetch=10.953662ms, reduce=21ns; duck time-ns stats: p50=107.056963ms, p90=107.232372ms, max=107.397582ms; kernel_model: matmul=3.721396 GFLOP (34.651 GFLOP/s @ duck_max), param_stream=0.590414G (5.497 Gparam/s @ duck_max), weight_stream=633.720 MiB (6.187 GB/s @ duck_max) [2026-04-08 08:47:11.537803 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=749, avg_tile_batch=3.50, prepare=706.653µs, send=18.461934ms, judge_wait=223.147825ms, fetch=21.622176ms, reduce=136ns; duck time-ns stats: p50=196.610664ms, p90=197.179989ms, max=197.624255ms; kernel_model: matmul=7.222591 GFLOP (36.547 GFLOP/s @ duck_max), param_stream=1.030816G (5.216 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.871 GB/s @ duck_max) [2026-04-08 08:47:11.820866 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=238, 
expert_tiles=757, avg_tile_batch=3.47, prepare=668.659µs, send=18.316974ms, judge_wait=224.963277ms, fetch=24.470897ms, reduce=26ns; duck time-ns stats: p50=197.322702ms, p90=197.579221ms, max=197.815813ms; kernel_model: matmul=7.222591 GFLOP (36.512 GFLOP/s @ duck_max), param_stream=1.041826G (5.267 Gparam/s @ duck_max), weight_stream=1118.243 MiB (5.928 GB/s @ duck_max) [2026-04-08 08:47:12.097580 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=238, expert_tiles=759, avg_tile_batch=3.46, prepare=680.179µs, send=17.232081ms, judge_wait=222.574382ms, fetch=21.583792ms, reduce=135ns; duck time-ns stats: p50=196.804047ms, p90=197.212661ms, max=197.377684ms; kernel_model: matmul=7.222591 GFLOP (36.593 GFLOP/s @ duck_max), param_stream=1.044578G (5.292 Gparam/s @ duck_max), weight_stream=1121.197 MiB (5.956 GB/s @ duck_max) [2026-04-08 08:47:12.374452 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=748, avg_tile_batch=3.51, prepare=711.9µs, send=17.223179ms, judge_wait=222.492296ms, fetch=21.641168ms, reduce=19ns; duck time-ns stats: p50=195.829709ms, p90=196.166197ms, max=196.397786ms; kernel_model: matmul=7.222591 GFLOP (36.775 GFLOP/s @ duck_max), param_stream=1.029439G (5.242 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.899 GB/s @ duck_max) [2026-04-08 08:47:12.523574 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=200, expert_tiles=432, avg_tile_batch=3.13, prepare=415.058µs, send=10.056967ms, judge_wait=119.590258ms, fetch=11.591083ms, reduce=20ns; duck time-ns stats: p50=106.778734ms, p90=106.880506ms, max=106.934062ms; kernel_model: matmul=3.721396 GFLOP (34.801 GFLOP/s @ duck_max), param_stream=0.594543G (5.560 Gparam/s @ duck_max), weight_stream=638.152 MiB (6.258 GB/s @ duck_max) [2026-04-08 08:47:12.825684 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, 
tasks=2624, unique_experts=237, expert_tiles=752, avg_tile_batch=3.49, prepare=705.406µs, send=19.545082ms, judge_wait=220.022976ms, fetch=22.183571ms, reduce=20ns; duck time-ns stats: p50=192.972458ms, p90=193.456455ms, max=193.771631ms; kernel_model: matmul=7.222591 GFLOP (37.274 GFLOP/s @ duck_max), param_stream=1.034945G (5.341 Gparam/s @ duck_max), weight_stream=1110.857 MiB (6.011 GB/s @ duck_max) [2026-04-08 08:47:13.103228 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=238, expert_tiles=748, avg_tile_batch=3.51, prepare=665.121µs, send=17.216756ms, judge_wait=222.767217ms, fetch=22.104878ms, reduce=33ns; duck time-ns stats: p50=193.986975ms, p90=194.257855ms, max=194.588315ms; kernel_model: matmul=7.222591 GFLOP (37.117 GFLOP/s @ duck_max), param_stream=1.029439G (5.290 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.954 GB/s @ duck_max) [2026-04-08 08:47:13.377811 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=757, avg_tile_batch=3.47, prepare=668.256µs, send=17.225468ms, judge_wait=219.135646ms, fetch=22.708283ms, reduce=22ns; duck time-ns stats: p50=191.915671ms, p90=192.156022ms, max=192.220953ms; kernel_model: matmul=7.222591 GFLOP (37.574 GFLOP/s @ duck_max), param_stream=1.041826G (5.420 Gparam/s @ duck_max), weight_stream=1118.243 MiB (6.100 GB/s @ duck_max) [2026-04-08 08:47:13.657654 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=239, expert_tiles=753, avg_tile_batch=3.48, prepare=661.177µs, send=17.233368ms, judge_wait=222.030305ms, fetch=24.172563ms, reduce=20ns; duck time-ns stats: p50=194.536926ms, p90=194.716067ms, max=194.96707ms; kernel_model: matmul=7.222591 GFLOP (37.045 GFLOP/s @ duck_max), param_stream=1.036321G (5.315 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.982 GB/s @ duck_max) [2026-04-08 08:47:13.809035 INFO fp8_moe_dpdk] MoE prefill forward 
(Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=213, expert_tiles=436, avg_tile_batch=3.10, prepare=414.078µs, send=10.124797ms, judge_wait=122.521876ms, fetch=10.921341ms, reduce=20ns; duck time-ns stats: p50=108.845007ms, p90=109.025642ms, max=109.159855ms; kernel_model: matmul=3.721396 GFLOP (34.091 GFLOP/s @ duck_max), param_stream=0.600048G (5.497 Gparam/s @ duck_max), weight_stream=644.061 MiB (6.187 GB/s @ duck_max) [2026-04-08 08:47:14.107525 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=228, expert_tiles=746, avg_tile_batch=3.52, prepare=696.377µs, send=18.414935ms, judge_wait=219.073465ms, fetch=20.636937ms, reduce=139ns; duck time-ns stats: p50=192.970311ms, p90=193.161132ms, max=193.29434ms; kernel_model: matmul=7.222591 GFLOP (37.366 GFLOP/s @ duck_max), param_stream=1.026687G (5.312 Gparam/s @ duck_max), weight_stream=1101.994 MiB (5.978 GB/s @ duck_max) [2026-04-08 08:47:14.385077 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=748, avg_tile_batch=3.51, prepare=664.918µs, send=17.226107ms, judge_wait=223.199109ms, fetch=21.652569ms, reduce=20ns; duck time-ns stats: p50=197.923998ms, p90=198.20307ms, max=198.484494ms; kernel_model: matmul=7.222591 GFLOP (36.389 GFLOP/s @ duck_max), param_stream=1.029439G (5.186 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.837 GB/s @ duck_max) [2026-04-08 08:47:14.662951 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=750, avg_tile_batch=3.50, prepare=663.337µs, send=17.22761ms, judge_wait=223.60146ms, fetch=21.624959ms, reduce=20ns; duck time-ns stats: p50=197.441006ms, p90=197.635113ms, max=197.718411ms; kernel_model: matmul=7.222591 GFLOP (36.530 GFLOP/s @ duck_max), param_stream=1.032192G (5.221 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.876 GB/s @ duck_max) [2026-04-08 08:47:14.945000 INFO 
fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=749, avg_tile_batch=3.50, prepare=666.757µs, send=17.214313ms, judge_wait=228.841765ms, fetch=20.656076ms, reduce=139ns; duck time-ns stats: p50=202.625991ms, p90=202.755734ms, max=202.860963ms; kernel_model: matmul=7.222591 GFLOP (35.604 GFLOP/s @ duck_max), param_stream=1.030816G (5.081 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.719 GB/s @ duck_max) [2026-04-08 08:47:15.094626 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=200, expert_tiles=429, avg_tile_batch=3.15, prepare=412.973µs, send=10.173748ms, judge_wait=120.007474ms, fetch=11.57111ms, reduce=20ns; duck time-ns stats: p50=107.972855ms, p90=108.166417ms, max=108.269934ms; kernel_model: matmul=3.721396 GFLOP (34.371 GFLOP/s @ duck_max), param_stream=0.590414G (5.453 Gparam/s @ duck_max), weight_stream=633.720 MiB (6.137 GB/s @ duck_max) [2026-04-08 08:47:15.405697 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=751, avg_tile_batch=3.49, prepare=693.505µs, send=18.990507ms, judge_wait=231.375882ms, fetch=20.651619ms, reduce=19ns; duck time-ns stats: p50=207.113623ms, p90=207.453617ms, max=207.73672ms; kernel_model: matmul=7.222591 GFLOP (34.768 GFLOP/s @ duck_max), param_stream=1.033568G (4.975 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.600 GB/s @ duck_max) [2026-04-08 08:47:15.680222 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=235, expert_tiles=745, avg_tile_batch=3.52, prepare=661.267µs, send=17.226367ms, judge_wait=220.258ms, fetch=21.659622ms, reduce=146ns; duck time-ns stats: p50=194.820971ms, p90=195.074092ms, max=195.48014ms; kernel_model: matmul=7.222591 GFLOP (36.948 GFLOP/s @ duck_max), param_stream=1.025311G (5.245 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.903 GB/s @ duck_max) 
[2026-04-08 08:47:15.953184 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=235, expert_tiles=751, avg_tile_batch=3.49, prepare=667.455µs, send=17.210387ms, judge_wait=219.60693ms, fetch=20.643556ms, reduce=139ns; duck time-ns stats: p50=193.716033ms, p90=194.063813ms, max=194.283342ms; kernel_model: matmul=7.222591 GFLOP (37.176 GFLOP/s @ duck_max), param_stream=1.033568G (5.320 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.987 GB/s @ duck_max) [2026-04-08 08:47:16.226188 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=745, avg_tile_batch=3.52, prepare=663.273µs, send=17.225336ms, judge_wait=218.684273ms, fetch=21.671883ms, reduce=143ns; duck time-ns stats: p50=192.479163ms, p90=192.717637ms, max=192.894981ms; kernel_model: matmul=7.222591 GFLOP (37.443 GFLOP/s @ duck_max), param_stream=1.025311G (5.315 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.982 GB/s @ duck_max) [2026-04-08 08:47:16.368465 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=203, expert_tiles=424, avg_tile_batch=3.19, prepare=414.168µs, send=10.07553ms, judge_wait=113.287346ms, fetch=10.887537ms, reduce=15ns; duck time-ns stats: p50=101.219813ms, p90=101.412343ms, max=101.457865ms; kernel_model: matmul=3.721396 GFLOP (36.679 GFLOP/s @ duck_max), param_stream=0.583533G (5.751 Gparam/s @ duck_max), weight_stream=626.334 MiB (6.473 GB/s @ duck_max) [2026-04-08 08:47:16.678610 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=752, avg_tile_batch=3.49, prepare=693.131µs, send=18.570902ms, judge_wait=230.974146ms, fetch=20.662368ms, reduce=21ns; duck time-ns stats: p50=206.993307ms, p90=207.283934ms, max=207.52858ms; kernel_model: matmul=7.222591 GFLOP (34.803 GFLOP/s @ duck_max), param_stream=1.034945G (4.987 Gparam/s @ duck_max), weight_stream=1110.857 
MiB (5.613 GB/s @ duck_max) [2026-04-08 08:47:16.955361 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=750, avg_tile_batch=3.50, prepare=663.536µs, send=17.226739ms, judge_wait=223.484741ms, fetch=20.643314ms, reduce=20ns; duck time-ns stats: p50=197.454487ms, p90=197.737403ms, max=197.89054ms; kernel_model: matmul=7.222591 GFLOP (36.498 GFLOP/s @ duck_max), param_stream=1.032192G (5.216 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.871 GB/s @ duck_max) [2026-04-08 08:47:17.231721 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=756, avg_tile_batch=3.47, prepare=661.496µs, send=17.226727ms, judge_wait=222.967868ms, fetch=20.657358ms, reduce=138ns; duck time-ns stats: p50=196.885442ms, p90=197.345572ms, max=197.507412ms; kernel_model: matmul=7.222591 GFLOP (36.569 GFLOP/s @ duck_max), param_stream=1.040450G (5.268 Gparam/s @ duck_max), weight_stream=1116.766 MiB (5.929 GB/s @ duck_max) [2026-04-08 08:47:17.508243 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=747, avg_tile_batch=3.51, prepare=682.018µs, send=17.254322ms, judge_wait=222.080523ms, fetch=21.667669ms, reduce=137ns; duck time-ns stats: p50=195.413996ms, p90=195.601093ms, max=195.82988ms; kernel_model: matmul=7.222591 GFLOP (36.882 GFLOP/s @ duck_max), param_stream=1.028063G (5.250 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.909 GB/s @ duck_max) [2026-04-08 08:47:17.653167 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=206, expert_tiles=423, avg_tile_batch=3.20, prepare=415.829µs, send=10.008274ms, judge_wait=116.350134ms, fetch=10.579792ms, reduce=19ns; duck time-ns stats: p50=103.898476ms, p90=104.034308ms, max=104.133862ms; kernel_model: matmul=3.721396 GFLOP (35.737 GFLOP/s @ duck_max), param_stream=0.582156G (5.590 Gparam/s @ 
duck_max), weight_stream=624.857 MiB (6.292 GB/s @ duck_max) [2026-04-08 08:47:17.956845 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=756, avg_tile_batch=3.47, prepare=709.543µs, send=19.018144ms, judge_wait=223.105478ms, fetch=21.644053ms, reduce=130ns; duck time-ns stats: p50=197.906756ms, p90=198.335245ms, max=198.492147ms; kernel_model: matmul=7.222591 GFLOP (36.387 GFLOP/s @ duck_max), param_stream=1.040450G (5.242 Gparam/s @ duck_max), weight_stream=1116.766 MiB (5.900 GB/s @ duck_max) [2026-04-08 08:47:18.245644 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=744, avg_tile_batch=3.53, prepare=658.833µs, send=17.223911ms, judge_wait=235.577668ms, fetch=20.664272ms, reduce=39ns; duck time-ns stats: p50=210.59021ms, p90=210.857979ms, max=211.012716ms; kernel_model: matmul=7.222591 GFLOP (34.228 GFLOP/s @ duck_max), param_stream=1.023934G (4.852 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.461 GB/s @ duck_max) [2026-04-08 08:47:18.517992 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=235, expert_tiles=749, avg_tile_batch=3.50, prepare=665.088µs, send=17.224523ms, judge_wait=218.980626ms, fetch=20.641563ms, reduce=135ns; duck time-ns stats: p50=192.850753ms, p90=193.132316ms, max=193.481932ms; kernel_model: matmul=7.222591 GFLOP (37.330 GFLOP/s @ duck_max), param_stream=1.030816G (5.328 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.996 GB/s @ duck_max) [2026-04-08 08:47:18.795326 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=235, expert_tiles=750, avg_tile_batch=3.50, prepare=664.711µs, send=17.225021ms, judge_wait=220.039424ms, fetch=24.775799ms, reduce=21ns; duck time-ns stats: p50=192.504393ms, p90=192.672603ms, max=192.851935ms; kernel_model: matmul=7.222591 GFLOP (37.451 GFLOP/s @ duck_max), 
param_stream=1.032192G (5.352 Gparam/s @ duck_max), weight_stream=1107.903 MiB (6.024 GB/s @ duck_max) [2026-04-08 08:47:18.942589 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=190, expert_tiles=418, avg_tile_batch=3.23, prepare=413.085µs, send=10.225766ms, judge_wait=117.620832ms, fetch=11.567123ms, reduce=20ns; duck time-ns stats: p50=106.712136ms, p90=106.906818ms, max=106.96456ms; kernel_model: matmul=3.721396 GFLOP (34.791 GFLOP/s @ duck_max), param_stream=0.575275G (5.378 Gparam/s @ duck_max), weight_stream=617.471 MiB (6.053 GB/s @ duck_max) [2026-04-08 08:47:19.249603 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=219, expert_tiles=750, avg_tile_batch=3.50, prepare=695.243µs, send=18.913245ms, judge_wait=227.357716ms, fetch=20.647637ms, reduce=20ns; duck time-ns stats: p50=202.147513ms, p90=202.345698ms, max=202.725908ms; kernel_model: matmul=7.222591 GFLOP (35.627 GFLOP/s @ duck_max), param_stream=1.032192G (5.092 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.730 GB/s @ duck_max) [2026-04-08 08:47:19.528322 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=220, expert_tiles=747, avg_tile_batch=3.51, prepare=665.071µs, send=17.227186ms, judge_wait=225.465448ms, fetch=20.651981ms, reduce=20ns; duck time-ns stats: p50=198.920065ms, p90=199.200406ms, max=199.346528ms; kernel_model: matmul=7.222591 GFLOP (36.231 GFLOP/s @ duck_max), param_stream=1.028063G (5.157 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.804 GB/s @ duck_max) [2026-04-08 08:47:19.807127 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=747, avg_tile_batch=3.51, prepare=662.893µs, send=17.224461ms, judge_wait=222.952632ms, fetch=23.362934ms, reduce=20ns; duck time-ns stats: p50=195.919544ms, p90=196.354278ms, max=196.490798ms; kernel_model: matmul=7.222591 GFLOP 
(36.758 GFLOP/s @ duck_max), param_stream=1.028063G (5.232 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.889 GB/s @ duck_max) [2026-04-08 08:47:20.085139 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=745, avg_tile_batch=3.52, prepare=659.789µs, send=18.316889ms, judge_wait=222.684203ms, fetch=21.627492ms, reduce=140ns; duck time-ns stats: p50=199.532057ms, p90=199.906904ms, max=200.023301ms; kernel_model: matmul=7.222591 GFLOP (36.109 GFLOP/s @ duck_max), param_stream=1.025311G (5.126 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.769 GB/s @ duck_max) [2026-04-08 08:47:20.230758 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=188, expert_tiles=429, avg_tile_batch=3.15, prepare=415.348µs, send=10.133419ms, judge_wait=116.708872ms, fetch=10.910028ms, reduce=140ns; duck time-ns stats: p50=105.473511ms, p90=105.720941ms, max=105.86727ms; kernel_model: matmul=3.721396 GFLOP (35.152 GFLOP/s @ duck_max), param_stream=0.590414G (5.577 Gparam/s @ duck_max), weight_stream=633.720 MiB (6.277 GB/s @ duck_max) [2026-04-08 08:47:20.532354 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=752, avg_tile_batch=3.49, prepare=697.146µs, send=18.828871ms, judge_wait=221.307694ms, fetch=21.630949ms, reduce=19ns; duck time-ns stats: p50=196.834935ms, p90=197.101939ms, max=197.26751ms; kernel_model: matmul=7.222591 GFLOP (36.613 GFLOP/s @ duck_max), param_stream=1.034945G (5.246 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.905 GB/s @ duck_max) [2026-04-08 08:47:20.805062 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=240, expert_tiles=744, avg_tile_batch=3.53, prepare=662.072µs, send=17.209419ms, judge_wait=218.527455ms, fetch=21.659689ms, reduce=133ns; duck time-ns stats: p50=193.230086ms, p90=193.59447ms, max=193.736335ms; 
kernel_model: matmul=7.222591 GFLOP (37.281 GFLOP/s @ duck_max), param_stream=1.023934G (5.285 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.948 GB/s @ duck_max) [2026-04-08 08:47:21.076881 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=755, avg_tile_batch=3.48, prepare=661.068µs, send=17.226845ms, judge_wait=217.568923ms, fetch=21.654641ms, reduce=139ns; duck time-ns stats: p50=192.276193ms, p90=192.593552ms, max=192.956505ms; kernel_model: matmul=7.222591 GFLOP (37.431 GFLOP/s @ duck_max), param_stream=1.039073G (5.385 Gparam/s @ duck_max), weight_stream=1115.289 MiB (6.061 GB/s @ duck_max) [2026-04-08 08:47:21.350230 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=235, expert_tiles=748, avg_tile_batch=3.51, prepare=662.759µs, send=17.212523ms, judge_wait=219.178113ms, fetch=21.64324ms, reduce=136ns; duck time-ns stats: p50=193.944793ms, p90=194.286318ms, max=194.483531ms; kernel_model: matmul=7.222591 GFLOP (37.137 GFLOP/s @ duck_max), param_stream=1.029439G (5.293 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.957 GB/s @ duck_max) [2026-04-08 08:47:21.495743 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=205, expert_tiles=431, avg_tile_batch=3.14, prepare=417.829µs, send=10.221049ms, judge_wait=116.477316ms, fetch=10.898356ms, reduce=21ns; duck time-ns stats: p50=103.87681ms, p90=103.996952ms, max=104.048482ms; kernel_model: matmul=3.721396 GFLOP (35.766 GFLOP/s @ duck_max), param_stream=0.593166G (5.701 Gparam/s @ duck_max), weight_stream=636.675 MiB (6.416 GB/s @ duck_max) [2026-04-08 08:47:21.800585 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=748, avg_tile_batch=3.51, prepare=696.384µs, send=18.796849ms, judge_wait=225.62633ms, fetch=20.653613ms, reduce=21ns; duck time-ns stats: p50=200.503437ms, 
p90=200.640271ms, max=200.877927ms; kernel_model: matmul=7.222591 GFLOP (35.955 GFLOP/s @ duck_max), param_stream=1.029439G (5.125 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.768 GB/s @ duck_max) [2026-04-08 08:47:22.072956 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=228, expert_tiles=747, avg_tile_batch=3.51, prepare=666.116µs, send=17.210037ms, judge_wait=219.14654ms, fetch=20.640559ms, reduce=19ns; duck time-ns stats: p50=193.356943ms, p90=193.851782ms, max=193.889712ms; kernel_model: matmul=7.222591 GFLOP (37.251 GFLOP/s @ duck_max), param_stream=1.028063G (5.302 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.968 GB/s @ duck_max) [2026-04-08 08:47:22.358927 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=753, avg_tile_batch=3.48, prepare=661.002µs, send=17.22633ms, judge_wait=231.700674ms, fetch=21.639591ms, reduce=136ns; duck time-ns stats: p50=208.482284ms, p90=208.886253ms, max=209.166601ms; kernel_model: matmul=7.222591 GFLOP (34.530 GFLOP/s @ duck_max), param_stream=1.036321G (4.955 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.576 GB/s @ duck_max) [2026-04-08 08:47:22.640992 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=231, expert_tiles=752, avg_tile_batch=3.49, prepare=678.127µs, send=17.223814ms, judge_wait=228.722824ms, fetch=20.652481ms, reduce=133ns; duck time-ns stats: p50=206.705557ms, p90=206.941345ms, max=207.160647ms; kernel_model: matmul=7.222591 GFLOP (34.865 GFLOP/s @ duck_max), param_stream=1.034945G (4.996 Gparam/s @ duck_max), weight_stream=1110.857 MiB (5.623 GB/s @ duck_max) [2026-04-08 08:47:22.795096 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=193, expert_tiles=424, avg_tile_batch=3.19, prepare=425.652µs, send=10.03036ms, judge_wait=125.220126ms, fetch=10.98499ms, reduce=20ns; duck 
time-ns stats: p50=113.288066ms, p90=113.400215ms, max=113.461018ms; kernel_model: matmul=3.721396 GFLOP (32.799 GFLOP/s @ duck_max), param_stream=0.583533G (5.143 Gparam/s @ duck_max), weight_stream=626.334 MiB (5.788 GB/s @ duck_max) [2026-04-08 08:47:23.100144 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=754, avg_tile_batch=3.48, prepare=694.27µs, send=18.580542ms, judge_wait=225.798033ms, fetch=20.660769ms, reduce=21ns; duck time-ns stats: p50=204.478553ms, p90=204.86707ms, max=204.955882ms; kernel_model: matmul=7.222591 GFLOP (35.240 GFLOP/s @ duck_max), param_stream=1.037697G (5.063 Gparam/s @ duck_max), weight_stream=1113.811 MiB (5.698 GB/s @ duck_max) [2026-04-08 08:47:23.374040 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=750, avg_tile_batch=3.50, prepare=661.348µs, send=17.22901ms, judge_wait=219.655975ms, fetch=21.64592ms, reduce=135ns; duck time-ns stats: p50=194.210624ms, p90=194.382385ms, max=194.547457ms; kernel_model: matmul=7.222591 GFLOP (37.125 GFLOP/s @ duck_max), param_stream=1.032192G (5.306 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.971 GB/s @ duck_max) [2026-04-08 08:47:23.648070 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=750, avg_tile_batch=3.50, prepare=666.662µs, send=17.224668ms, judge_wait=219.865849ms, fetch=21.639485ms, reduce=139ns; duck time-ns stats: p50=194.354471ms, p90=194.667217ms, max=194.77516ms; kernel_model: matmul=7.222591 GFLOP (37.082 GFLOP/s @ duck_max), param_stream=1.032192G (5.299 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.964 GB/s @ duck_max) [2026-04-08 08:47:23.924809 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=753, avg_tile_batch=3.48, prepare=679.693µs, send=17.22153ms, judge_wait=222.437783ms, 
fetch=21.666438ms, reduce=136ns; duck time-ns stats: p50=195.922084ms, p90=196.177249ms, max=196.561997ms; kernel_model: matmul=7.222591 GFLOP (36.745 GFLOP/s @ duck_max), param_stream=1.036321G (5.272 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.934 GB/s @ duck_max) [2026-04-08 08:47:24.071256 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=181, expert_tiles=422, avg_tile_batch=3.20, prepare=415.159µs, send=10.104213ms, judge_wait=116.916573ms, fetch=11.586787ms, reduce=139ns; duck time-ns stats: p50=105.079669ms, p90=105.315639ms, max=105.48705ms; kernel_model: matmul=3.721396 GFLOP (35.278 GFLOP/s @ duck_max), param_stream=0.580780G (5.506 Gparam/s @ duck_max), weight_stream=623.380 MiB (6.197 GB/s @ duck_max) [2026-04-08 08:47:24.372195 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=753, avg_tile_batch=3.48, prepare=696.069µs, send=19.082938ms, judge_wait=220.214591ms, fetch=21.702218ms, reduce=146ns; duck time-ns stats: p50=194.039522ms, p90=194.293252ms, max=194.945714ms; kernel_model: matmul=7.222591 GFLOP (37.049 GFLOP/s @ duck_max), param_stream=1.036321G (5.316 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.983 GB/s @ duck_max) [2026-04-08 08:47:24.647657 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=753, avg_tile_batch=3.48, prepare=661.141µs, send=17.213287ms, judge_wait=218.294903ms, fetch=24.668643ms, reduce=20ns; duck time-ns stats: p50=190.697544ms, p90=190.88921ms, max=191.027245ms; kernel_model: matmul=7.222591 GFLOP (37.809 GFLOP/s @ duck_max), param_stream=1.036321G (5.425 Gparam/s @ duck_max), weight_stream=1112.334 MiB (6.106 GB/s @ duck_max) [2026-04-08 08:47:24.923693 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=224, expert_tiles=751, avg_tile_batch=3.49, prepare=676.284µs, 
send=17.223452ms, judge_wait=219.95444ms, fetch=23.469189ms, reduce=22ns; duck time-ns stats: p50=192.956689ms, p90=193.329654ms, max=193.523883ms; kernel_model: matmul=7.222591 GFLOP (37.321 GFLOP/s @ duck_max), param_stream=1.033568G (5.341 Gparam/s @ duck_max), weight_stream=1109.380 MiB (6.011 GB/s @ duck_max) [2026-04-08 08:47:25.201715 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=745, avg_tile_batch=3.52, prepare=661.323µs, send=18.431644ms, judge_wait=223.614335ms, fetch=20.6508ms, reduce=139ns; duck time-ns stats: p50=196.55418ms, p90=196.84166ms, max=197.011215ms; kernel_model: matmul=7.222591 GFLOP (36.661 GFLOP/s @ duck_max), param_stream=1.025311G (5.204 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.857 GB/s @ duck_max) [2026-04-08 08:47:25.347790 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=191, expert_tiles=428, avg_tile_batch=3.16, prepare=416.529µs, send=10.176173ms, judge_wait=116.457745ms, fetch=11.602437ms, reduce=20ns; duck time-ns stats: p50=103.604279ms, p90=103.764896ms, max=103.898321ms; kernel_model: matmul=3.721396 GFLOP (35.818 GFLOP/s @ duck_max), param_stream=0.589038G (5.669 Gparam/s @ duck_max), weight_stream=632.243 MiB (6.381 GB/s @ duck_max) [2026-04-08 08:47:25.646440 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=751, avg_tile_batch=3.49, prepare=711.783µs, send=18.858497ms, judge_wait=218.046644ms, fetch=21.644003ms, reduce=141ns; duck time-ns stats: p50=192.829672ms, p90=193.059861ms, max=193.268688ms; kernel_model: matmul=7.222591 GFLOP (37.371 GFLOP/s @ duck_max), param_stream=1.033568G (5.348 Gparam/s @ duck_max), weight_stream=1109.380 MiB (6.019 GB/s @ duck_max) [2026-04-08 08:47:25.923995 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=228, expert_tiles=751, 
avg_tile_batch=3.49, prepare=666.217µs, send=17.209996ms, judge_wait=223.34216ms, fetch=21.647088ms, reduce=24ns; duck time-ns stats: p50=197.82924ms, p90=198.046794ms, max=198.176091ms; kernel_model: matmul=7.222591 GFLOP (36.445 GFLOP/s @ duck_max), param_stream=1.033568G (5.215 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.870 GB/s @ duck_max) [2026-04-08 08:47:26.204204 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=755, avg_tile_batch=3.48, prepare=660.65µs, send=17.225528ms, judge_wait=225.332185ms, fetch=22.342487ms, reduce=20ns; duck time-ns stats: p50=198.630259ms, p90=198.923952ms, max=198.961383ms; kernel_model: matmul=7.222591 GFLOP (36.301 GFLOP/s @ duck_max), param_stream=1.039073G (5.222 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.878 GB/s @ duck_max) [2026-04-08 08:47:26.486324 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=216, expert_tiles=747, avg_tile_batch=3.51, prepare=668.81µs, send=17.212798ms, judge_wait=227.921053ms, fetch=21.637896ms, reduce=131ns; duck time-ns stats: p50=201.325124ms, p90=201.722741ms, max=202.117285ms; kernel_model: matmul=7.222591 GFLOP (35.735 GFLOP/s @ duck_max), param_stream=1.028063G (5.086 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.725 GB/s @ duck_max) [2026-04-08 08:47:26.635440 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=189, expert_tiles=426, avg_tile_batch=3.17, prepare=416.304µs, send=10.16803ms, judge_wait=119.545943ms, fetch=11.568992ms, reduce=133ns; duck time-ns stats: p50=106.4932ms, p90=106.628383ms, max=106.757762ms; kernel_model: matmul=3.721396 GFLOP (34.858 GFLOP/s @ duck_max), param_stream=0.586285G (5.492 Gparam/s @ duck_max), weight_stream=629.289 MiB (6.181 GB/s @ duck_max) [2026-04-08 08:47:26.950988 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, 
unique_experts=234, expert_tiles=759, avg_tile_batch=3.46, prepare=707.876µs, send=19.220765ms, judge_wait=235.964044ms, fetch=20.640768ms, reduce=134ns; duck time-ns stats: p50=210.222366ms, p90=210.520182ms, max=210.579999ms; kernel_model: matmul=7.222591 GFLOP (34.299 GFLOP/s @ duck_max), param_stream=1.044578G (4.960 Gparam/s @ duck_max), weight_stream=1121.197 MiB (5.583 GB/s @ duck_max) [2026-04-08 08:47:27.229209 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=228, expert_tiles=754, avg_tile_batch=3.48, prepare=660.985µs, send=17.227248ms, judge_wait=225.004769ms, fetch=20.634879ms, reduce=135ns; duck time-ns stats: p50=198.285808ms, p90=198.680513ms, max=199.099742ms; kernel_model: matmul=7.222591 GFLOP (36.276 GFLOP/s @ duck_max), param_stream=1.037697G (5.212 Gparam/s @ duck_max), weight_stream=1113.811 MiB (5.866 GB/s @ duck_max) [2026-04-08 08:47:27.507276 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=232, expert_tiles=745, avg_tile_batch=3.52, prepare=660.164µs, send=17.227376ms, judge_wait=223.813355ms, fetch=21.66738ms, reduce=142ns; duck time-ns stats: p50=199.477766ms, p90=199.761315ms, max=199.921965ms; kernel_model: matmul=7.222591 GFLOP (36.127 GFLOP/s @ duck_max), param_stream=1.025311G (5.129 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.772 GB/s @ duck_max) [2026-04-08 08:47:27.780821 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=749, avg_tile_batch=3.50, prepare=664.102µs, send=17.220993ms, judge_wait=219.33624ms, fetch=21.70409ms, reduce=141ns; duck time-ns stats: p50=192.889859ms, p90=193.126094ms, max=193.178523ms; kernel_model: matmul=7.222591 GFLOP (37.388 GFLOP/s @ duck_max), param_stream=1.030816G (5.336 Gparam/s @ duck_max), weight_stream=1106.425 MiB (6.006 GB/s @ duck_max) [2026-04-08 08:47:27.940307 INFO fp8_moe_dpdk] MoE prefill forward (Rust): 
batch_size=169, top_k=8, tasks=1352, unique_experts=195, expert_tiles=432, avg_tile_batch=3.13, prepare=420.027µs, send=10.212519ms, judge_wait=129.956989ms, fetch=11.476618ms, reduce=21ns; duck time-ns stats: p50=118.139039ms, p90=118.28497ms, max=118.346929ms; kernel_model: matmul=3.721396 GFLOP (31.445 GFLOP/s @ duck_max), param_stream=0.594543G (5.024 Gparam/s @ duck_max), weight_stream=638.152 MiB (5.654 GB/s @ duck_max) [2026-04-08 08:47:28.243314 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=753, avg_tile_batch=3.48, prepare=691.093µs, send=19.074582ms, judge_wait=222.414075ms, fetch=21.626553ms, reduce=144ns; duck time-ns stats: p50=198.169532ms, p90=198.414949ms, max=198.53787ms; kernel_model: matmul=7.222591 GFLOP (36.379 GFLOP/s @ duck_max), param_stream=1.036321G (5.220 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.875 GB/s @ duck_max) [2026-04-08 08:47:28.522563 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=747, avg_tile_batch=3.51, prepare=670.181µs, send=17.227942ms, judge_wait=223.736276ms, fetch=22.895642ms, reduce=135ns; duck time-ns stats: p50=196.960581ms, p90=197.32947ms, max=197.720824ms; kernel_model: matmul=7.222591 GFLOP (36.529 GFLOP/s @ duck_max), param_stream=1.028063G (5.200 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.852 GB/s @ duck_max) [2026-04-08 08:47:28.801423 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=220, expert_tiles=743, avg_tile_batch=3.53, prepare=662.168µs, send=17.209821ms, judge_wait=222.651393ms, fetch=23.739148ms, reduce=21ns; duck time-ns stats: p50=195.152073ms, p90=195.501449ms, max=195.588627ms; kernel_model: matmul=7.222591 GFLOP (36.927 GFLOP/s @ duck_max), param_stream=1.022558G (5.228 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.884 GB/s @ duck_max) [2026-04-08 08:47:29.076779 INFO fp8_moe_dpdk] 
MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=745, avg_tile_batch=3.52, prepare=661.446µs, send=18.329621ms, judge_wait=220.05938ms, fetch=21.651972ms, reduce=20ns; duck time-ns stats: p50=194.968892ms, p90=195.255222ms, max=195.429526ms; kernel_model: matmul=7.222591 GFLOP (36.958 GFLOP/s @ duck_max), param_stream=1.025311G (5.246 Gparam/s @ duck_max), weight_stream=1100.517 MiB (5.905 GB/s @ duck_max) [2026-04-08 08:47:29.224918 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=190, expert_tiles=422, avg_tile_batch=3.20, prepare=410.177µs, send=10.191531ms, judge_wait=119.454654ms, fetch=10.575328ms, reduce=137ns; duck time-ns stats: p50=106.972026ms, p90=107.110491ms, max=107.203051ms; kernel_model: matmul=3.721396 GFLOP (34.714 GFLOP/s @ duck_max), param_stream=0.580780G (5.418 Gparam/s @ duck_max), weight_stream=623.380 MiB (6.097 GB/s @ duck_max) [2026-04-08 08:47:29.531358 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=756, avg_tile_batch=3.47, prepare=708.753µs, send=19.091336ms, judge_wait=225.932558ms, fetch=21.649537ms, reduce=20ns; duck time-ns stats: p50=200.24656ms, p90=200.587726ms, max=200.873365ms; kernel_model: matmul=7.222591 GFLOP (35.956 GFLOP/s @ duck_max), param_stream=1.040450G (5.180 Gparam/s @ duck_max), weight_stream=1116.766 MiB (5.830 GB/s @ duck_max) [2026-04-08 08:47:29.823417 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=750, avg_tile_batch=3.50, prepare=664.137µs, send=17.211505ms, judge_wait=237.732162ms, fetch=21.64542ms, reduce=136ns; duck time-ns stats: p50=211.774821ms, p90=212.065744ms, max=212.207476ms; kernel_model: matmul=7.222591 GFLOP (34.036 GFLOP/s @ duck_max), param_stream=1.032192G (4.864 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.474 GB/s @ duck_max) [2026-04-08 
08:47:30.106368 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=748, avg_tile_batch=3.51, prepare=662.079µs, send=17.227712ms, judge_wait=228.750987ms, fetch=21.623599ms, reduce=167ns; duck time-ns stats: p50=203.67751ms, p90=204.066308ms, max=204.335198ms; kernel_model: matmul=7.222591 GFLOP (35.347 GFLOP/s @ duck_max), param_stream=1.029439G (5.038 Gparam/s @ duck_max), weight_stream=1104.948 MiB (5.670 GB/s @ duck_max) [2026-04-08 08:47:30.390770 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=222, expert_tiles=743, avg_tile_batch=3.53, prepare=663.924µs, send=17.229782ms, judge_wait=228.482191ms, fetch=23.326279ms, reduce=24ns; duck time-ns stats: p50=201.172562ms, p90=201.425757ms, max=201.70439ms; kernel_model: matmul=7.222591 GFLOP (35.808 GFLOP/s @ duck_max), param_stream=1.022558G (5.070 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.706 GB/s @ duck_max) [2026-04-08 08:47:30.545029 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=190, expert_tiles=432, avg_tile_batch=3.13, prepare=414.894µs, send=10.144929ms, judge_wait=125.321307ms, fetch=10.921009ms, reduce=138ns; duck time-ns stats: p50=115.173285ms, p90=115.475194ms, max=115.509991ms; kernel_model: matmul=3.721396 GFLOP (32.217 GFLOP/s @ duck_max), param_stream=0.594543G (5.147 Gparam/s @ duck_max), weight_stream=638.152 MiB (5.793 GB/s @ duck_max) [2026-04-08 08:47:30.847820 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=753, avg_tile_batch=3.48, prepare=723.431µs, send=18.660461ms, judge_wait=223.570289ms, fetch=20.64367ms, reduce=20ns; duck time-ns stats: p50=197.955094ms, p90=198.205348ms, max=198.81145ms; kernel_model: matmul=7.222591 GFLOP (36.329 GFLOP/s @ duck_max), param_stream=1.036321G (5.213 Gparam/s @ duck_max), weight_stream=1112.334 MiB (5.867 
GB/s @ duck_max) [2026-04-08 08:47:31.133916 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=227, expert_tiles=751, avg_tile_batch=3.49, prepare=673.522µs, send=17.22301ms, judge_wait=232.891975ms, fetch=20.639587ms, reduce=21ns; duck time-ns stats: p50=210.548251ms, p90=210.921572ms, max=211.157241ms; kernel_model: matmul=7.222591 GFLOP (34.205 GFLOP/s @ duck_max), param_stream=1.033568G (4.895 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.509 GB/s @ duck_max) [2026-04-08 08:47:31.412782 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=215, expert_tiles=744, avg_tile_batch=3.53, prepare=662.36µs, send=17.225369ms, judge_wait=222.48989ms, fetch=23.810717ms, reduce=19ns; duck time-ns stats: p50=195.121666ms, p90=195.657557ms, max=195.724968ms; kernel_model: matmul=7.222591 GFLOP (36.902 GFLOP/s @ duck_max), param_stream=1.023934G (5.231 Gparam/s @ duck_max), weight_stream=1099.039 MiB (5.888 GB/s @ duck_max) [2026-04-08 08:47:31.687819 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=224, expert_tiles=751, avg_tile_batch=3.49, prepare=657.426µs, send=17.229142ms, judge_wait=221.781378ms, fetch=20.669437ms, reduce=21ns; duck time-ns stats: p50=196.918279ms, p90=197.062471ms, max=197.202415ms; kernel_model: matmul=7.222591 GFLOP (36.625 GFLOP/s @ duck_max), param_stream=1.033568G (5.241 Gparam/s @ duck_max), weight_stream=1109.380 MiB (5.899 GB/s @ duck_max) [2026-04-08 08:47:31.835614 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=188, expert_tiles=437, avg_tile_batch=3.09, prepare=413.309µs, send=10.129248ms, judge_wait=118.197285ms, fetch=11.605199ms, reduce=140ns; duck time-ns stats: p50=105.385811ms, p90=105.48236ms, max=105.548107ms; kernel_model: matmul=3.721396 GFLOP (35.258 GFLOP/s @ duck_max), param_stream=0.601424G (5.698 Gparam/s @ duck_max), 
weight_stream=645.538 MiB (6.413 GB/s @ duck_max) [2026-04-08 08:47:32.144208 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=234, expert_tiles=758, avg_tile_batch=3.46, prepare=702.997µs, send=19.090831ms, judge_wait=228.965039ms, fetch=20.637575ms, reduce=140ns; duck time-ns stats: p50=202.897463ms, p90=203.191912ms, max=203.414263ms; kernel_model: matmul=7.222591 GFLOP (35.507 GFLOP/s @ duck_max), param_stream=1.043202G (5.128 Gparam/s @ duck_max), weight_stream=1119.720 MiB (5.772 GB/s @ duck_max) [2026-04-08 08:47:32.421906 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=742, avg_tile_batch=3.54, prepare=667.201µs, send=17.225772ms, judge_wait=222.276964ms, fetch=22.874284ms, reduce=132ns; duck time-ns stats: p50=195.615527ms, p90=195.940275ms, max=196.097611ms; kernel_model: matmul=7.222591 GFLOP (36.832 GFLOP/s @ duck_max), param_stream=1.021182G (5.208 Gparam/s @ duck_max), weight_stream=1096.085 MiB (5.861 GB/s @ duck_max) [2026-04-08 08:47:32.695515 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=230, expert_tiles=757, avg_tile_batch=3.47, prepare=660.134µs, send=17.223455ms, judge_wait=220.341386ms, fetch=20.662603ms, reduce=135ns; duck time-ns stats: p50=194.39628ms, p90=194.757315ms, max=195.545085ms; kernel_model: matmul=7.222591 GFLOP (36.936 GFLOP/s @ duck_max), param_stream=1.041826G (5.328 Gparam/s @ duck_max), weight_stream=1118.243 MiB (5.996 GB/s @ duck_max) [2026-04-08 08:47:32.970056 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=743, avg_tile_batch=3.53, prepare=663.464µs, send=17.224759ms, judge_wait=220.315607ms, fetch=21.655845ms, reduce=134ns; duck time-ns stats: p50=193.776119ms, p90=194.177629ms, max=194.324335ms; kernel_model: matmul=7.222591 GFLOP (37.168 GFLOP/s @ duck_max), 
param_stream=1.022558G (5.262 Gparam/s @ duck_max), weight_stream=1097.562 MiB (5.922 GB/s @ duck_max) [2026-04-08 08:47:33.121768 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=193, expert_tiles=415, avg_tile_batch=3.26, prepare=414.217µs, send=10.180825ms, judge_wait=122.027292ms, fetch=11.591907ms, reduce=20ns; duck time-ns stats: p50=108.269137ms, p90=108.405373ms, max=108.491639ms; kernel_model: matmul=3.721396 GFLOP (34.301 GFLOP/s @ duck_max), param_stream=0.571146G (5.264 Gparam/s @ duck_max), weight_stream=613.039 MiB (5.925 GB/s @ duck_max) [2026-04-08 08:47:33.421546 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=750, avg_tile_batch=3.50, prepare=698.118µs, send=19.042757ms, judge_wait=219.13109ms, fetch=21.66228ms, reduce=146ns; duck time-ns stats: p50=193.76586ms, p90=194.061636ms, max=194.182792ms; kernel_model: matmul=7.222591 GFLOP (37.195 GFLOP/s @ duck_max), param_stream=1.032192G (5.316 Gparam/s @ duck_max), weight_stream=1107.903 MiB (5.983 GB/s @ duck_max) [2026-04-08 08:47:33.695330 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=749, avg_tile_batch=3.50, prepare=666.651µs, send=17.227311ms, judge_wait=219.597172ms, fetch=21.671411ms, reduce=20ns; duck time-ns stats: p50=193.379356ms, p90=193.8325ms, max=194.048755ms; kernel_model: matmul=7.222591 GFLOP (37.220 GFLOP/s @ duck_max), param_stream=1.030816G (5.312 Gparam/s @ duck_max), weight_stream=1106.425 MiB (5.979 GB/s @ duck_max) [2026-04-08 08:47:33.981254 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=236, expert_tiles=755, avg_tile_batch=3.48, prepare=665.223µs, send=17.22484ms, judge_wait=231.618554ms, fetch=21.676864ms, reduce=147ns; duck time-ns stats: p50=204.888555ms, p90=205.240786ms, max=205.403553ms; kernel_model: matmul=7.222591 GFLOP 
(35.163 GFLOP/s @ duck_max), param_stream=1.039073G (5.059 Gparam/s @ duck_max), weight_stream=1115.289 MiB (5.693 GB/s @ duck_max) [2026-04-08 08:47:34.259977 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=224, expert_tiles=746, avg_tile_batch=3.52, prepare=681.3µs, send=17.21316ms, judge_wait=224.498655ms, fetch=21.670303ms, reduce=134ns; duck time-ns stats: p50=197.976143ms, p90=198.246084ms, max=198.51044ms; kernel_model: matmul=7.222591 GFLOP (36.384 GFLOP/s @ duck_max), param_stream=1.026687G (5.172 Gparam/s @ duck_max), weight_stream=1101.994 MiB (5.821 GB/s @ duck_max) [2026-04-08 08:47:34.408673 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=187, expert_tiles=421, avg_tile_batch=3.21, prepare=422.993µs, send=10.171277ms, judge_wait=119.046032ms, fetch=11.545521ms, reduce=20ns; duck time-ns stats: p50=108.509288ms, p90=108.678056ms, max=108.810283ms; kernel_model: matmul=3.721396 GFLOP (34.201 GFLOP/s @ duck_max), param_stream=0.579404G (5.325 Gparam/s @ duck_max), weight_stream=621.903 MiB (5.993 GB/s @ duck_max) [2026-04-08 08:47:34.708920 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=204, expert_tiles=747, avg_tile_batch=3.51, prepare=697.696µs, send=18.987274ms, judge_wait=219.76783ms, fetch=21.623806ms, reduce=20ns; duck time-ns stats: p50=193.84453ms, p90=194.057056ms, max=194.085011ms; kernel_model: matmul=7.222591 GFLOP (37.214 GFLOP/s @ duck_max), param_stream=1.028063G (5.297 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.962 GB/s @ duck_max) [2026-04-08 08:47:34.984647 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=229, expert_tiles=747, avg_tile_batch=3.51, prepare=662.202µs, send=18.346525ms, judge_wait=220.394689ms, fetch=21.675141ms, reduce=140ns; duck time-ns stats: p50=193.848547ms, p90=194.19758ms, max=194.868166ms; kernel_model: 
matmul=7.222591 GFLOP (37.064 GFLOP/s @ duck_max), param_stream=1.028063G (5.276 Gparam/s @ duck_max), weight_stream=1103.471 MiB (5.938 GB/s @ duck_max) [2026-04-08 08:47:35.259063 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=219, expert_tiles=742, avg_tile_batch=3.54, prepare=661.337µs, send=17.222224ms, judge_wait=220.166083ms, fetch=21.627621ms, reduce=20ns; duck time-ns stats: p50=193.915566ms, p90=194.159354ms, max=194.219811ms; kernel_model: matmul=7.222591 GFLOP (37.188 GFLOP/s @ duck_max), param_stream=1.021182G (5.258 Gparam/s @ duck_max), weight_stream=1096.085 MiB (5.918 GB/s @ duck_max) [2026-04-08 08:47:35.538379 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=222, expert_tiles=746, avg_tile_batch=3.52, prepare=663.082µs, send=17.223246ms, judge_wait=223.471191ms, fetch=23.267178ms, reduce=139ns; duck time-ns stats: p50=196.267373ms, p90=196.507258ms, max=196.715153ms; kernel_model: matmul=7.222591 GFLOP (36.716 GFLOP/s @ duck_max), param_stream=1.026687G (5.219 Gparam/s @ duck_max), weight_stream=1101.994 MiB (5.874 GB/s @ duck_max) [2026-04-08 08:47:35.686833 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=169, top_k=8, tasks=1352, unique_experts=186, expert_tiles=422, avg_tile_batch=3.20, prepare=414.824µs, send=10.169831ms, judge_wait=118.805646ms, fetch=11.595868ms, reduce=134ns; duck time-ns stats: p50=106.13343ms, p90=106.291204ms, max=106.414635ms; kernel_model: matmul=3.721396 GFLOP (34.971 GFLOP/s @ duck_max), param_stream=0.580780G (5.458 Gparam/s @ duck_max), weight_stream=623.380 MiB (6.143 GB/s @ duck_max) [2026-04-08 08:47:36.018841 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=233, expert_tiles=756, avg_tile_batch=3.47, prepare=3.47662ms, send=19.214175ms, judge_wait=216.537044ms, fetch=21.656163ms, reduce=136ns; duck time-ns stats: p50=192.057639ms, p90=192.295504ms, 
max=192.56811ms; kernel_model: matmul=7.222591 GFLOP (37.507 GFLOP/s @ duck_max), param_stream=1.040450G (5.403 Gparam/s @ duck_max), weight_stream=1116.766 MiB (6.081 GB/s @ duck_max) [2026-04-08 08:47:36.296160 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=225, expert_tiles=753, avg_tile_batch=3.48, prepare=2.34313ms, send=17.469475ms, judge_wait=219.413218ms, fetch=20.64066ms, reduce=21ns; duck time-ns stats: p50=193.381773ms, p90=193.589287ms, max=193.60794ms; kernel_model: matmul=7.222591 GFLOP (37.305 GFLOP/s @ duck_max), param_stream=1.036321G (5.353 Gparam/s @ duck_max), weight_stream=1112.334 MiB (6.024 GB/s @ duck_max) [2026-04-08 08:47:36.574106 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=223, expert_tiles=742, avg_tile_batch=3.54, prepare=2.967283ms, send=17.225203ms, judge_wait=220.494103ms, fetch=22.354215ms, reduce=20ns; duck time-ns stats: p50=191.869142ms, p90=192.14604ms, max=192.611555ms; kernel_model: matmul=7.222591 GFLOP (37.498 GFLOP/s @ duck_max), param_stream=1.021182G (5.302 Gparam/s @ duck_max), weight_stream=1096.085 MiB (5.967 GB/s @ duck_max) [2026-04-08 08:47:36.851285 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=328, top_k=8, tasks=2624, unique_experts=226, expert_tiles=746, avg_tile_batch=3.52, prepare=3.051003ms, send=18.478788ms, judge_wait=218.758471ms, fetch=21.697972ms, reduce=133ns; duck time-ns stats: p50=192.506271ms, p90=192.800883ms, max=193.08039ms; kernel_model: matmul=7.222591 GFLOP (37.407 GFLOP/s @ duck_max), param_stream=1.026687G (5.317 Gparam/s @ duck_max), weight_stream=1101.994 MiB (5.985 GB/s @ duck_max) [2026-04-08 08:47:36.998137 INFO fp8_moe_dpdk] MoE prefill forward (Rust): batch_size=168, top_k=8, tasks=1344, unique_experts=209, expert_tiles=424, avg_tile_batch=3.17, prepare=1.122607ms, send=10.160734ms, judge_wait=115.005785ms, fetch=11.523819ms, reduce=20ns; duck time-ns stats: 
p50=101.784439ms, p90=101.927252ms, max=101.997546ms; kernel_model: matmul=3.699376 GFLOP (36.269 GFLOP/s @ duck_max), param_stream=0.583533G (5.721 Gparam/s @ duck_max), weight_stream=626.334 MiB (6.439 GB/s @ duck_max) [2026-04-08 08:47:37.016785 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.241681ms; phases: prepare=5.083µs, send=170.957µs, judge_wait=926.03µs, fetch=99.352µs, reduce=19ns, writeback=407ns; duck time-ns stats: p50=842.188µs, p90=845.222µs, max=850.207µs; effective_read: activated_experts=8, params=0.011010G (12.950 Gparam/s @ duck_max), memory=11.818 MiB (14.575 GB/s @ duck_max), judge_gap=75.823µs, judge_ratio=1.089x [2026-04-08 08:47:37.739853 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.180843ms; phases: prepare=6.548µs, send=248.307µs, judge_wait=789.71µs, fetch=98.365µs, reduce=21ns, writeback=551ns; duck time-ns stats: p50=699.906µs, p90=707.242µs, max=713.493µs; effective_read: activated_experts=8, params=0.011010G (15.431 Gparam/s @ duck_max), memory=11.818 MiB (17.368 GB/s @ duck_max), judge_gap=76.217µs, judge_ratio=1.107x Token # 1: 2020.781ms; value: next_token_ids=tensor([1415], device='cuda:0') mtp accept=1 prop=1415 top1=1415 accp=0.848 next=draft=112036 prop=112036 olap pair=692.8ms serial=1280.6ms gain=587.8ms ratio=0.46 s0=609.2ms s1=671.4ms wait=0.2/45.2ms pred gate=device [2026-04-08 08:47:37.743898 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 984.548µs; phases: prepare=3.946µs, send=62.459µs, judge_wait=786.062µs, fetch=94.394µs, reduce=19ns, writeback=396ns; duck time-ns stats: p50=702.433µs, p90=707.024µs, max=709.745µs; effective_read: activated_experts=8, params=0.011010G (15.513 Gparam/s @ duck_max), memory=11.818 MiB (17.459 GB/s @ duck_max), judge_gap=76.317µs, judge_ratio=1.108x Token # 2: 3.896ms; value: next_token_ids=tensor([112036], device='cuda:0') mtp accept=1 prop=112036 top1=112036 accp=1.000 next=pair draft=49672 prop=49672 pred gate=device [2026-04-08 08:47:37.857232 INFO fp8_moe_dpdk] MoE 
forward e2e time (Rust): 976.748µs; phases: prepare=4.116µs, send=61.304µs, judge_wait=779.443µs, fetch=93.521µs, reduce=20ns, writeback=615ns; duck time-ns stats: p50=693.08µs, p90=699.951µs, max=703.552µs; effective_read: activated_experts=8, params=0.011010G (15.649 Gparam/s @ duck_max), memory=11.818 MiB (17.613 GB/s @ duck_max), judge_gap=75.891µs, judge_ratio=1.108x Token # 3: 113.425ms; value: next_token_ids=tensor([2284], device='cuda:0') mtp accept=0 prop=49672 top1=49672 accp=0.822 next=draft=50492 prop=50492 olap pair=108.1ms serial=189.6ms gain=81.6ms ratio=0.43 s0=4.4ms s1=185.2ms wait=0.1/52.4ms pred gate=device [2026-04-08 08:47:37.972119 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 1.001497ms; phases: prepare=3.875µs, send=62.232µs, judge_wait=786.034µs, fetch=90.528µs, reduce=20ns, writeback=438ns; duck time-ns stats: p50=699.722µs, p90=703.156µs, max=707.634µs; effective_read: activated_experts=8, params=0.011010G (15.559 Gparam/s @ duck_max), memory=11.818 MiB (17.511 GB/s @ duck_max), judge_gap=78.4µs, judge_ratio=1.111x Token # 4: 114.950ms; value: next_token_ids=tensor([50492], device='cuda:0') mtp accept=1 prop=50492 top1=50492 accp=0.916 next=draft=1959 prop=1959 olap pair=109.5ms serial=193.4ms gain=83.9ms ratio=0.43 s0=4.5ms s1=188.9ms wait=0.1/52.1ms pred gate=device [2026-04-08 08:47:37.976091 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 975.231µs; phases: prepare=3.036µs, send=60.612µs, judge_wait=782.597µs, fetch=91.729µs, reduce=20ns, writeback=544ns; duck time-ns stats: p50=697.975µs, p90=702.848µs, max=706.793µs; effective_read: activated_experts=8, params=0.011010G (15.577 Gparam/s @ duck_max), memory=11.818 MiB (17.532 GB/s @ duck_max), judge_gap=75.804µs, judge_ratio=1.107x Token # 5: 3.861ms; value: next_token_ids=tensor([1959], device='cuda:0') mtp accept=1 prop=1959 top1=1959 accp=1.000 next=pair draft=3500 prop=3500 pred gate=device [2026-04-08 08:47:38.091660 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 974.683µs; 
phases: prepare=3.81µs, send=62.128µs, judge_wait=780.341µs, fetch=90.988µs, reduce=19ns, writeback=514ns; duck time-ns stats: p50=698.688µs, p90=703.926µs, max=705.114µs; effective_read: activated_experts=8, params=0.011010G (15.615 Gparam/s @ duck_max), memory=11.818 MiB (17.574 GB/s @ duck_max), judge_gap=75.227µs, judge_ratio=1.107x Token # 6: 115.612ms; value: next_token_ids=tensor([3500], device='cuda:0') mtp accept=1 prop=3500 top1=3500 accp=1.000 next=draft=5294 prop=5294 olap pair=110.3ms serial=195.3ms gain=85.0ms ratio=0.44 s0=5.5ms s1=189.8ms wait=0.1/51.1ms pred gate=device [2026-04-08 08:47:38.095550 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 968.294µs; phases: prepare=3.281µs, send=62.1µs, judge_wait=774.167µs, fetch=91.383µs, reduce=21ns, writeback=511ns; duck time-ns stats: p50=690.81µs, p90=697.901µs, max=700.758µs; effective_read: activated_experts=8, params=0.011010G (15.712 Gparam/s @ duck_max), memory=11.818 MiB (17.683 GB/s @ duck_max), judge_gap=73.409µs, judge_ratio=1.105x Token # 7: 3.851ms; value: next_token_ids=tensor([5294], device='cuda:0') mtp accept=1 prop=5294 top1=5294 accp=1.000 next=pair draft=84756 prop=84756 pred gate=device [2026-04-08 08:47:38.210853 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 987.941µs; phases: prepare=3.974µs, send=62.041µs, judge_wait=789.796µs, fetch=93.794µs, reduce=21ns, writeback=440ns; duck time-ns stats: p50=695.356µs, p90=706.388µs, max=708.789µs; effective_read: activated_experts=8, params=0.011010G (15.534 Gparam/s @ duck_max), memory=11.818 MiB (17.483 GB/s @ duck_max), judge_gap=81.007µs, judge_ratio=1.114x Token # 8: 115.435ms; value: next_token_ids=tensor([84756], device='cuda:0') mtp accept=1 prop=84756 top1=84756 accp=1.000 next=draft=3467 prop=3467 olap pair=110.0ms serial=194.5ms gain=84.5ms ratio=0.43 s0=5.0ms s1=189.6ms wait=0.1/51.8ms pred gate=device [2026-04-08 08:47:38.214876 INFO fp8_moe_dpdk] MoE forward e2e time (Rust): 978.234µs; phases: prepare=3.325µs, send=61.879µs, 
judge_wait=783.91µs, fetch=92.155µs, reduce=20ns, writeback=603ns; duck time-ns stats: p50=697.672µs, p90=703.333µs, max=706.889µs; effective_read: activated_experts=8, params=0.011010G (15.575 Gparam/s @ duck_max), memory=11.818 MiB (17.530 GB/s @ duck_max), judge_gap=77.021µs, judge_ratio=1.109x Token # 9: 3.827ms; value: next_token_ids=tensor([3467], device='cuda:0') mtp accept=1 prop=3467 top1=3467 accp=1.000 next=pair draft=1148 prop=1148 pred gate=device Token # 10: 115.684ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=1 prop=1148 top1=1148 accp=1.000 next=draft=6656 prop=6656 olap pair=110.3ms serial=196.0ms gain=85.7ms ratio=0.44 s0=4.3ms s1=191.7ms wait=0.1/52.8ms pred gate=device Token # 11: 3.819ms; value: next_token_ids=tensor([6656], device='cuda:0') mtp accept=1 prop=6656 top1=6656 accp=1.000 next=pair draft=5409 prop=5409 pred gate=device Token # 12: 118.219ms; value: next_token_ids=tensor([5409], device='cuda:0') mtp accept=1 prop=5409 top1=5409 accp=0.962 next=draft=223 prop=223 olap pair=110.6ms serial=195.5ms gain=84.9ms ratio=0.43 s0=6.0ms s1=189.6ms wait=0.2/51.0ms pred gate=device Token # 13: 3.824ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.890 next=pair draft=5149 prop=5149 pred gate=device Token # 14: 115.960ms; value: next_token_ids=tensor([5149], device='cuda:0') mtp accept=1 prop=5149 top1=5149 accp=0.887 next=draft=1057 prop=1057 olap pair=110.6ms serial=196.3ms gain=85.6ms ratio=0.44 s0=3.9ms s1=192.4ms wait=0.1/53.3ms pred gate=device Token # 15: 3.809ms; value: next_token_ids=tensor([1057], device='cuda:0') mtp accept=1 prop=1057 top1=1057 accp=0.999 next=pair draft=22970 prop=22970 pred gate=device Token # 16: 116.700ms; value: next_token_ids=tensor([22970], device='cuda:0') mtp accept=1 prop=22970 top1=22970 accp=0.968 next=draft=428 prop=428 olap pair=111.4ms serial=197.0ms gain=85.6ms ratio=0.43 s0=4.0ms s1=193.0ms wait=0.1/53.3ms pred gate=device Token # 17: 
3.868ms; value: next_token_ids=tensor([428], device='cuda:0') mtp accept=1 prop=428 top1=428 accp=1.000 next=pair draft=664 prop=664 pred gate=device Token # 18: 116.165ms; value: next_token_ids=tensor([664], device='cuda:0') mtp accept=1 prop=664 top1=664 accp=1.000 next=draft=999 prop=999 olap pair=110.8ms serial=196.7ms gain=86.0ms ratio=0.44 s0=3.9ms s1=192.8ms wait=0.1/53.4ms pred gate=device Token # 19: 3.874ms; value: next_token_ids=tensor([999], device='cuda:0') mtp accept=1 prop=999 top1=999 accp=1.000 next=pair draft=15 prop=15 pred gate=device Token # 20: 115.667ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=draft=1248 prop=1248 olap pair=110.4ms serial=196.1ms gain=85.7ms ratio=0.44 s0=4.1ms s1=192.0ms wait=0.1/53.1ms pred gate=device Token # 21: 3.920ms; value: next_token_ids=tensor([862], device='cuda:0') mtp accept=0 prop=1248 top1=862 accp=0.007 next=pair draft=79 prop=79 pred gate=device Token # 22: 115.988ms; value: next_token_ids=tensor([79], device='cuda:0') mtp accept=1 prop=79 top1=79 accp=1.000 next=draft=430 prop=430 olap pair=110.6ms serial=196.5ms gain=85.9ms ratio=0.44 s0=4.2ms s1=192.3ms wait=0.1/52.9ms pred gate=device Token # 23: 3.860ms; value: next_token_ids=tensor([430], device='cuda:0') mtp accept=1 prop=430 top1=430 accp=1.000 next=pair draft=44987 prop=44987 pred gate=device Token # 24: 116.169ms; value: next_token_ids=tensor([44987], device='cuda:0') mtp accept=1 prop=44987 top1=44987 accp=0.982 next=draft=303 prop=303 olap pair=110.8ms serial=196.6ms gain=85.8ms ratio=0.44 s0=4.3ms s1=192.3ms wait=0.1/52.7ms pred gate=device Token # 25: 3.822ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.989 next=pair draft=60692 prop=60692 pred gate=device Token # 26: 116.533ms; value: next_token_ids=tensor([1877], device='cuda:0') mtp accept=0 prop=60692 top1=1877 accp=0.082 next=draft=68523 prop=68523 olap pair=111.2ms serial=196.9ms 
gain=85.7ms ratio=0.44 s0=6.6ms s1=190.3ms wait=0.2/49.9ms pred gate=device Token # 27: 116.675ms; value: next_token_ids=tensor([12799], device='cuda:0') mtp accept=0 prop=68523 top1=12799 accp=0.106 next=draft=27400 prop=27400 olap pair=111.0ms serial=196.4ms gain=85.4ms ratio=0.43 s0=7.0ms s1=189.4ms wait=0.2/50.0ms pred gate=device Token # 28: 116.693ms; value: next_token_ids=tensor([27400], device='cuda:0') mtp accept=1 prop=27400 top1=27400 accp=0.995 next=draft=116863 prop=11799 olap pair=111.3ms serial=196.9ms gain=85.6ms ratio=0.43 s0=4.0ms s1=192.9ms wait=0.1/53.1ms pred gate=device Token # 29: 3.815ms; value: next_token_ids=tensor([11799], device='cuda:0') mtp accept=1 prop=11799 top1=11799 accp=0.128 next=pair draft=19 prop=19 pred gate=device Token # 30: 116.070ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=0.993 next=draft=4792 prop=4792 olap pair=110.7ms serial=196.1ms gain=85.4ms ratio=0.44 s0=4.2ms s1=191.9ms wait=0.1/52.8ms pred gate=device Token # 31: 3.897ms; value: next_token_ids=tensor([4792], device='cuda:0') mtp accept=1 prop=4792 top1=4792 accp=0.707 next=pair draft=23085 prop=23085 pred gate=device Token # 32: 116.367ms; value: next_token_ids=tensor([23085], device='cuda:0') mtp accept=1 prop=23085 top1=23085 accp=1.000 next=draft=58 prop=58 olap pair=111.0ms serial=197.1ms gain=86.1ms ratio=0.44 s0=4.3ms s1=192.8ms wait=0.1/52.9ms pred gate=device Token # 33: 3.828ms; value: next_token_ids=tensor([58], device='cuda:0') mtp accept=1 prop=58 top1=58 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 34: 116.430ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.836 next=draft=24091 prop=24091 olap pair=111.0ms serial=196.2ms gain=85.2ms ratio=0.43 s0=4.3ms s1=191.9ms wait=0.1/52.9ms pred gate=device Token # 35: 3.824ms; value: next_token_ids=tensor([24091], device='cuda:0') mtp accept=1 prop=24091 top1=24091 accp=1.000 next=pair draft=18 
prop=18 pred gate=device Token # 36: 117.207ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=940 prop=940 olap pair=111.8ms serial=197.3ms gain=85.5ms ratio=0.43 s0=4.6ms s1=192.7ms wait=0.1/52.5ms pred gate=device Token # 37: 3.881ms; value: next_token_ids=tensor([940], device='cuda:0') mtp accept=1 prop=940 top1=940 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 38: 115.968ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=2111 prop=2111 olap pair=110.5ms serial=196.3ms gain=85.7ms ratio=0.44 s0=3.9ms s1=192.3ms wait=0.1/53.3ms pred gate=device Token # 39: 3.862ms; value: next_token_ids=tensor([2111], device='cuda:0') mtp accept=1 prop=2111 top1=2111 accp=1.000 next=pair draft=4792 prop=4792 pred gate=device Token # 40: 116.901ms; value: next_token_ids=tensor([4792], device='cuda:0') mtp accept=1 prop=4792 top1=4792 accp=1.000 next=draft=48 prop=48 olap pair=110.8ms serial=196.0ms gain=85.2ms ratio=0.43 s0=7.6ms s1=188.5ms wait=0.2/49.0ms pred gate=device Token # 41: 4.882ms; value: next_token_ids=tensor([48], device='cuda:0') mtp accept=1 prop=48 top1=48 accp=1.000 next=pair draft=1457 prop=1457 pred gate=device Token # 42: 116.513ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=draft=301 prop=301 olap pair=111.0ms serial=196.9ms gain=85.8ms ratio=0.44 s0=6.8ms s1=190.1ms wait=0.2/50.1ms pred gate=device Token # 43: 3.889ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=1.000 next=pair draft=116863 prop=116863 pred gate=device Token # 44: 117.125ms; value: next_token_ids=tensor([116863], device='cuda:0') mtp accept=1 prop=116863 top1=116863 accp=1.000 next=draft=3699 prop=3699 olap pair=111.7ms serial=198.4ms gain=86.7ms ratio=0.44 s0=3.8ms s1=194.6ms wait=0.1/53.5ms pred gate=device Token # 45: 3.913ms; value: 
next_token_ids=tensor([3699], device='cuda:0') mtp accept=1 prop=3699 top1=3699 accp=1.000 next=pair draft=47 prop=47 pred gate=device Token # 46: 115.915ms; value: next_token_ids=tensor([47], device='cuda:0') mtp accept=1 prop=47 top1=47 accp=1.000 next=draft=36101 prop=36101 olap pair=110.5ms serial=196.3ms gain=85.7ms ratio=0.44 s0=3.7ms s1=192.5ms wait=0.1/53.4ms pred gate=device Token # 47: 3.886ms; value: next_token_ids=tensor([36101], device='cuda:0') mtp accept=1 prop=36101 top1=36101 accp=1.000 next=pair draft=2971 prop=2971 pred gate=device Token # 48: 116.580ms; value: next_token_ids=tensor([2971], device='cuda:0') mtp accept=1 prop=2971 top1=2971 accp=1.000 next=draft=320 prop=320 olap pair=111.2ms serial=197.6ms gain=86.4ms ratio=0.44 s0=4.5ms s1=193.0ms wait=0.2/52.0ms pred gate=device Token # 49: 3.816ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.951 next=pair draft=28669 prop=28669 pred gate=device Token # 50: 116.437ms; value: next_token_ids=tensor([28669], device='cuda:0') mtp accept=1 prop=28669 top1=3500 accp=0.270 next=draft=4427 prop=4427 olap pair=110.8ms serial=196.3ms gain=85.5ms ratio=0.44 s0=5.4ms s1=190.9ms wait=0.2/51.1ms pred gate=device Token # 51: 3.871ms; value: next_token_ids=tensor([4427], device='cuda:0') mtp accept=1 prop=4427 top1=4427 accp=0.999 next=pair draft=18876 prop=18876 pred gate=device Token # 52: 115.803ms; value: next_token_ids=tensor([18876], device='cuda:0') mtp accept=1 prop=18876 top1=18876 accp=1.000 next=draft=3500 prop=3500 olap pair=110.3ms serial=195.9ms gain=85.6ms ratio=0.44 s0=4.3ms s1=191.6ms wait=0.1/52.4ms pred gate=device Token # 53: 3.849ms; value: next_token_ids=tensor([3500], device='cuda:0') mtp accept=1 prop=3500 top1=3500 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 54: 116.046ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=303 accp=0.230 next=draft=1877 prop=1909 olap pair=110.7ms 
serial=196.8ms gain=86.0ms ratio=0.44 s0=4.4ms s1=192.4ms wait=0.1/52.3ms pred gate=device Token # 55: 3.822ms; value: next_token_ids=tensor([1909], device='cuda:0') mtp accept=1 prop=1909 top1=1877 accp=0.529 next=pair draft=68523 prop=68523 pred gate=device Token # 56: 116.094ms; value: next_token_ids=tensor([68523], device='cuda:0') mtp accept=1 prop=68523 top1=68523 accp=0.907 next=draft=24935 prop=62884 olap pair=110.7ms serial=196.4ms gain=85.7ms ratio=0.44 s0=4.0ms s1=192.4ms wait=0.1/53.3ms pred gate=device Token # 57: 3.829ms; value: next_token_ids=tensor([62884], device='cuda:0') mtp accept=1 prop=62884 top1=62884 accp=0.130 next=pair draft=76399 prop=76399 pred gate=device Token # 58: 116.658ms; value: next_token_ids=tensor([76399], device='cuda:0') mtp accept=1 prop=76399 top1=76399 accp=0.886 next=draft=14668 prop=14668 olap pair=111.2ms serial=197.3ms gain=86.2ms ratio=0.44 s0=4.4ms s1=193.0ms wait=0.1/52.6ms pred gate=device Token # 59: 3.821ms; value: next_token_ids=tensor([14668], device='cuda:0') mtp accept=1 prop=14668 top1=14668 accp=1.000 next=pair draft=548 prop=548 pred gate=device Token # 60: 116.691ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=1 prop=548 top1=548 accp=0.796 next=draft=29305 prop=29305 olap pair=111.3ms serial=197.5ms gain=86.2ms ratio=0.44 s0=4.4ms s1=193.0ms wait=0.1/52.4ms pred gate=device Token # 61: 3.931ms; value: next_token_ids=tensor([29305], device='cuda:0') mtp accept=1 prop=29305 top1=29305 accp=1.000 next=pair draft=4981 prop=4981 pred gate=device Token # 62: 116.294ms; value: next_token_ids=tensor([4981], device='cuda:0') mtp accept=1 prop=4981 top1=4981 accp=1.000 next=draft=61137 prop=61137 olap pair=110.9ms serial=196.4ms gain=85.6ms ratio=0.44 s0=5.8ms s1=190.6ms wait=0.2/50.6ms pred gate=device Token # 63: 3.822ms; value: next_token_ids=tensor([61137], device='cuda:0') mtp accept=1 prop=61137 top1=61137 accp=1.000 next=pair draft=15121 prop=15121 pred gate=device Token # 64: 116.486ms; 
value: next_token_ids=tensor([15121], device='cuda:0') mtp accept=1 prop=15121 top1=15121 accp=1.000 next=draft=303 prop=303 olap pair=111.0ms serial=197.1ms gain=86.1ms ratio=0.44 s0=4.1ms s1=193.0ms wait=0.1/53.0ms pred gate=device Token # 65: 3.835ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=3440 prop=2541 pred gate=device Token # 66: 115.594ms; value: next_token_ids=tensor([4602], device='cuda:0') mtp accept=0 prop=2541 top1=3440 accp=0.932 next=draft=69401 prop=69401 olap pair=110.2ms serial=195.4ms gain=85.2ms ratio=0.44 s0=4.4ms s1=191.0ms wait=0.1/52.5ms pred gate=device Token # 67: 116.505ms; value: next_token_ids=tensor([44287], device='cuda:0') mtp accept=0 prop=69401 top1=44287 accp=0.152 next=draft=1472 prop=1472 olap pair=111.0ms serial=197.0ms gain=86.0ms ratio=0.44 s0=4.5ms s1=192.6ms wait=0.1/52.3ms pred gate=device Token # 68: 117.271ms; value: next_token_ids=tensor([1472], device='cuda:0') mtp accept=1 prop=1472 top1=1472 accp=1.000 next=draft=82821 prop=82821 olap pair=111.7ms serial=197.5ms gain=85.7ms ratio=0.43 s0=4.4ms s1=193.1ms wait=0.1/52.8ms pred gate=device Token # 69: 3.913ms; value: next_token_ids=tensor([82821], device='cuda:0') mtp accept=1 prop=82821 top1=82821 accp=0.992 next=pair draft=2111 prop=2111 pred gate=device Token # 70: 117.239ms; value: next_token_ids=tensor([2111], device='cuda:0') mtp accept=1 prop=2111 top1=2111 accp=1.000 next=draft=2426 prop=2426 olap pair=111.8ms serial=198.1ms gain=86.3ms ratio=0.44 s0=4.1ms s1=194.1ms wait=0.1/53.1ms pred gate=device Token # 71: 3.805ms; value: next_token_ids=tensor([2426], device='cuda:0') mtp accept=1 prop=2426 top1=2426 accp=0.997 next=pair draft=94353 prop=94353 pred gate=device Token # 72: 117.189ms; value: next_token_ids=tensor([94353], device='cuda:0') mtp accept=1 prop=94353 top1=94353 accp=1.000 next=draft=471 prop=471 olap pair=111.0ms serial=196.3ms gain=85.3ms ratio=0.43 s0=6.9ms s1=189.4ms 
wait=0.2/49.9ms pred gate=device Token # 73: 4.830ms; value: next_token_ids=tensor([471], device='cuda:0') mtp accept=1 prop=471 top1=471 accp=1.000 next=pair draft=1457 prop=1457 pred gate=device Token # 74: 115.657ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=draft=844 prop=844 olap pair=110.3ms serial=195.8ms gain=85.5ms ratio=0.44 s0=4.2ms s1=191.5ms wait=0.1/52.7ms pred gate=device Token # 75: 3.827ms; value: next_token_ids=tensor([844], device='cuda:0') mtp accept=1 prop=844 top1=844 accp=1.000 next=pair draft=13880 prop=13880 pred gate=device Token # 76: 116.408ms; value: next_token_ids=tensor([13880], device='cuda:0') mtp accept=1 prop=13880 top1=13880 accp=1.000 next=draft=303 prop=303 olap pair=111.0ms serial=196.0ms gain=85.0ms ratio=0.43 s0=4.2ms s1=191.9ms wait=0.1/53.1ms pred gate=device Token # 77: 3.813ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.998 next=pair draft=2490 prop=2490 pred gate=device Token # 78: 116.670ms; value: next_token_ids=tensor([2490], device='cuda:0') mtp accept=1 prop=2490 top1=2490 accp=1.000 next=draft=69401 prop=69401 olap pair=111.3ms serial=196.6ms gain=85.3ms ratio=0.43 s0=4.1ms s1=192.5ms wait=0.1/53.1ms pred gate=device Token # 79: 3.876ms; value: next_token_ids=tensor([69401], device='cuda:0') mtp accept=1 prop=69401 top1=69401 accp=1.000 next=pair draft=2051 prop=2051 pred gate=device Token # 80: 116.040ms; value: next_token_ids=tensor([2051], device='cuda:0') mtp accept=1 prop=2051 top1=2051 accp=1.000 next=draft=9511 prop=9511 olap pair=110.5ms serial=196.0ms gain=85.5ms ratio=0.44 s0=4.2ms s1=191.8ms wait=0.1/52.8ms pred gate=device Token # 81: 3.899ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=9511 top1=303 accp=0.052 next=pair draft=24935 prop=24935 pred gate=device Token # 82: 116.292ms; value: next_token_ids=tensor([8563], device='cuda:0') mtp accept=0 prop=24935 top1=8563 
accp=0.146 next=draft=24935 prop=24935 olap pair=110.9ms serial=196.3ms gain=85.5ms ratio=0.44 s0=4.4ms s1=192.0ms wait=0.1/52.4ms pred gate=device Token # 83: 116.500ms; value: next_token_ids=tensor([24935], device='cuda:0') mtp accept=1 prop=24935 top1=24935 accp=0.871 next=draft=53091 prop=53091 olap pair=111.1ms serial=197.2ms gain=86.1ms ratio=0.44 s0=3.9ms s1=193.2ms wait=0.1/53.3ms pred gate=device Token # 84: 3.895ms; value: next_token_ids=tensor([53091], device='cuda:0') mtp accept=1 prop=53091 top1=53091 accp=0.963 next=pair draft=4374 prop=4374 pred gate=device Token # 85: 116.522ms; value: next_token_ids=tensor([4374], device='cuda:0') mtp accept=1 prop=4374 top1=4374 accp=1.000 next=draft=1465 prop=1465 olap pair=111.1ms serial=197.4ms gain=86.3ms ratio=0.44 s0=3.9ms s1=193.5ms wait=0.1/53.3ms pred gate=device Token # 86: 3.967ms; value: next_token_ids=tensor([1465], device='cuda:0') mtp accept=1 prop=1465 top1=1465 accp=1.000 next=pair draft=13582 prop=13582 pred gate=device Token # 87: 116.215ms; value: next_token_ids=tensor([13582], device='cuda:0') mtp accept=1 prop=13582 top1=13582 accp=1.000 next=draft=21 prop=21 olap pair=110.8ms serial=196.7ms gain=85.9ms ratio=0.44 s0=4.2ms s1=192.5ms wait=0.1/52.8ms pred gate=device Token # 88: 3.813ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=pair draft=16 prop=16 pred gate=device Token # 89: 116.163ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=20 prop=20 olap pair=110.9ms serial=196.6ms gain=85.8ms ratio=0.44 s0=4.8ms s1=191.8ms wait=0.1/51.5ms pred gate=device Token # 90: 3.844ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=1237 prop=1237 pred gate=device Token # 91: 116.376ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=draft=28769 prop=28769 olap pair=111.0ms 
serial=197.1ms gain=86.1ms ratio=0.44 s0=4.5ms s1=192.6ms wait=0.1/52.2ms pred gate=device Token # 92: 3.850ms; value: next_token_ids=tensor([28769], device='cuda:0') mtp accept=1 prop=28769 top1=28769 accp=1.000 next=pair draft=36 prop=36 pred gate=device Token # 93: 116.035ms; value: next_token_ids=tensor([36], device='cuda:0') mtp accept=1 prop=36 top1=36 accp=1.000 next=draft=10602 prop=10602 olap pair=110.6ms serial=196.4ms gain=85.8ms ratio=0.44 s0=4.4ms s1=192.0ms wait=0.1/52.3ms pred gate=device Token # 94: 3.796ms; value: next_token_ids=tensor([10602], device='cuda:0') mtp accept=1 prop=10602 top1=10602 accp=1.000 next=pair draft=1227 prop=1227 pred gate=device Token # 95: 116.161ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=0.994 next=draft=111671 prop=93130 olap pair=110.8ms serial=196.5ms gain=85.7ms ratio=0.44 s0=4.5ms s1=192.0ms wait=0.1/52.1ms pred gate=device Token # 96: 3.886ms; value: next_token_ids=tensor([88170], device='cuda:0') mtp accept=0 prop=93130 top1=88170 accp=0.084 next=pair draft=4891 prop=4891 pred gate=device Token # 97: 115.480ms; value: next_token_ids=tensor([4891], device='cuda:0') mtp accept=1 prop=4891 top1=4891 accp=1.000 next=draft=90974 prop=90974 olap pair=110.0ms serial=195.2ms gain=85.2ms ratio=0.44 s0=4.5ms s1=190.7ms wait=0.1/52.2ms pred gate=device Token # 98: 3.858ms; value: next_token_ids=tensor([90974], device='cuda:0') mtp accept=1 prop=90974 top1=90974 accp=1.000 next=pair draft=6772 prop=6772 pred gate=device Token # 99: 115.791ms; value: next_token_ids=tensor([6772], device='cuda:0') mtp accept=1 prop=6772 top1=6772 accp=0.999 next=draft=856 prop=856 olap pair=110.4ms serial=195.8ms gain=85.4ms ratio=0.44 s0=4.5ms s1=191.4ms wait=0.1/52.0ms pred gate=device Token # 100: 3.861ms; value: next_token_ids=tensor([856], device='cuda:0') mtp accept=1 prop=856 top1=856 accp=1.000 next=pair draft=16 prop=16 pred gate=device Token # 101: 116.138ms; value: 
next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=14882 prop=14882 olap pair=110.8ms serial=196.6ms gain=85.9ms ratio=0.44 s0=4.4ms s1=192.2ms wait=0.1/52.1ms pred gate=device Token # 102: 3.754ms; value: next_token_ids=tensor([14882], device='cuda:0') mtp accept=1 prop=14882 top1=14882 accp=1.000 next=pair draft=28828 prop=28828 pred gate=device Token # 103: 116.454ms; value: next_token_ids=tensor([28828], device='cuda:0') mtp accept=1 prop=28828 top1=28828 accp=1.000 next=draft=2283 prop=2283 olap pair=111.1ms serial=197.6ms gain=86.5ms ratio=0.44 s0=4.5ms s1=193.1ms wait=0.1/47.1ms pred gate=device Token # 104: 3.752ms; value: next_token_ids=tensor([2283], device='cuda:0') mtp accept=1 prop=2283 top1=2283 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 105: 116.191ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=47 prop=47 olap pair=110.8ms serial=197.1ms gain=86.3ms ratio=0.44 s0=4.8ms s1=192.3ms wait=0.1/46.6ms pred gate=device Token # 106: 3.861ms; value: next_token_ids=tensor([47], device='cuda:0') mtp accept=1 prop=47 top1=47 accp=0.999 next=pair draft=8835 prop=8835 pred gate=device Token # 107: 116.001ms; value: next_token_ids=tensor([8835], device='cuda:0') mtp accept=1 prop=8835 top1=8835 accp=1.000 next=draft=19 prop=19 olap pair=110.7ms serial=196.8ms gain=86.1ms ratio=0.44 s0=4.2ms s1=192.6ms wait=0.1/47.9ms pred gate=device Token # 108: 3.786ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=pair draft=7402 prop=7402 pred gate=device Token # 109: 116.357ms; value: next_token_ids=tensor([7402], device='cuda:0') mtp accept=1 prop=7402 top1=7402 accp=1.000 next=draft=28561 prop=28561 olap pair=111.0ms serial=197.2ms gain=86.3ms ratio=0.44 s0=4.0ms s1=193.2ms wait=0.1/48.1ms pred gate=device Token # 110: 3.834ms; value: next_token_ids=tensor([2295], device='cuda:0') mtp accept=0 
prop=28561 top1=2295 accp=0.134 next=pair draft=5198 prop=5198 pred gate=device Token # 111: 116.246ms; value: next_token_ids=tensor([5198], device='cuda:0') mtp accept=1 prop=5198 top1=5198 accp=1.000 next=draft=16 prop=16 olap pair=110.9ms serial=196.5ms gain=85.6ms ratio=0.44 s0=4.7ms s1=191.8ms wait=0.1/47.2ms pred gate=device Token # 112: 3.861ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=pair draft=19 prop=19 pred gate=device Token # 113: 116.459ms; value: next_token_ids=tensor([19], device='cuda:0') mtp accept=1 prop=19 top1=19 accp=1.000 next=draft=27181 prop=27181 olap pair=111.2ms serial=197.7ms gain=86.5ms ratio=0.44 s0=4.0ms s1=193.7ms wait=0.1/48.1ms pred gate=device Token # 114: 3.749ms; value: next_token_ids=tensor([27181], device='cuda:0') mtp accept=1 prop=27181 top1=27181 accp=0.896 next=pair draft=39932 prop=2056 pred gate=device Token # 115: 117.540ms; value: next_token_ids=tensor([2056], device='cuda:0') mtp accept=1 prop=2056 top1=2056 accp=0.907 next=draft=768 prop=3099 olap pair=111.6ms serial=198.3ms gain=86.7ms ratio=0.44 s0=4.1ms s1=194.1ms wait=0.1/48.0ms pred gate=device Token # 116: 4.605ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=0 prop=3099 top1=768 accp=0.627 next=pair draft=1959 prop=1959 pred gate=device Token # 117: 116.543ms; value: next_token_ids=tensor([1300], device='cuda:0') mtp accept=0 prop=1959 top1=1300 accp=0.174 next=draft=5402 prop=5402 olap pair=111.1ms serial=197.3ms gain=86.2ms ratio=0.44 s0=5.3ms s1=192.1ms wait=0.2/46.3ms pred gate=device Token # 118: 116.641ms; value: next_token_ids=tensor([5402], device='cuda:0') mtp accept=1 prop=5402 top1=5402 accp=0.995 next=draft=4538 prop=8579 olap pair=111.3ms serial=197.7ms gain=86.4ms ratio=0.44 s0=5.0ms s1=192.7ms wait=0.1/46.3ms pred gate=device Token # 119: 3.746ms; value: next_token_ids=tensor([8579], device='cuda:0') mtp accept=1 prop=8579 top1=8579 accp=0.911 next=pair draft=1959 
prop=1959 pred gate=device Token # 120: 116.259ms; value: next_token_ids=tensor([1959], device='cuda:0') mtp accept=1 prop=1959 top1=1959 accp=0.999 next=draft=3500 prop=3500 olap pair=110.9ms serial=196.9ms gain=86.0ms ratio=0.44 s0=5.0ms s1=191.9ms wait=0.1/46.4ms pred gate=device Token # 121: 3.735ms; value: next_token_ids=tensor([3500], device='cuda:0') mtp accept=1 prop=3500 top1=3500 accp=0.999 next=pair draft=5294 prop=389 pred gate=device Token # 122: 116.541ms; value: next_token_ids=tensor([5294], device='cuda:0') mtp accept=0 prop=389 top1=5294 accp=0.744 next=draft=84756 prop=84756 olap pair=111.2ms serial=197.5ms gain=86.3ms ratio=0.44 s0=5.0ms s1=192.5ms wait=0.1/46.4ms pred gate=device Token # 123: 115.844ms; value: next_token_ids=tensor([84756], device='cuda:0') mtp accept=1 prop=84756 top1=84756 accp=0.989 next=draft=2920 prop=2920 olap pair=110.5ms serial=196.1ms gain=85.6ms ratio=0.44 s0=5.0ms s1=191.1ms wait=0.1/46.4ms pred gate=device Token # 124: 3.761ms; value: next_token_ids=tensor([1148], device='cuda:0') mtp accept=0 prop=2920 top1=1148 accp=0.163 next=pair draft=6656 prop=6656 pred gate=device Token # 125: 115.993ms; value: next_token_ids=tensor([6656], device='cuda:0') mtp accept=1 prop=6656 top1=6656 accp=0.950 next=draft=4497 prop=4497 olap pair=110.7ms serial=196.6ms gain=85.9ms ratio=0.44 s0=4.7ms s1=191.9ms wait=0.1/46.8ms pred gate=device Token # 126: 3.749ms; value: next_token_ids=tensor([4497], device='cuda:0') mtp accept=1 prop=4497 top1=4497 accp=1.000 next=pair draft=39932 prop=39932 pred gate=device Token # 127: 116.253ms; value: next_token_ids=tensor([39932], device='cuda:0') mtp accept=1 prop=39932 top1=39932 accp=0.830 next=draft=3803 prop=3803 olap pair=111.0ms serial=197.4ms gain=86.4ms ratio=0.44 s0=3.9ms s1=193.5ms wait=0.1/48.2ms pred gate=device Token # 128: 3.745ms; value: next_token_ids=tensor([3803], device='cuda:0') mtp accept=1 prop=3803 top1=3803 accp=0.771 next=pair draft=1959 prop=1959 pred gate=device Token # 
129: 116.563ms; value: next_token_ids=tensor([1959], device='cuda:0') mtp accept=1 prop=1959 top1=1959 accp=0.996 next=draft=22611 prop=22611 olap pair=111.3ms serial=198.0ms gain=86.7ms ratio=0.44 s0=3.9ms s1=194.2ms wait=0.1/48.1ms pred gate=device Token # 130: 3.750ms; value: next_token_ids=tensor([22611], device='cuda:0') mtp accept=1 prop=22611 top1=22611 accp=0.996 next=pair draft=55498 prop=55498 pred gate=device Token # 131: 116.476ms; value: next_token_ids=tensor([55498], device='cuda:0') mtp accept=1 prop=55498 top1=12799 accp=0.279 next=draft=320 prop=320 olap pair=111.2ms serial=197.8ms gain=86.6ms ratio=0.44 s0=3.9ms s1=193.9ms wait=0.1/48.2ms pred gate=device Token # 132: 3.790ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.999 next=pair draft=7346 prop=7346 pred gate=device Token # 133: 115.364ms; value: next_token_ids=tensor([7346], device='cuda:0') mtp accept=1 prop=7346 top1=7346 accp=0.407 next=draft=303 prop=303 olap pair=110.1ms serial=195.6ms gain=85.5ms ratio=0.44 s0=3.8ms s1=191.8ms wait=0.1/48.5ms pred gate=device Token # 134: 3.758ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=4754 prop=4754 pred gate=device Token # 135: 115.322ms; value: next_token_ids=tensor([3500], device='cuda:0') mtp accept=0 prop=4754 top1=20808 accp=0.252 next=draft=12799 prop=12799 olap pair=110.1ms serial=195.6ms gain=85.4ms ratio=0.44 s0=4.7ms s1=190.9ms wait=0.1/46.8ms pred gate=device Token # 136: 116.001ms; value: next_token_ids=tensor([12799], device='cuda:0') mtp accept=1 prop=12799 top1=12799 accp=0.998 next=draft=525 prop=525 olap pair=110.7ms serial=196.4ms gain=85.8ms ratio=0.44 s0=5.0ms s1=191.4ms wait=0.1/46.4ms pred gate=device Token # 137: 3.726ms; value: next_token_ids=tensor([9128], device='cuda:0') mtp accept=0 prop=525 top1=9128 accp=0.268 next=pair draft=28057 prop=28057 pred gate=device Token # 138: 115.274ms; value: 
next_token_ids=tensor([3907], device='cuda:0') mtp accept=0 prop=28057 top1=3907 accp=0.013 next=draft=116863 prop=116863 olap pair=110.0ms serial=195.3ms gain=85.2ms ratio=0.44 s0=5.0ms s1=190.2ms wait=0.1/46.3ms pred gate=device Token # 139: 115.746ms; value: next_token_ids=tensor([1742], device='cuda:0') mtp accept=0 prop=116863 top1=1742 accp=0.000 next=draft=301 prop=301 olap pair=110.4ms serial=196.1ms gain=85.6ms ratio=0.44 s0=5.1ms s1=191.0ms wait=0.1/46.3ms pred gate=device Token # 140: 116.130ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=0.999 next=draft=28057 prop=28057 olap pair=110.8ms serial=196.7ms gain=86.0ms ratio=0.44 s0=4.9ms s1=191.8ms wait=0.1/46.4ms pred gate=device Token # 141: 3.740ms; value: next_token_ids=tensor([28057], device='cuda:0') mtp accept=1 prop=28057 top1=28057 accp=1.000 next=pair draft=12145 prop=12145 pred gate=device Token # 142: 116.319ms; value: next_token_ids=tensor([7112], device='cuda:0') mtp accept=0 prop=12145 top1=7112 accp=0.107 next=draft=768 prop=768 olap pair=110.9ms serial=197.0ms gain=86.0ms ratio=0.44 s0=4.5ms s1=192.5ms wait=0.1/47.4ms pred gate=device Token # 143: 116.827ms; value: next_token_ids=tensor([768], device='cuda:0') mtp accept=1 prop=768 top1=768 accp=1.000 next=draft=2541 prop=2541 olap pair=111.5ms serial=198.2ms gain=86.7ms ratio=0.44 s0=3.9ms s1=194.3ms wait=0.1/48.4ms pred gate=device Token # 144: 3.698ms; value: next_token_ids=tensor([1057], device='cuda:0') mtp accept=0 prop=2541 top1=1057 accp=0.079 next=pair draft=22885 prop=22885 pred gate=device Token # 145: 116.299ms; value: next_token_ids=tensor([22885], device='cuda:0') mtp accept=1 prop=22885 top1=22885 accp=0.804 next=draft=90738 prop=90738 olap pair=111.0ms serial=197.2ms gain=86.2ms ratio=0.44 s0=4.4ms s1=192.8ms wait=0.1/47.6ms pred gate=device Token # 146: 3.770ms; value: next_token_ids=tensor([90738], device='cuda:0') mtp accept=1 prop=90738 top1=90738 accp=0.963 next=pair 
draft=1237 prop=1237 pred gate=device Token # 147: 116.450ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=draft=23085 prop=23085 olap pair=111.1ms serial=197.6ms gain=86.5ms ratio=0.44 s0=3.9ms s1=193.7ms wait=0.1/48.4ms pred gate=device Token # 148: 3.775ms; value: next_token_ids=tensor([23085], device='cuda:0') mtp accept=1 prop=23085 top1=23085 accp=1.000 next=pair draft=58 prop=58 pred gate=device Token # 149: 116.073ms; value: next_token_ids=tensor([58], device='cuda:0') mtp accept=1 prop=58 top1=58 accp=1.000 next=draft=223 prop=223 olap pair=110.8ms serial=196.9ms gain=86.1ms ratio=0.44 s0=4.4ms s1=192.6ms wait=0.1/47.4ms pred gate=device Token # 150: 3.829ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=pair draft=24091 prop=24091 pred gate=device Token # 151: 116.648ms; value: next_token_ids=tensor([24091], device='cuda:0') mtp accept=1 prop=24091 top1=24091 accp=1.000 next=draft=18 prop=18 olap pair=111.4ms serial=197.5ms gain=86.1ms ratio=0.44 s0=5.9ms s1=191.7ms wait=0.2/45.6ms pred gate=device Token # 152: 3.739ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=pair draft=1227 prop=303 pred gate=device Token # 153: 115.575ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.762 next=draft=20159 prop=4621 olap pair=110.3ms serial=196.0ms gain=85.7ms ratio=0.44 s0=4.5ms s1=191.5ms wait=0.1/47.1ms pred gate=device Token # 154: 3.709ms; value: next_token_ids=tensor([20159], device='cuda:0') mtp accept=0 prop=4621 top1=20159 accp=0.879 next=pair draft=389 prop=389 pred gate=device Token # 155: 115.898ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 accp=0.999 next=draft=23968 prop=23968 olap pair=110.6ms serial=196.6ms gain=86.0ms ratio=0.44 s0=4.5ms s1=192.1ms wait=0.1/47.1ms pred gate=device Token # 156: 3.718ms; 
value: next_token_ids=tensor([23968], device='cuda:0') mtp accept=1 prop=23968 top1=23968 accp=0.970 next=pair draft=48 prop=48 pred gate=device Token # 157: 115.993ms; value: next_token_ids=tensor([90738], device='cuda:0') mtp accept=0 prop=48 top1=90738 accp=0.017 next=draft=303 prop=303 olap pair=110.7ms serial=196.6ms gain=85.9ms ratio=0.44 s0=4.6ms s1=192.0ms wait=0.1/46.9ms pred gate=device Token # 158: 116.184ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.922 next=draft=1207 prop=1207 olap pair=110.8ms serial=196.9ms gain=86.1ms ratio=0.44 s0=4.6ms s1=192.4ms wait=0.1/47.1ms pred gate=device Token # 159: 3.724ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=1 prop=1207 top1=1207 accp=0.705 next=pair draft=2431 prop=2431 pred gate=device Token # 160: 115.924ms; value: next_token_ids=tensor([15206], device='cuda:0') mtp accept=0 prop=2431 top1=15206 accp=0.200 next=draft=48 prop=48 olap pair=110.7ms serial=196.8ms gain=86.1ms ratio=0.44 s0=4.5ms s1=192.3ms wait=0.1/47.0ms pred gate=device Token # 161: 116.350ms; value: next_token_ids=tensor([17899], device='cuda:0') mtp accept=0 prop=48 top1=17899 accp=0.002 next=draft=301 prop=301 olap pair=111.0ms serial=197.5ms gain=86.4ms ratio=0.44 s0=4.5ms s1=193.0ms wait=0.1/47.1ms pred gate=device Token # 162: 117.222ms; value: next_token_ids=tensor([1316], device='cuda:0') mtp accept=0 prop=301 top1=1316 accp=0.312 next=draft=20159 prop=20159 olap pair=111.2ms serial=197.0ms gain=85.9ms ratio=0.44 s0=6.4ms s1=190.6ms wait=0.2/45.0ms pred gate=device Token # 163: 116.569ms; value: next_token_ids=tensor([20159], device='cuda:0') mtp accept=1 prop=20159 top1=20159 accp=0.674 next=draft=301 prop=301 olap pair=111.0ms serial=196.9ms gain=85.9ms ratio=0.44 s0=5.5ms s1=191.3ms wait=0.2/46.4ms pred gate=device Token # 164: 3.782ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=0 prop=301 top1=1227 accp=0.142 next=pair draft=548 prop=548 
pred gate=device Token # 165: 116.415ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=1 prop=548 top1=548 accp=0.997 next=draft=2111 prop=2111 olap pair=111.0ms serial=197.3ms gain=86.3ms ratio=0.44 s0=4.8ms s1=192.5ms wait=0.1/46.7ms pred gate=device Token # 166: 3.726ms; value: next_token_ids=tensor([6365], device='cuda:0') mtp accept=0 prop=2111 top1=6365 accp=0.245 next=pair draft=2467 prop=2467 pred gate=device Token # 167: 115.219ms; value: next_token_ids=tensor([2467], device='cuda:0') mtp accept=1 prop=2467 top1=2467 accp=0.977 next=draft=99924 prop=99924 olap pair=109.9ms serial=195.2ms gain=85.4ms ratio=0.44 s0=3.9ms s1=191.3ms wait=0.1/48.2ms pred gate=device Token # 168: 3.831ms; value: next_token_ids=tensor([99924], device='cuda:0') mtp accept=1 prop=99924 top1=99924 accp=1.000 next=pair draft=5968 prop=5968 pred gate=device Token # 169: 115.721ms; value: next_token_ids=tensor([5968], device='cuda:0') mtp accept=1 prop=5968 top1=5968 accp=0.987 next=draft=41354 prop=41354 olap pair=110.5ms serial=196.4ms gain=85.9ms ratio=0.44 s0=4.2ms s1=192.2ms wait=0.1/47.9ms pred gate=device Token # 170: 3.795ms; value: next_token_ids=tensor([1931], device='cuda:0') mtp accept=0 prop=41354 top1=13880 accp=0.010 next=pair draft=41354 prop=41354 pred gate=device Token # 171: 116.230ms; value: next_token_ids=tensor([10688], device='cuda:0') mtp accept=0 prop=41354 top1=10688 accp=0.108 next=draft=1237 prop=1237 olap pair=110.8ms serial=197.0ms gain=86.1ms ratio=0.44 s0=4.5ms s1=192.4ms wait=0.1/47.0ms pred gate=device Token # 172: 115.956ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=draft=48 prop=48 olap pair=110.5ms serial=196.4ms gain=85.9ms ratio=0.44 s0=4.0ms s1=192.4ms wait=0.1/48.1ms pred gate=device Token # 173: 3.844ms; value: next_token_ids=tensor([48], device='cuda:0') mtp accept=1 prop=48 top1=48 accp=0.997 next=pair draft=1457 prop=1457 pred gate=device Token # 174: 116.032ms; 
value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=draft=14164 prop=14164 olap pair=110.7ms serial=196.4ms gain=85.7ms ratio=0.44 s0=4.2ms s1=192.2ms wait=0.1/47.9ms pred gate=device Token # 175: 3.799ms; value: next_token_ids=tensor([14164], device='cuda:0') mtp accept=1 prop=14164 top1=303 accp=0.938 next=pair draft=48 prop=48 pred gate=device Token # 176: 115.901ms; value: next_token_ids=tensor([7174], device='cuda:0') mtp accept=0 prop=48 top1=1909 accp=0.010 next=draft=2541 prop=2490 olap pair=110.6ms serial=195.9ms gain=85.3ms ratio=0.44 s0=4.5ms s1=191.4ms wait=0.1/47.7ms pred gate=device Token # 177: 116.870ms; value: next_token_ids=tensor([2490], device='cuda:0') mtp accept=1 prop=2490 top1=2490 accp=0.298 next=draft=69401 prop=69401 olap pair=111.4ms serial=196.5ms gain=85.1ms ratio=0.43 s0=4.8ms s1=191.7ms wait=0.1/47.1ms pred gate=device Token # 178: 3.814ms; value: next_token_ids=tensor([69401], device='cuda:0') mtp accept=1 prop=69401 top1=69401 accp=1.000 next=pair draft=2051 prop=2051 pred gate=device Token # 179: 116.884ms; value: next_token_ids=tensor([2051], device='cuda:0') mtp accept=1 prop=2051 top1=2051 accp=1.000 next=draft=9511 prop=9511 olap pair=111.5ms serial=196.2ms gain=84.7ms ratio=0.43 s0=4.8ms s1=191.5ms wait=0.1/47.3ms pred gate=device Token # 180: 3.772ms; value: next_token_ids=tensor([9511], device='cuda:0') mtp accept=1 prop=9511 top1=9511 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 181: 115.916ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.997 next=draft=1530 prop=1530 olap pair=110.5ms serial=195.8ms gain=85.3ms ratio=0.44 s0=4.3ms s1=191.5ms wait=0.1/47.7ms pred gate=device Token # 182: 3.748ms; value: next_token_ids=tensor([1530], device='cuda:0') mtp accept=1 prop=1530 top1=1530 accp=0.999 next=pair draft=76399 prop=76399 pred gate=device Token # 183: 116.541ms; value: next_token_ids=tensor([15870], 
device='cuda:0') mtp accept=0 prop=76399 top1=15870 accp=0.000 next=draft=3945 prop=3945 olap pair=111.2ms serial=197.7ms gain=86.5ms ratio=0.44 s0=4.5ms s1=193.2ms wait=0.1/47.0ms pred gate=device Token # 184: 115.951ms; value: next_token_ids=tensor([3945], device='cuda:0') mtp accept=1 prop=3945 top1=3945 accp=0.898 next=draft=2671 prop=2671 olap pair=110.5ms serial=196.3ms gain=85.8ms ratio=0.44 s0=4.5ms s1=191.8ms wait=0.1/46.9ms pred gate=device Token # 185: 3.869ms; value: next_token_ids=tensor([2671], device='cuda:0') mtp accept=1 prop=2671 top1=2671 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 186: 116.183ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.993 next=draft=1909 prop=1909 olap pair=110.9ms serial=197.1ms gain=86.3ms ratio=0.44 s0=4.5ms s1=192.6ms wait=0.1/46.9ms pred gate=device Token # 187: 3.776ms; value: next_token_ids=tensor([1909], device='cuda:0') mtp accept=1 prop=1909 top1=1909 accp=0.974 next=pair draft=68523 prop=68523 pred gate=device Token # 188: 115.890ms; value: next_token_ids=tensor([68523], device='cuda:0') mtp accept=1 prop=68523 top1=68523 accp=0.996 next=draft=2490 prop=2490 olap pair=110.6ms serial=196.6ms gain=86.0ms ratio=0.44 s0=4.0ms s1=192.6ms wait=0.1/48.2ms pred gate=device Token # 189: 3.744ms; value: next_token_ids=tensor([2490], device='cuda:0') mtp accept=1 prop=2490 top1=29458 accp=0.232 next=pair draft=3279 prop=2541 pred gate=device Token # 190: 115.711ms; value: next_token_ids=tensor([1275], device='cuda:0') mtp accept=0 prop=2541 top1=1275 accp=0.001 next=draft=34081 prop=34081 olap pair=110.5ms serial=196.4ms gain=86.0ms ratio=0.44 s0=3.9ms s1=192.6ms wait=0.1/48.4ms pred gate=device Token # 191: 116.533ms; value: next_token_ids=tensor([34081], device='cuda:0') mtp accept=1 prop=34081 top1=34081 accp=0.992 next=draft=4339 prop=4339 olap pair=111.1ms serial=197.5ms gain=86.4ms ratio=0.44 s0=3.9ms s1=193.5ms wait=0.1/48.3ms pred gate=device Token 
# 192: 3.749ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=1.000 next=pair draft=9501 prop=9501 pred gate=device Token # 193: 115.925ms; value: next_token_ids=tensor([9501], device='cuda:0') mtp accept=1 prop=9501 top1=9501 accp=1.000 next=draft=90738 prop=90738 olap pair=110.6ms serial=196.6ms gain=86.0ms ratio=0.44 s0=4.4ms s1=192.2ms wait=0.1/47.7ms pred gate=device Token # 194: 3.872ms; value: next_token_ids=tensor([90738], device='cuda:0') mtp accept=1 prop=90738 top1=90738 accp=1.000 next=pair draft=572 prop=572 pred gate=device Token # 195: 116.383ms; value: next_token_ids=tensor([572], device='cuda:0') mtp accept=1 prop=572 top1=572 accp=1.000 next=draft=303 prop=303 olap pair=111.0ms serial=197.3ms gain=86.3ms ratio=0.44 s0=5.4ms s1=191.9ms wait=0.2/46.4ms pred gate=device Token # 196: 3.796ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=1275 prop=28987 pred gate=device Token # 197: 115.779ms; value: next_token_ids=tensor([1275], device='cuda:0') mtp accept=0 prop=28987 top1=1275 accp=0.509 next=draft=28608 prop=28608 olap pair=110.4ms serial=195.9ms gain=85.4ms ratio=0.44 s0=6.1ms s1=189.8ms wait=0.2/45.4ms pred gate=device Token # 198: 116.125ms; value: next_token_ids=tensor([28608], device='cuda:0') mtp accept=1 prop=28608 top1=28608 accp=0.999 next=draft=39 prop=39 olap pair=110.6ms serial=196.6ms gain=86.0ms ratio=0.44 s0=3.8ms s1=192.8ms wait=0.1/48.4ms pred gate=device Token # 199: 3.799ms; value: next_token_ids=tensor([39], device='cuda:0') mtp accept=1 prop=39 top1=39 accp=1.000 next=pair draft=17 prop=17 pred gate=device Token # 200: 116.014ms; value: next_token_ids=tensor([17], device='cuda:0') mtp accept=1 prop=17 top1=17 accp=1.000 next=draft=6459 prop=6459 olap pair=110.7ms serial=196.9ms gain=86.2ms ratio=0.44 s0=3.8ms s1=193.0ms wait=0.1/48.4ms pred gate=device Token # 201: 3.837ms; value: next_token_ids=tensor([6459], 
device='cuda:0') mtp accept=1 prop=6459 top1=6459 accp=1.000 next=pair draft=48 prop=48 pred gate=device Token # 202: 118.783ms; value: next_token_ids=tensor([48], device='cuda:0') mtp accept=1 prop=48 top1=48 accp=1.000 next=draft=10756 prop=10756 olap pair=113.5ms serial=199.9ms gain=86.5ms ratio=0.43 s0=4.6ms s1=195.4ms wait=0.1/47.5ms pred gate=device Token # 203: 3.778ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=1 prop=10756 top1=10756 accp=0.999 next=pair draft=2827 prop=2827 pred gate=device Token # 204: 117.155ms; value: next_token_ids=tensor([2827], device='cuda:0') mtp accept=1 prop=2827 top1=2827 accp=1.000 next=draft=25650 prop=25650 olap pair=110.9ms serial=196.6ms gain=85.7ms ratio=0.44 s0=5.4ms s1=191.2ms wait=0.1/46.3ms pred gate=device Token # 205: 4.488ms; value: next_token_ids=tensor([25650], device='cuda:0') mtp accept=1 prop=25650 top1=25650 accp=0.959 next=pair draft=621 prop=621 pred gate=device Token # 206: 116.143ms; value: next_token_ids=tensor([621], device='cuda:0') mtp accept=1 prop=621 top1=621 accp=1.000 next=draft=48 prop=48 olap pair=110.7ms serial=196.3ms gain=85.5ms ratio=0.44 s0=5.6ms s1=190.6ms wait=0.2/46.3ms pred gate=device Token # 207: 3.814ms; value: next_token_ids=tensor([48], device='cuda:0') mtp accept=1 prop=48 top1=48 accp=0.989 next=pair draft=1457 prop=1457 pred gate=device Token # 208: 115.780ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=draft=572 prop=572 olap pair=110.4ms serial=196.1ms gain=85.7ms ratio=0.44 s0=4.2ms s1=191.9ms wait=0.1/48.1ms pred gate=device Token # 209: 3.768ms; value: next_token_ids=tensor([572], device='cuda:0') mtp accept=1 prop=572 top1=572 accp=0.998 next=pair draft=8738 prop=8738 pred gate=device Token # 210: 115.875ms; value: next_token_ids=tensor([8738], device='cuda:0') mtp accept=1 prop=8738 top1=8738 accp=0.958 next=draft=303 prop=303 olap pair=110.6ms serial=196.1ms gain=85.5ms ratio=0.44 s0=4.3ms 
s1=191.8ms wait=0.1/47.7ms pred gate=device Token # 211: 3.907ms; value: next_token_ids=tensor([54182], device='cuda:0') mtp accept=0 prop=303 top1=54182 accp=0.019 next=pair draft=27 prop=27 pred gate=device Token # 212: 115.926ms; value: next_token_ids=tensor([27], device='cuda:0') mtp accept=1 prop=27 top1=27 accp=1.000 next=draft=489 prop=489 olap pair=110.6ms serial=196.5ms gain=85.9ms ratio=0.44 s0=4.0ms s1=192.5ms wait=0.1/48.0ms pred gate=device Token # 213: 3.812ms; value: next_token_ids=tensor([489], device='cuda:0') mtp accept=1 prop=489 top1=489 accp=0.999 next=pair draft=4846 prop=4846 pred gate=device Token # 214: 115.996ms; value: next_token_ids=tensor([4846], device='cuda:0') mtp accept=1 prop=4846 top1=4846 accp=1.000 next=draft=56 prop=56 olap pair=110.7ms serial=196.3ms gain=85.6ms ratio=0.44 s0=4.2ms s1=192.1ms wait=0.1/47.8ms pred gate=device Token # 215: 3.772ms; value: next_token_ids=tensor([56], device='cuda:0') mtp accept=1 prop=56 top1=56 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 216: 116.144ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=121994 prop=24935 olap pair=110.8ms serial=196.3ms gain=85.5ms ratio=0.44 s0=4.4ms s1=191.9ms wait=0.1/47.5ms pred gate=device Token # 217: 3.760ms; value: next_token_ids=tensor([24935], device='cuda:0') mtp accept=1 prop=24935 top1=24935 accp=0.225 next=pair draft=9422 prop=10861 pred gate=device Token # 218: 116.802ms; value: next_token_ids=tensor([35881], device='cuda:0') mtp accept=0 prop=10861 top1=6856 accp=0.069 next=draft=90974 prop=90974 olap pair=110.9ms serial=194.8ms gain=83.9ms ratio=0.43 s0=8.6ms s1=186.2ms wait=0.2/42.6ms pred gate=device Token # 219: 116.813ms; value: next_token_ids=tensor([10861], device='cuda:0') mtp accept=0 prop=90974 top1=10861 accp=0.038 next=draft=478 prop=478 olap pair=111.3ms serial=198.1ms gain=86.8ms ratio=0.44 s0=3.8ms s1=194.3ms wait=0.1/48.5ms pred gate=device Token # 220: 
116.509ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=draft=39932 prop=39932 olap pair=110.9ms serial=196.4ms gain=85.5ms ratio=0.44 s0=7.3ms s1=189.1ms wait=0.2/44.3ms pred gate=device Token # 221: 3.730ms; value: next_token_ids=tensor([7511], device='cuda:0') mtp accept=0 prop=39932 top1=7511 accp=0.327 next=pair draft=985 prop=985 pred gate=device Token # 222: 116.534ms; value: next_token_ids=tensor([985], device='cuda:0') mtp accept=1 prop=985 top1=985 accp=0.978 next=draft=768 prop=768 olap pair=111.3ms serial=197.5ms gain=86.2ms ratio=0.44 s0=4.1ms s1=193.5ms wait=0.1/48.3ms pred gate=device Token # 223: 3.749ms; value: next_token_ids=tensor([7524], device='cuda:0') mtp accept=0 prop=768 top1=7524 accp=0.274 next=pair draft=15 prop=15 pred gate=device Token # 224: 116.426ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=0.995 next=draft=223 prop=223 olap pair=110.7ms serial=196.3ms gain=85.6ms ratio=0.44 s0=5.8ms s1=190.5ms wait=0.2/46.1ms pred gate=device Token # 225: 3.765ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=0.808 next=pair draft=28057 prop=8842 pred gate=device Token # 226: 116.019ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=0.478 next=draft=768 prop=768 olap pair=110.8ms serial=196.8ms gain=86.1ms ratio=0.44 s0=4.5ms s1=192.3ms wait=0.1/47.5ms pred gate=device Token # 227: 3.778ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=0 prop=768 top1=389 accp=0.302 next=pair draft=53091 prop=53091 pred gate=device Token # 228: 115.930ms; value: next_token_ids=tensor([53091], device='cuda:0') mtp accept=1 prop=53091 top1=53091 accp=1.000 next=draft=4374 prop=4374 olap pair=110.6ms serial=195.9ms gain=85.3ms ratio=0.44 s0=4.0ms s1=191.9ms wait=0.1/48.3ms pred gate=device Token # 229: 3.822ms; value: next_token_ids=tensor([4374], device='cuda:0') 
mtp accept=1 prop=4374 top1=4374 accp=1.000 next=pair draft=1465 prop=1465 pred gate=device Token # 230: 115.936ms; value: next_token_ids=tensor([1465], device='cuda:0') mtp accept=1 prop=1465 top1=1465 accp=1.000 next=draft=13582 prop=13582 olap pair=110.6ms serial=196.2ms gain=85.6ms ratio=0.44 s0=4.8ms s1=191.3ms wait=0.1/47.2ms pred gate=device Token # 231: 3.851ms; value: next_token_ids=tensor([13582], device='cuda:0') mtp accept=1 prop=13582 top1=13582 accp=1.000 next=pair draft=21 prop=21 pred gate=device Token # 232: 116.138ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=draft=16 prop=16 olap pair=110.9ms serial=197.2ms gain=86.3ms ratio=0.44 s0=3.9ms s1=193.3ms wait=0.1/48.3ms pred gate=device Token # 233: 3.884ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=pair draft=20 prop=20 pred gate=device Token # 234: 115.576ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=draft=303 prop=303 olap pair=110.3ms serial=195.9ms gain=85.6ms ratio=0.44 s0=4.0ms s1=191.9ms wait=0.1/48.3ms pred gate=device Token # 235: 3.759ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=450 prop=450 pred gate=device Token # 236: 115.859ms; value: next_token_ids=tensor([450], device='cuda:0') mtp accept=1 prop=450 top1=450 accp=0.488 next=draft=28769 prop=28769 olap pair=110.5ms serial=196.3ms gain=85.8ms ratio=0.44 s0=4.0ms s1=192.3ms wait=0.1/48.2ms pred gate=device Token # 237: 3.763ms; value: next_token_ids=tensor([28769], device='cuda:0') mtp accept=1 prop=28769 top1=28769 accp=1.000 next=pair draft=36 prop=36 pred gate=device Token # 238: 115.877ms; value: next_token_ids=tensor([36], device='cuda:0') mtp accept=1 prop=36 top1=36 accp=1.000 next=draft=10602 prop=10602 olap pair=110.5ms serial=196.3ms gain=85.8ms ratio=0.44 s0=3.9ms s1=192.4ms wait=0.1/48.3ms pred 
gate=device Token # 239: 3.785ms; value: next_token_ids=tensor([10602], device='cuda:0') mtp accept=1 prop=10602 top1=10602 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 240: 115.720ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.999 next=draft=6715 prop=6715 olap pair=110.4ms serial=196.2ms gain=85.8ms ratio=0.44 s0=4.0ms s1=192.2ms wait=0.1/48.1ms pred gate=device Token # 241: 3.767ms; value: next_token_ids=tensor([6715], device='cuda:0') mtp accept=1 prop=6715 top1=6715 accp=0.902 next=pair draft=28608 prop=28608 pred gate=device Token # 242: 115.794ms; value: next_token_ids=tensor([14769], device='cuda:0') mtp accept=0 prop=28608 top1=14769 accp=0.039 next=draft=10756 prop=10756 olap pair=110.5ms serial=196.4ms gain=85.9ms ratio=0.44 s0=3.9ms s1=192.5ms wait=0.1/48.4ms pred gate=device Token # 243: 116.473ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=1 prop=10756 top1=10756 accp=1.000 next=draft=1237 prop=1237 olap pair=111.1ms serial=197.6ms gain=86.5ms ratio=0.44 s0=4.1ms s1=193.5ms wait=0.1/47.9ms pred gate=device Token # 244: 3.793ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=0 prop=1237 top1=8842 accp=0.024 next=pair draft=1237 prop=1237 pred gate=device Token # 245: 116.493ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=draft=28608 prop=28608 olap pair=111.1ms serial=197.6ms gain=86.5ms ratio=0.44 s0=4.4ms s1=193.1ms wait=0.1/47.1ms pred gate=device Token # 246: 3.781ms; value: next_token_ids=tensor([28608], device='cuda:0') mtp accept=1 prop=28608 top1=28608 accp=1.000 next=pair draft=39 prop=39 pred gate=device Token # 247: 116.337ms; value: next_token_ids=tensor([39], device='cuda:0') mtp accept=1 prop=39 top1=39 accp=1.000 next=draft=14164 prop=14164 olap pair=111.1ms serial=197.6ms gain=86.6ms ratio=0.44 s0=4.3ms s1=193.4ms wait=0.1/47.6ms pred gate=device Token # 248: 3.761ms; 
value: next_token_ids=tensor([14164], device='cuda:0') mtp accept=1 prop=14164 top1=14164 accp=0.990 next=pair draft=28608 prop=28608 pred gate=device Token # 249: 115.788ms; value: next_token_ids=tensor([28608], device='cuda:0') mtp accept=1 prop=28608 top1=28608 accp=0.835 next=draft=39 prop=39 olap pair=110.6ms serial=196.6ms gain=86.1ms ratio=0.44 s0=4.1ms s1=192.5ms wait=0.1/47.8ms pred gate=device Token # 250: 3.765ms; value: next_token_ids=tensor([39], device='cuda:0') mtp accept=1 prop=39 top1=39 accp=1.000 next=pair draft=8842 prop=8842 pred gate=device Token # 251: 116.217ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=1.000 next=draft=109602 prop=109602 olap pair=110.9ms serial=195.9ms gain=84.9ms ratio=0.43 s0=4.4ms s1=191.5ms wait=0.1/47.5ms pred gate=device Token # 252: 3.729ms; value: next_token_ids=tensor([4602], device='cuda:0') mtp accept=0 prop=109602 top1=4602 accp=0.009 next=pair draft=3343 prop=41992 pred gate=device Token # 253: 116.653ms; value: next_token_ids=tensor([23945], device='cuda:0') mtp accept=0 prop=41992 top1=23945 accp=0.061 next=draft=2827 prop=2827 olap pair=111.3ms serial=197.1ms gain=85.8ms ratio=0.44 s0=4.3ms s1=192.8ms wait=0.1/47.7ms pred gate=device Token # 254: 116.294ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=0 prop=2827 top1=10756 accp=0.419 next=draft=445 prop=445 olap pair=110.9ms serial=196.1ms gain=85.2ms ratio=0.43 s0=6.3ms s1=189.8ms wait=0.2/45.4ms pred gate=device Token # 255: 115.967ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=1 prop=445 top1=445 accp=0.988 next=draft=36101 prop=36101 olap pair=110.6ms serial=196.6ms gain=85.9ms ratio=0.44 s0=4.0ms s1=192.6ms wait=0.1/48.2ms pred gate=device Token # 256: 3.747ms; value: next_token_ids=tensor([1833], device='cuda:0') mtp accept=0 prop=36101 top1=12718 accp=0.054 next=pair draft=2827 prop=2827 pred gate=device Token # 257: 116.299ms; value: 
next_token_ids=tensor([2827], device='cuda:0') mtp accept=1 prop=2827 top1=2827 accp=0.891 next=draft=1415 prop=1415 olap pair=111.0ms serial=196.9ms gain=85.9ms ratio=0.44 s0=4.4ms s1=192.6ms wait=0.1/47.3ms pred gate=device Token # 258: 3.842ms; value: next_token_ids=tensor([1415], device='cuda:0') mtp accept=1 prop=1415 top1=1415 accp=0.747 next=pair draft=34864 prop=34864 pred gate=device Token # 259: 115.921ms; value: next_token_ids=tensor([34864], device='cuda:0') mtp accept=1 prop=34864 top1=34864 accp=1.000 next=draft=320 prop=320 olap pair=110.6ms serial=196.6ms gain=85.9ms ratio=0.44 s0=4.0ms s1=192.6ms wait=0.1/48.2ms pred gate=device Token # 260: 3.819ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=320 top1=303 accp=0.024 next=pair draft=4009 prop=4009 pred gate=device Token # 261: 116.131ms; value: next_token_ids=tensor([4009], device='cuda:0') mtp accept=1 prop=4009 top1=4009 accp=0.940 next=draft=2431 prop=2431 olap pair=110.8ms serial=196.8ms gain=86.0ms ratio=0.44 s0=4.1ms s1=192.7ms wait=0.1/48.0ms pred gate=device Token # 262: 3.767ms; value: next_token_ids=tensor([2431], device='cuda:0') mtp accept=1 prop=2431 top1=4339 accp=0.360 next=pair draft=33298 prop=10447 pred gate=device Token # 263: 116.176ms; value: next_token_ids=tensor([10447], device='cuda:0') mtp accept=1 prop=10447 top1=10447 accp=0.235 next=draft=4339 prop=1275 olap pair=110.9ms serial=196.7ms gain=85.8ms ratio=0.44 s0=4.0ms s1=192.7ms wait=0.1/48.2ms pred gate=device Token # 264: 3.740ms; value: next_token_ids=tensor([1275], device='cuda:0') mtp accept=1 prop=1275 top1=1275 accp=0.219 next=pair draft=4339 prop=4339 pred gate=device Token # 265: 115.861ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=0 prop=4339 top1=7557 accp=0.000 next=draft=10780 prop=10780 olap pair=110.6ms serial=196.7ms gain=86.1ms ratio=0.44 s0=4.0ms s1=192.7ms wait=0.1/48.2ms pred gate=device Token # 266: 116.346ms; value: next_token_ids=tensor([25650], 
device='cuda:0') mtp accept=0 prop=10780 top1=10780 accp=0.800 next=draft=621 prop=621 olap pair=111.0ms serial=197.0ms gain=86.0ms ratio=0.44 s0=4.5ms s1=192.5ms wait=0.1/47.2ms pred gate=device Token # 267: 116.426ms; value: next_token_ids=tensor([621], device='cuda:0') mtp accept=1 prop=621 top1=621 accp=1.000 next=draft=7557 prop=7557 olap pair=111.0ms serial=196.6ms gain=85.7ms ratio=0.44 s0=4.0ms s1=192.6ms wait=0.1/48.3ms pred gate=device Token # 268: 3.752ms; value: next_token_ids=tensor([13097], device='cuda:0') mtp accept=0 prop=7557 top1=13097 accp=0.096 next=pair draft=6034 prop=6034 pred gate=device Token # 269: 115.957ms; value: next_token_ids=tensor([6034], device='cuda:0') mtp accept=1 prop=6034 top1=6034 accp=0.967 next=draft=572 prop=572 olap pair=110.5ms serial=195.5ms gain=85.0ms ratio=0.43 s0=6.8ms s1=188.6ms wait=0.2/44.9ms pred gate=device Token # 270: 3.780ms; value: next_token_ids=tensor([572], device='cuda:0') mtp accept=1 prop=572 top1=572 accp=1.000 next=pair draft=876 prop=876 pred gate=device Token # 271: 116.455ms; value: next_token_ids=tensor([876], device='cuda:0') mtp accept=1 prop=876 top1=876 accp=0.999 next=draft=15 prop=15 olap pair=111.1ms serial=197.7ms gain=86.6ms ratio=0.44 s0=3.8ms s1=193.9ms wait=0.1/48.4ms pred gate=device Token # 272: 3.721ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 273: 116.384ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=1909 prop=1909 olap pair=111.1ms serial=196.2ms gain=85.1ms ratio=0.43 s0=4.2ms s1=192.0ms wait=0.1/48.0ms pred gate=device Token # 274: 3.704ms; value: next_token_ids=tensor([1909], device='cuda:0') mtp accept=1 prop=1909 top1=1909 accp=1.000 next=pair draft=2541 prop=2541 pred gate=device Token # 275: 116.143ms; value: next_token_ids=tensor([44596], device='cuda:0') mtp accept=0 prop=2541 top1=44596 accp=0.152 
next=draft=14475 prop=14475 olap pair=110.9ms serial=196.6ms gain=85.7ms ratio=0.44 s0=6.0ms s1=190.6ms wait=0.2/45.8ms pred gate=device Token # 276: 116.860ms; value: next_token_ids=tensor([2111], device='cuda:0') mtp accept=0 prop=14475 top1=2111 accp=0.348 next=draft=2426 prop=2426 olap pair=111.5ms serial=196.7ms gain=85.1ms ratio=0.43 s0=4.3ms s1=192.3ms wait=0.1/47.9ms pred gate=device Token # 277: 116.441ms; value: next_token_ids=tensor([2426], device='cuda:0') mtp accept=1 prop=2426 top1=2426 accp=0.908 next=draft=48 prop=48 olap pair=111.0ms serial=196.5ms gain=85.5ms ratio=0.44 s0=4.1ms s1=192.4ms wait=0.1/48.2ms pred gate=device Token # 278: 3.823ms; value: next_token_ids=tensor([48], device='cuda:0') mtp accept=1 prop=48 top1=48 accp=1.000 next=pair draft=1457 prop=1457 pred gate=device Token # 279: 116.143ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=draft=303 prop=303 olap pair=110.8ms serial=197.0ms gain=86.2ms ratio=0.44 s0=3.9ms s1=193.2ms wait=0.1/48.4ms pred gate=device Token # 280: 3.737ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=1833 prop=1833 pred gate=device Token # 281: 116.767ms; value: next_token_ids=tensor([1833], device='cuda:0') mtp accept=1 prop=1833 top1=1833 accp=1.000 next=draft=2426 prop=2426 olap pair=111.4ms serial=197.7ms gain=86.3ms ratio=0.44 s0=6.4ms s1=191.3ms wait=0.2/45.4ms pred gate=device Token # 282: 3.719ms; value: next_token_ids=tensor([2426], device='cuda:0') mtp accept=1 prop=2426 top1=2426 accp=1.000 next=pair draft=450 prop=450 pred gate=device Token # 283: 117.952ms; value: next_token_ids=tensor([450], device='cuda:0') mtp accept=1 prop=450 top1=450 accp=0.968 next=draft=22 prop=22 olap pair=111.8ms serial=198.4ms gain=86.5ms ratio=0.44 s0=5.7ms s1=192.6ms wait=0.2/46.2ms pred gate=device Token # 284: 4.760ms; value: next_token_ids=tensor([22], device='cuda:0') mtp accept=1 prop=22 
top1=22 accp=1.000 next=pair draft=558 prop=558 pred gate=device Token # 285: 116.101ms; value: next_token_ids=tensor([558], device='cuda:0') mtp accept=1 prop=558 top1=558 accp=0.819 next=draft=39 prop=39 olap pair=110.7ms serial=196.7ms gain=86.0ms ratio=0.44 s0=4.4ms s1=192.3ms wait=0.1/47.6ms pred gate=device Token # 286: 3.735ms; value: next_token_ids=tensor([39], device='cuda:0') mtp accept=1 prop=39 top1=39 accp=1.000 next=pair draft=59823 prop=59823 pred gate=device Token # 287: 116.388ms; value: next_token_ids=tensor([59823], device='cuda:0') mtp accept=1 prop=59823 top1=59823 accp=0.691 next=draft=1237 prop=1237 olap pair=111.1ms serial=197.5ms gain=86.4ms ratio=0.44 s0=4.0ms s1=193.5ms wait=0.1/48.1ms pred gate=device Token # 288: 3.699ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=0.999 next=pair draft=35 prop=35 pred gate=device Token # 289: 116.046ms; value: next_token_ids=tensor([45407], device='cuda:0') mtp accept=0 prop=35 top1=45407 accp=0.146 next=draft=4607 prop=4607 olap pair=110.7ms serial=196.7ms gain=85.9ms ratio=0.44 s0=4.2ms s1=192.4ms wait=0.1/47.9ms pred gate=device Token # 290: 117.142ms; value: next_token_ids=tensor([4607], device='cuda:0') mtp accept=1 prop=4607 top1=4607 accp=1.000 next=draft=1039 prop=1039 olap pair=111.7ms serial=197.6ms gain=85.9ms ratio=0.43 s0=4.6ms s1=193.0ms wait=0.1/47.2ms pred gate=device Token # 291: 3.748ms; value: next_token_ids=tensor([1039], device='cuda:0') mtp accept=1 prop=1039 top1=1039 accp=1.000 next=pair draft=7417 prop=7417 pred gate=device Token # 292: 115.879ms; value: next_token_ids=tensor([7417], device='cuda:0') mtp accept=1 prop=7417 top1=7417 accp=0.995 next=draft=2111 prop=2111 olap pair=110.5ms serial=196.5ms gain=86.0ms ratio=0.44 s0=3.9ms s1=192.6ms wait=0.1/48.2ms pred gate=device Token # 293: 3.760ms; value: next_token_ids=tensor([24], device='cuda:0') mtp accept=0 prop=2111 top1=24 accp=0.003 next=pair draft=57 prop=57 pred 
gate=device Token # 294: 116.915ms; value: next_token_ids=tensor([57], device='cuda:0') mtp accept=1 prop=57 top1=57 accp=1.000 next=draft=330 prop=330 olap pair=111.4ms serial=196.9ms gain=85.5ms ratio=0.43 s0=4.5ms s1=192.4ms wait=0.1/47.5ms pred gate=device Token # 295: 3.788ms; value: next_token_ids=tensor([330], device='cuda:0') mtp accept=1 prop=330 top1=330 accp=1.000 next=pair draft=9422 prop=9422 pred gate=device Token # 296: 116.374ms; value: next_token_ids=tensor([9422], device='cuda:0') mtp accept=1 prop=9422 top1=9422 accp=1.000 next=draft=303 prop=303 olap pair=111.1ms serial=196.6ms gain=85.5ms ratio=0.43 s0=4.7ms s1=191.9ms wait=0.1/47.0ms pred gate=device Token # 297: 3.820ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=1644 prop=1644 pred gate=device Token # 298: 116.733ms; value: next_token_ids=tensor([1644], device='cuda:0') mtp accept=1 prop=1644 top1=1644 accp=1.000 next=draft=17309 prop=17309 olap pair=111.4ms serial=198.0ms gain=86.6ms ratio=0.44 s0=4.5ms s1=193.5ms wait=0.1/46.9ms pred gate=device Token # 299: 3.752ms; value: next_token_ids=tensor([17309], device='cuda:0') mtp accept=1 prop=17309 top1=17309 accp=1.000 next=pair draft=26127 prop=26127 pred gate=device Token # 300: 116.192ms; value: next_token_ids=tensor([26127], device='cuda:0') mtp accept=1 prop=26127 top1=26127 accp=1.000 next=draft=303 prop=303 olap pair=110.8ms serial=197.0ms gain=86.2ms ratio=0.44 s0=3.9ms s1=193.0ms wait=0.1/48.2ms pred gate=device Token # 301: 3.753ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=4602 prop=4602 pred gate=device Token # 302: 117.289ms; value: next_token_ids=tensor([4602], device='cuda:0') mtp accept=1 prop=4602 top1=4602 accp=0.997 next=draft=12875 prop=12875 olap pair=111.1ms serial=196.2ms gain=85.2ms ratio=0.43 s0=5.4ms s1=190.8ms wait=0.1/46.5ms pred gate=device Token # 303: 4.326ms; value: 
next_token_ids=tensor([12875], device='cuda:0') mtp accept=1 prop=12875 top1=12875 accp=1.000 next=pair draft=58 prop=58 pred gate=device Token # 304: 116.541ms; value: next_token_ids=tensor([58], device='cuda:0') mtp accept=1 prop=58 top1=58 accp=1.000 next=draft=20 prop=20 olap pair=111.3ms serial=195.7ms gain=84.5ms ratio=0.43 s0=4.3ms s1=191.5ms wait=0.1/48.0ms pred gate=device Token # 305: 3.753ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 306: 116.859ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.998 next=draft=2920 prop=2920 olap pair=111.5ms serial=198.2ms gain=86.6ms ratio=0.44 s0=3.9ms s1=194.2ms wait=0.1/48.2ms pred gate=device Token # 307: 3.727ms; value: next_token_ids=tensor([2920], device='cuda:0') mtp accept=1 prop=2920 top1=2920 accp=0.980 next=pair draft=389 prop=389 pred gate=device Token # 308: 116.352ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 accp=0.950 next=draft=2467 prop=2467 olap pair=111.1ms serial=197.0ms gain=86.0ms ratio=0.44 s0=5.3ms s1=191.8ms wait=0.1/46.5ms pred gate=device Token # 309: 3.732ms; value: next_token_ids=tensor([2467], device='cuda:0') mtp accept=1 prop=2467 top1=2467 accp=1.000 next=pair draft=99924 prop=99924 pred gate=device Token # 310: 116.262ms; value: next_token_ids=tensor([99924], device='cuda:0') mtp accept=1 prop=99924 top1=99924 accp=1.000 next=draft=301 prop=41354 olap pair=110.9ms serial=197.2ms gain=86.3ms ratio=0.44 s0=4.5ms s1=192.7ms wait=0.1/47.2ms pred gate=device Token # 311: 3.747ms; value: next_token_ids=tensor([410], device='cuda:0') mtp accept=0 prop=41354 top1=410 accp=0.003 next=pair draft=2467 prop=2467 pred gate=device Token # 312: 116.338ms; value: next_token_ids=tensor([2467], device='cuda:0') mtp accept=1 prop=2467 top1=10861 accp=0.216 next=draft=10861 prop=10861 olap pair=111.0ms serial=197.3ms 
gain=86.3ms ratio=0.44 s0=4.5ms s1=192.8ms wait=0.1/47.0ms pred gate=device Token # 313: 3.786ms; value: next_token_ids=tensor([10861], device='cuda:0') mtp accept=1 prop=10861 top1=10861 accp=1.000 next=pair draft=301 prop=301 pred gate=device Token # 314: 116.271ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=1.000 next=draft=41354 prop=41354 olap pair=110.9ms serial=197.2ms gain=86.3ms ratio=0.44 s0=4.5ms s1=192.7ms wait=0.1/47.1ms pred gate=device Token # 315: 3.753ms; value: next_token_ids=tensor([41354], device='cuda:0') mtp accept=1 prop=41354 top1=41354 accp=1.000 next=pair draft=876 prop=876 pred gate=device Token # 316: 116.698ms; value: next_token_ids=tensor([876], device='cuda:0') mtp accept=1 prop=876 top1=876 accp=1.000 next=draft=15 prop=15 olap pair=111.4ms serial=198.1ms gain=86.7ms ratio=0.44 s0=4.5ms s1=193.7ms wait=0.1/47.1ms pred gate=device Token # 317: 3.713ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 318: 117.321ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=1909 prop=1909 olap pair=112.0ms serial=198.1ms gain=86.1ms ratio=0.43 s0=4.1ms s1=194.0ms wait=0.1/48.2ms pred gate=device Token # 319: 3.731ms; value: next_token_ids=tensor([1909], device='cuda:0') mtp accept=1 prop=1909 top1=1909 accp=0.844 next=pair draft=2541 prop=2541 pred gate=device Token # 320: 116.789ms; value: next_token_ids=tensor([44596], device='cuda:0') mtp accept=0 prop=2541 top1=44596 accp=0.500 next=draft=54182 prop=54182 olap pair=111.5ms serial=198.3ms gain=86.8ms ratio=0.44 s0=4.4ms s1=194.0ms wait=0.1/47.4ms pred gate=device Token # 321: 116.931ms; value: next_token_ids=tensor([9422], device='cuda:0') mtp accept=0 prop=54182 top1=9422 accp=0.016 next=draft=67919 prop=67919 olap pair=111.5ms serial=198.2ms gain=86.7ms ratio=0.44 s0=4.4ms s1=193.8ms 
wait=0.1/47.1ms pred gate=device Token # 322: 115.695ms; value: next_token_ids=tensor([67919], device='cuda:0') mtp accept=1 prop=67919 top1=67919 accp=1.000 next=draft=1824 prop=1824 olap pair=110.3ms serial=196.1ms gain=85.7ms ratio=0.44 s0=4.0ms s1=192.1ms wait=0.1/48.1ms pred gate=device Token # 323: 3.772ms; value: next_token_ids=tensor([1824], device='cuda:0') mtp accept=1 prop=1824 top1=1824 accp=0.988 next=pair draft=6640 prop=6640 pred gate=device Token # 324: 116.872ms; value: next_token_ids=tensor([6640], device='cuda:0') mtp accept=1 prop=6640 top1=6640 accp=0.990 next=draft=2440 prop=2440 olap pair=111.5ms serial=198.3ms gain=86.7ms ratio=0.44 s0=4.4ms s1=193.8ms wait=0.1/47.0ms pred gate=device Token # 325: 3.793ms; value: next_token_ids=tensor([2440], device='cuda:0') mtp accept=1 prop=2440 top1=2440 accp=1.000 next=pair draft=5555 prop=5555 pred gate=device Token # 326: 116.133ms; value: next_token_ids=tensor([5555], device='cuda:0') mtp accept=1 prop=5555 top1=5555 accp=1.000 next=draft=16303 prop=16303 olap pair=110.8ms serial=197.0ms gain=86.3ms ratio=0.44 s0=3.9ms s1=193.1ms wait=0.1/48.2ms pred gate=device Token # 327: 3.758ms; value: next_token_ids=tensor([16303], device='cuda:0') mtp accept=1 prop=16303 top1=16303 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 328: 116.199ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=3515 prop=3515 olap pair=110.9ms serial=197.3ms gain=86.4ms ratio=0.44 s0=3.9ms s1=193.4ms wait=0.1/48.4ms pred gate=device Token # 329: 3.739ms; value: next_token_ids=tensor([3515], device='cuda:0') mtp accept=1 prop=3515 top1=3515 accp=1.000 next=pair draft=1057 prop=1057 pred gate=device Token # 330: 115.754ms; value: next_token_ids=tensor([1057], device='cuda:0') mtp accept=1 prop=1057 top1=1057 accp=0.994 next=draft=50986 prop=50986 olap pair=110.4ms serial=195.6ms gain=85.1ms ratio=0.44 s0=4.3ms s1=191.3ms wait=0.1/47.7ms pred gate=device 
Token # 331: 3.743ms; value: next_token_ids=tensor([50986], device='cuda:0') mtp accept=1 prop=50986 top1=50986 accp=0.678 next=pair draft=301 prop=301 pred gate=device Token # 332: 115.965ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=0.995 next=draft=109792 prop=109792 olap pair=110.6ms serial=196.6ms gain=85.9ms ratio=0.44 s0=4.4ms s1=192.1ms wait=0.1/46.9ms pred gate=device Token # 333: 3.725ms; value: next_token_ids=tensor([114131], device='cuda:0') mtp accept=0 prop=109792 top1=114131 accp=0.167 next=pair draft=48084 prop=48084 pred gate=device Token # 334: 115.931ms; value: next_token_ids=tensor([77196], device='cuda:0') mtp accept=0 prop=48084 top1=77196 accp=0.048 next=draft=6221 prop=6221 olap pair=110.7ms serial=196.5ms gain=85.8ms ratio=0.44 s0=4.4ms s1=192.1ms wait=0.1/47.0ms pred gate=device Token # 335: 115.974ms; value: next_token_ids=tensor([13502], device='cuda:0') mtp accept=0 prop=6221 top1=13502 accp=0.212 next=draft=89228 prop=89228 olap pair=110.5ms serial=196.5ms gain=85.9ms ratio=0.44 s0=4.4ms s1=192.1ms wait=0.1/47.0ms pred gate=device Token # 336: 115.942ms; value: next_token_ids=tensor([89228], device='cuda:0') mtp accept=1 prop=89228 top1=89228 accp=0.986 next=draft=1237 prop=1237 olap pair=110.6ms serial=196.4ms gain=85.8ms ratio=0.44 s0=4.4ms s1=192.0ms wait=0.1/47.0ms pred gate=device Token # 337: 3.798ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=109792 prop=109792 pred gate=device Token # 338: 117.083ms; value: next_token_ids=tensor([109792], device='cuda:0') mtp accept=1 prop=109792 top1=109792 accp=1.000 next=draft=38 prop=38 olap pair=111.7ms serial=198.7ms gain=87.0ms ratio=0.44 s0=4.6ms s1=194.1ms wait=0.1/47.1ms pred gate=device Token # 339: 3.804ms; value: next_token_ids=tensor([38], device='cuda:0') mtp accept=1 prop=38 top1=38 accp=1.000 next=pair draft=3951 prop=3951 pred gate=device Token # 340: 
116.719ms; value: next_token_ids=tensor([3951], device='cuda:0') mtp accept=1 prop=3951 top1=3951 accp=1.000 next=draft=15 prop=15 olap pair=111.4ms serial=198.0ms gain=86.6ms ratio=0.44 s0=4.6ms s1=193.4ms wait=0.1/46.9ms pred gate=device Token # 341: 3.737ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=pair draft=437 prop=20332 pred gate=device Token # 342: 116.549ms; value: next_token_ids=tensor([4687], device='cuda:0') mtp accept=0 prop=20332 top1=4687 accp=0.000 next=draft=30904 prop=30904 olap pair=111.3ms serial=197.9ms gain=86.6ms ratio=0.44 s0=4.5ms s1=193.3ms wait=0.1/47.0ms pred gate=device Token # 343: 116.366ms; value: next_token_ids=tensor([30904], device='cuda:0') mtp accept=1 prop=30904 top1=30904 accp=0.983 next=draft=15 prop=15 olap pair=111.0ms serial=197.4ms gain=86.4ms ratio=0.44 s0=4.5ms s1=192.9ms wait=0.1/47.0ms pred gate=device Token # 344: 3.716ms; value: next_token_ids=tensor([15], device='cuda:0') mtp accept=1 prop=15 top1=15 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 345: 115.955ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=1909 prop=1909 olap pair=110.6ms serial=196.7ms gain=86.1ms ratio=0.44 s0=4.4ms s1=192.3ms wait=0.1/47.1ms pred gate=device Token # 346: 3.761ms; value: next_token_ids=tensor([7018], device='cuda:0') mtp accept=0 prop=1909 top1=7018 accp=0.208 next=pair draft=17488 prop=86260 pred gate=device Token # 347: 116.120ms; value: next_token_ids=tensor([429], device='cuda:0') mtp accept=0 prop=86260 top1=429 accp=0.237 next=draft=49838 prop=49838 olap pair=110.9ms serial=197.2ms gain=86.3ms ratio=0.44 s0=4.4ms s1=192.8ms wait=0.1/47.0ms pred gate=device Token # 348: 116.238ms; value: next_token_ids=tensor([856], device='cuda:0') mtp accept=0 prop=49838 top1=856 accp=0.000 next=draft=16 prop=16 olap pair=110.9ms serial=197.2ms gain=86.3ms ratio=0.44 s0=4.5ms s1=192.7ms wait=0.1/46.9ms pred 
gate=device Token # 349: 116.638ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=14882 prop=14882 olap pair=111.3ms serial=198.0ms gain=86.7ms ratio=0.44 s0=4.4ms s1=193.6ms wait=0.1/47.2ms pred gate=device Token # 350: 3.778ms; value: next_token_ids=tensor([14882], device='cuda:0') mtp accept=1 prop=14882 top1=14882 accp=0.999 next=pair draft=28828 prop=28828 pred gate=device Token # 351: 115.762ms; value: next_token_ids=tensor([28828], device='cuda:0') mtp accept=1 prop=28828 top1=28828 accp=1.000 next=draft=2283 prop=2283 olap pair=110.4ms serial=196.3ms gain=85.8ms ratio=0.44 s0=4.4ms s1=191.9ms wait=0.1/47.1ms pred gate=device Token # 352: 3.775ms; value: next_token_ids=tensor([2283], device='cuda:0') mtp accept=1 prop=2283 top1=2283 accp=1.000 next=pair draft=88170 prop=88170 pred gate=device Token # 353: 117.027ms; value: next_token_ids=tensor([88170], device='cuda:0') mtp accept=1 prop=88170 top1=88170 accp=0.994 next=draft=4891 prop=4891 olap pair=111.6ms serial=198.5ms gain=86.9ms ratio=0.44 s0=4.5ms s1=194.0ms wait=0.1/46.9ms pred gate=device Token # 354: 3.758ms; value: next_token_ids=tensor([4891], device='cuda:0') mtp accept=1 prop=4891 top1=4891 accp=1.000 next=pair draft=90974 prop=90974 pred gate=device Token # 355: 116.039ms; value: next_token_ids=tensor([90974], device='cuda:0') mtp accept=1 prop=90974 top1=90974 accp=1.000 next=draft=303 prop=303 olap pair=110.7ms serial=196.8ms gain=86.0ms ratio=0.44 s0=4.5ms s1=192.3ms wait=0.1/47.0ms pred gate=device Token # 356: 3.800ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.770 next=pair draft=17136 prop=17136 pred gate=device Token # 357: 116.220ms; value: next_token_ids=tensor([17136], device='cuda:0') mtp accept=1 prop=17136 top1=17136 accp=0.778 next=draft=14811 prop=14811 olap pair=110.9ms serial=197.1ms gain=86.2ms ratio=0.44 s0=4.4ms s1=192.7ms wait=0.1/47.1ms pred gate=device Token # 358: 
3.814ms; value: next_token_ids=tensor([14811], device='cuda:0') mtp accept=1 prop=14811 top1=14811 accp=1.000 next=pair draft=545 prop=545 pred gate=device Token # 359: 115.809ms; value: next_token_ids=tensor([545], device='cuda:0') mtp accept=1 prop=545 top1=545 accp=0.994 next=draft=18 prop=18 olap pair=110.5ms serial=196.4ms gain=85.9ms ratio=0.44 s0=4.4ms s1=192.0ms wait=0.1/47.1ms pred gate=device Token # 360: 3.775ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=pair draft=16 prop=16 pred gate=device Token # 361: 116.247ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=27521 prop=27521 olap pair=111.0ms serial=196.6ms gain=85.6ms ratio=0.44 s0=4.6ms s1=192.0ms wait=0.1/47.0ms pred gate=device Token # 362: 3.769ms; value: next_token_ids=tensor([27521], device='cuda:0') mtp accept=1 prop=27521 top1=27521 accp=1.000 next=pair draft=11274 prop=11274 pred gate=device Token # 363: 116.637ms; value: next_token_ids=tensor([11274], device='cuda:0') mtp accept=1 prop=11274 top1=11274 accp=1.000 next=draft=303 prop=303 olap pair=111.3ms serial=197.9ms gain=86.6ms ratio=0.44 s0=4.4ms s1=193.5ms wait=0.1/47.2ms pred gate=device Token # 364: 3.767ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=8835 prop=8835 pred gate=device Token # 365: 117.010ms; value: next_token_ids=tensor([8835], device='cuda:0') mtp accept=1 prop=8835 top1=8835 accp=1.000 next=draft=4894 prop=4894 olap pair=111.6ms serial=198.5ms gain=86.9ms ratio=0.44 s0=4.5ms s1=194.0ms wait=0.1/47.0ms pred gate=device Token # 366: 3.846ms; value: next_token_ids=tensor([4894], device='cuda:0') mtp accept=1 prop=4894 top1=4894 accp=1.000 next=pair draft=545 prop=545 pred gate=device Token # 367: 116.216ms; value: next_token_ids=tensor([545], device='cuda:0') mtp accept=1 prop=545 top1=545 accp=1.000 next=draft=2738 prop=2738 olap pair=110.8ms 
serial=196.9ms gain=86.0ms ratio=0.44 s0=4.5ms s1=192.3ms wait=0.1/46.9ms pred gate=device Token # 368: 3.741ms; value: next_token_ids=tensor([2738], device='cuda:0') mtp accept=1 prop=2738 top1=2738 accp=1.000 next=pair draft=7016 prop=7016 pred gate=device Token # 369: 116.251ms; value: next_token_ids=tensor([7016], device='cuda:0') mtp accept=1 prop=7016 top1=7016 accp=1.000 next=draft=11274 prop=11274 olap pair=110.9ms serial=197.1ms gain=86.2ms ratio=0.44 s0=4.4ms s1=192.7ms wait=0.1/47.1ms pred gate=device Token # 370: 3.713ms; value: next_token_ids=tensor([11274], device='cuda:0') mtp accept=1 prop=11274 top1=11274 accp=1.000 next=pair draft=478 prop=478 pred gate=device Token # 371: 115.911ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.943 next=draft=39932 prop=39932 olap pair=110.7ms serial=196.7ms gain=86.1ms ratio=0.44 s0=4.4ms s1=192.3ms wait=0.1/47.2ms pred gate=device Token # 372: 3.700ms; value: next_token_ids=tensor([39932], device='cuda:0') mtp accept=1 prop=39932 top1=39932 accp=0.704 next=pair draft=7157 prop=7157 pred gate=device Token # 373: 115.723ms; value: next_token_ids=tensor([7157], device='cuda:0') mtp accept=1 prop=7157 top1=10094 accp=0.625 next=draft=1959 prop=1959 olap pair=110.4ms serial=196.3ms gain=85.9ms ratio=0.44 s0=4.4ms s1=191.9ms wait=0.1/47.1ms pred gate=device Token # 374: 3.682ms; value: next_token_ids=tensor([1959], device='cuda:0') mtp accept=1 prop=1959 top1=1959 accp=0.794 next=pair draft=90974 prop=22611 pred gate=device Token # 375: 115.911ms; value: next_token_ids=tensor([10861], device='cuda:0') mtp accept=0 prop=22611 top1=8283 accp=0.047 next=draft=8283 prop=8283 olap pair=110.7ms serial=196.4ms gain=85.7ms ratio=0.44 s0=4.7ms s1=191.7ms wait=0.1/46.9ms pred gate=device Token # 376: 116.741ms; value: next_token_ids=tensor([8283], device='cuda:0') mtp accept=1 prop=8283 top1=8283 accp=0.777 next=draft=5402 prop=5402 olap pair=111.3ms serial=197.9ms gain=86.6ms 
ratio=0.44 s0=3.9ms s1=194.0ms wait=0.1/48.4ms pred gate=device Token # 377: 3.775ms; value: next_token_ids=tensor([5402], device='cuda:0') mtp accept=1 prop=5402 top1=5402 accp=1.000 next=pair draft=8955 prop=8955 pred gate=device Token # 378: 115.824ms; value: next_token_ids=tensor([8955], device='cuda:0') mtp accept=1 prop=8955 top1=8955 accp=1.000 next=draft=320 prop=320 olap pair=110.6ms serial=196.5ms gain=85.9ms ratio=0.44 s0=4.3ms s1=192.2ms wait=0.1/47.5ms pred gate=device Token # 379: 3.718ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.998 next=pair draft=48742 prop=48742 pred gate=device Token # 380: 115.558ms; value: next_token_ids=tensor([856], device='cuda:0') mtp accept=0 prop=48742 top1=3660 accp=0.143 next=draft=16 prop=16 olap pair=110.3ms serial=196.1ms gain=85.7ms ratio=0.44 s0=4.1ms s1=192.0ms wait=0.1/48.0ms pred gate=device Token # 381: 116.635ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=0.962 next=draft=21 prop=21 olap pair=111.3ms serial=197.8ms gain=86.5ms ratio=0.44 s0=3.9ms s1=193.9ms wait=0.1/48.4ms pred gate=device Token # 382: 3.782ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=0.957 next=pair draft=28828 prop=28828 pred gate=device Token # 383: 116.274ms; value: next_token_ids=tensor([28828], device='cuda:0') mtp accept=1 prop=28828 top1=28828 accp=0.997 next=draft=2283 prop=2283 olap pair=110.9ms serial=197.3ms gain=86.4ms ratio=0.44 s0=3.8ms s1=193.5ms wait=0.1/48.4ms pred gate=device Token # 384: 3.703ms; value: next_token_ids=tensor([2283], device='cuda:0') mtp accept=1 prop=2283 top1=2283 accp=1.000 next=pair draft=3660 prop=3660 pred gate=device Token # 385: 116.215ms; value: next_token_ids=tensor([3660], device='cuda:0') mtp accept=1 prop=3660 top1=3660 accp=0.848 next=draft=28769 prop=28769 olap pair=111.0ms serial=197.3ms gain=86.3ms ratio=0.44 s0=3.9ms s1=193.4ms wait=0.1/48.3ms pred 
gate=device Token # 386: 3.824ms; value: next_token_ids=tensor([28769], device='cuda:0') mtp accept=1 prop=28769 top1=28769 accp=0.999 next=pair draft=36 prop=36 pred gate=device Token # 387: 116.238ms; value: next_token_ids=tensor([36], device='cuda:0') mtp accept=1 prop=36 top1=36 accp=1.000 next=draft=8842 prop=8842 olap pair=111.0ms serial=197.2ms gain=86.2ms ratio=0.44 s0=4.1ms s1=193.0ms wait=0.1/47.9ms pred gate=device Token # 388: 3.828ms; value: next_token_ids=tensor([8842], device='cuda:0') mtp accept=1 prop=8842 top1=8842 accp=0.832 next=pair draft=5841 prop=5841 pred gate=device Token # 389: 116.087ms; value: next_token_ids=tensor([5841], device='cuda:0') mtp accept=1 prop=5841 top1=5841 accp=1.000 next=draft=2431 prop=2431 olap pair=110.8ms serial=197.0ms gain=86.2ms ratio=0.44 s0=3.9ms s1=193.1ms wait=0.1/48.4ms pred gate=device Token # 390: 3.706ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=2431 top1=303 accp=0.145 next=pair draft=2204 prop=2204 pred gate=device Token # 391: 115.546ms; value: next_token_ids=tensor([2204], device='cuda:0') mtp accept=1 prop=2204 top1=445 accp=0.375 next=draft=2541 prop=6063 olap pair=110.3ms serial=195.9ms gain=85.6ms ratio=0.44 s0=3.9ms s1=192.0ms wait=0.1/48.5ms pred gate=device Token # 392: 3.800ms; value: next_token_ids=tensor([6063], device='cuda:0') mtp accept=1 prop=6063 top1=6063 accp=0.205 next=pair draft=45276 prop=45276 pred gate=device Token # 393: 116.930ms; value: next_token_ids=tensor([45276], device='cuda:0') mtp accept=1 prop=45276 top1=45276 accp=0.447 next=draft=90738 prop=90738 olap pair=111.6ms serial=198.5ms gain=86.8ms ratio=0.44 s0=4.1ms s1=194.4ms wait=0.1/48.2ms pred gate=device Token # 394: 3.758ms; value: next_token_ids=tensor([90738], device='cuda:0') mtp accept=1 prop=90738 top1=90738 accp=0.955 next=pair draft=572 prop=572 pred gate=device Token # 395: 116.119ms; value: next_token_ids=tensor([572], device='cuda:0') mtp accept=1 prop=572 top1=572 accp=1.000 
next=draft=1237 prop=1237 olap pair=110.9ms serial=197.1ms gain=86.2ms ratio=0.44 s0=4.1ms s1=193.0ms wait=0.1/47.9ms pred gate=device Token # 396: 3.745ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=0.949 next=pair draft=8040 prop=8040 pred gate=device Token # 397: 115.326ms; value: next_token_ids=tensor([48818], device='cuda:0') mtp accept=0 prop=8040 top1=48818 accp=0.127 next=draft=23085 prop=23085 olap pair=110.1ms serial=195.6ms gain=85.5ms ratio=0.44 s0=3.8ms s1=191.7ms wait=0.1/48.5ms pred gate=device Token # 398: 116.447ms; value: next_token_ids=tensor([23085], device='cuda:0') mtp accept=1 prop=23085 top1=23085 accp=1.000 next=draft=58 prop=58 olap pair=110.9ms serial=197.3ms gain=86.4ms ratio=0.44 s0=3.8ms s1=193.5ms wait=0.1/48.5ms pred gate=device Token # 399: 3.734ms; value: next_token_ids=tensor([58], device='cuda:0') mtp accept=1 prop=58 top1=58 accp=1.000 next=pair draft=223 prop=223 pred gate=device Token # 400: 116.764ms; value: next_token_ids=tensor([223], device='cuda:0') mtp accept=1 prop=223 top1=223 accp=1.000 next=draft=24091 prop=24091 olap pair=111.4ms serial=197.8ms gain=86.5ms ratio=0.44 s0=4.1ms s1=193.8ms wait=0.1/48.0ms pred gate=device Token # 401: 3.848ms; value: next_token_ids=tensor([24091], device='cuda:0') mtp accept=1 prop=24091 top1=24091 accp=1.000 next=pair draft=18 prop=18 pred gate=device Token # 402: 116.303ms; value: next_token_ids=tensor([18], device='cuda:0') mtp accept=1 prop=18 top1=18 accp=1.000 next=draft=303 prop=7417 olap pair=111.0ms serial=197.3ms gain=86.2ms ratio=0.44 s0=4.2ms s1=193.1ms wait=0.1/48.1ms pred gate=device Token # 403: 3.700ms; value: next_token_ids=tensor([7417], device='cuda:0') mtp accept=1 prop=7417 top1=303 accp=0.610 next=pair draft=2431 prop=2431 pred gate=device Token # 404: 115.731ms; value: next_token_ids=tensor([3170], device='cuda:0') mtp accept=0 prop=2431 top1=2431 accp=0.876 next=draft=14206 prop=10674 olap pair=110.5ms 
serial=196.5ms gain=86.0ms ratio=0.44 s0=3.9ms s1=192.5ms wait=0.1/48.4ms pred gate=device Token # 405: 115.970ms; value: next_token_ids=tensor([122603], device='cuda:0') mtp accept=0 prop=10674 top1=122603 accp=0.372 next=draft=320 prop=320 olap pair=110.7ms serial=196.5ms gain=85.8ms ratio=0.44 s0=4.7ms s1=191.8ms wait=0.1/47.2ms pred gate=device Token # 406: 117.357ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=0 prop=320 top1=303 accp=0.009 next=draft=2524 prop=2524 olap pair=111.1ms serial=196.4ms gain=85.2ms ratio=0.43 s0=8.4ms s1=188.0ms wait=0.2/43.1ms pred gate=device Token # 407: 117.237ms; value: next_token_ids=tensor([2524], device='cuda:0') mtp accept=1 prop=2524 top1=2524 accp=1.000 next=draft=8842 prop=10602 olap pair=111.7ms serial=198.8ms gain=87.1ms ratio=0.44 s0=4.2ms s1=194.6ms wait=0.1/48.2ms pred gate=device Token # 408: 3.800ms; value: next_token_ids=tensor([26127], device='cuda:0') mtp accept=0 prop=10602 top1=26127 accp=0.092 next=pair draft=70359 prop=70359 pred gate=device Token # 409: 115.489ms; value: next_token_ids=tensor([70359], device='cuda:0') mtp accept=1 prop=70359 top1=70359 accp=1.000 next=draft=548 prop=548 olap pair=110.2ms serial=195.7ms gain=85.5ms ratio=0.44 s0=4.1ms s1=191.5ms wait=0.1/47.8ms pred gate=device Token # 410: 3.770ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=1 prop=548 top1=548 accp=1.000 next=pair draft=4339 prop=4339 pred gate=device Token # 411: 116.290ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=0.964 next=draft=3542 prop=11113 olap pair=110.9ms serial=196.3ms gain=85.4ms ratio=0.43 s0=4.3ms s1=192.0ms wait=0.1/47.8ms pred gate=device Token # 412: 3.829ms; value: next_token_ids=tensor([11113], device='cuda:0') mtp accept=1 prop=11113 top1=11113 accp=0.007 next=pair draft=320 prop=320 pred gate=device Token # 413: 116.797ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 
accp=1.000 next=draft=1207 prop=1207 olap pair=111.5ms serial=198.1ms gain=86.6ms ratio=0.44 s0=4.2ms s1=193.9ms wait=0.1/47.7ms pred gate=device Token # 414: 3.747ms; value: next_token_ids=tensor([2803], device='cuda:0') mtp accept=0 prop=1207 top1=2803 accp=0.498 next=pair draft=2490 prop=303 pred gate=device Token # 415: 116.538ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.497 next=draft=1909 prop=4407 olap pair=111.2ms serial=197.5ms gain=86.3ms ratio=0.44 s0=4.1ms s1=193.4ms wait=0.1/48.0ms pred gate=device Token # 416: 3.707ms; value: next_token_ids=tensor([4407], device='cuda:0') mtp accept=1 prop=4407 top1=1909 accp=0.829 next=pair draft=1909 prop=1909 pred gate=device Token # 417: 115.548ms; value: next_token_ids=tensor([1909], device='cuda:0') mtp accept=1 prop=1909 top1=1909 accp=0.718 next=draft=44596 prop=44596 olap pair=110.2ms serial=195.9ms gain=85.7ms ratio=0.44 s0=4.0ms s1=191.9ms wait=0.1/48.2ms pred gate=device Token # 418: 3.794ms; value: next_token_ids=tensor([44596], device='cuda:0') mtp accept=1 prop=44596 top1=44596 accp=0.974 next=pair draft=13097 prop=13097 pred gate=device Token # 419: 116.511ms; value: next_token_ids=tensor([2111], device='cuda:0') mtp accept=0 prop=13097 top1=2111 accp=0.448 next=draft=2426 prop=2426 olap pair=110.4ms serial=195.3ms gain=84.9ms ratio=0.43 s0=6.9ms s1=188.4ms wait=0.2/45.0ms pred gate=device Token # 420: 115.802ms; value: next_token_ids=tensor([558], device='cuda:0') mtp accept=0 prop=2426 top1=558 accp=0.049 next=draft=77170 prop=77170 olap pair=110.2ms serial=195.4ms gain=85.2ms ratio=0.44 s0=4.9ms s1=190.5ms wait=0.1/47.0ms pred gate=device Token # 421: 116.190ms; value: next_token_ids=tensor([77170], device='cuda:0') mtp accept=1 prop=77170 top1=77170 accp=1.000 next=draft=48 prop=48 olap pair=110.8ms serial=197.0ms gain=86.1ms ratio=0.44 s0=4.0ms s1=192.9ms wait=0.1/48.0ms pred gate=device Token # 422: 3.745ms; value: next_token_ids=tensor([2467], 
device='cuda:0') mtp accept=0 prop=48 top1=2467 accp=0.013 next=pair draft=99924 prop=99924 pred gate=device Token # 423: 115.696ms; value: next_token_ids=tensor([99924], device='cuda:0') mtp accept=1 prop=99924 top1=99924 accp=1.000 next=draft=41354 prop=41354 olap pair=110.3ms serial=195.9ms gain=85.6ms ratio=0.44 s0=3.9ms s1=192.0ms wait=0.1/48.5ms pred gate=device Token # 424: 3.890ms; value: next_token_ids=tensor([41354], device='cuda:0') mtp accept=1 prop=41354 top1=41354 accp=0.986 next=pair draft=637 prop=637 pred gate=device Token # 425: 116.092ms; value: next_token_ids=tensor([15227], device='cuda:0') mtp accept=0 prop=637 top1=15227 accp=0.143 next=draft=637 prop=637 olap pair=110.8ms serial=197.1ms gain=86.2ms ratio=0.44 s0=3.9ms s1=193.1ms wait=0.1/48.3ms pred gate=device Token # 426: 116.400ms; value: next_token_ids=tensor([637], device='cuda:0') mtp accept=1 prop=637 top1=637 accp=0.988 next=draft=84938 prop=84938 olap pair=111.1ms serial=196.9ms gain=85.8ms ratio=0.44 s0=4.0ms s1=192.9ms wait=0.1/48.5ms pred gate=device Token # 427: 3.770ms; value: next_token_ids=tensor([6399], device='cuda:0') mtp accept=0 prop=84938 top1=6399 accp=0.057 next=pair draft=4339 prop=4339 pred gate=device Token # 428: 116.433ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=0.996 next=draft=478 prop=478 olap pair=111.1ms serial=197.5ms gain=86.4ms ratio=0.44 s0=3.8ms s1=193.7ms wait=0.1/48.5ms pred gate=device Token # 429: 3.757ms; value: next_token_ids=tensor([28608], device='cuda:0') mtp accept=0 prop=478 top1=478 accp=0.621 next=pair draft=39 prop=39 pred gate=device Token # 430: 116.620ms; value: next_token_ids=tensor([39], device='cuda:0') mtp accept=1 prop=39 top1=39 accp=1.000 next=draft=17 prop=17 olap pair=111.2ms serial=197.8ms gain=86.6ms ratio=0.44 s0=3.8ms s1=194.0ms wait=0.1/48.6ms pred gate=device Token # 431: 3.755ms; value: next_token_ids=tensor([2827], device='cuda:0') mtp accept=0 prop=17 top1=2827 
accp=0.012 next=pair draft=478 prop=478 pred gate=device Token # 432: 115.756ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=0.995 next=draft=7346 prop=7346 olap pair=110.5ms serial=196.5ms gain=86.0ms ratio=0.44 s0=3.8ms s1=192.7ms wait=0.1/48.5ms pred gate=device Token # 433: 3.733ms; value: next_token_ids=tensor([445], device='cuda:0') mtp accept=0 prop=7346 top1=445 accp=0.013 next=pair draft=28608 prop=28608 pred gate=device Token # 434: 115.957ms; value: next_token_ids=tensor([28608], device='cuda:0') mtp accept=1 prop=28608 top1=28608 accp=0.995 next=draft=39 prop=39 olap pair=110.6ms serial=196.7ms gain=86.0ms ratio=0.44 s0=3.9ms s1=192.8ms wait=0.1/48.3ms pred gate=device Token # 435: 3.739ms; value: next_token_ids=tensor([39], device='cuda:0') mtp accept=1 prop=39 top1=39 accp=1.000 next=pair draft=103971 prop=103971 pred gate=device Token # 436: 115.868ms; value: next_token_ids=tensor([103971], device='cuda:0') mtp accept=1 prop=103971 top1=103971 accp=1.000 next=draft=303 prop=303 olap pair=110.5ms serial=196.3ms gain=85.8ms ratio=0.44 s0=4.0ms s1=192.3ms wait=0.1/48.3ms pred gate=device Token # 437: 3.751ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=974 prop=974 pred gate=device Token # 438: 116.035ms; value: next_token_ids=tensor([974], device='cuda:0') mtp accept=1 prop=974 top1=974 accp=0.875 next=draft=17862 prop=17862 olap pair=110.8ms serial=196.9ms gain=86.1ms ratio=0.44 s0=3.9ms s1=193.0ms wait=0.1/48.5ms pred gate=device Token # 439: 3.828ms; value: next_token_ids=tensor([17862], device='cuda:0') mtp accept=1 prop=17862 top1=17862 accp=0.599 next=pair draft=2827 prop=2827 pred gate=device Token # 440: 116.480ms; value: next_token_ids=tensor([2827], device='cuda:0') mtp accept=1 prop=2827 top1=5555 accp=0.022 next=draft=1237 prop=1237 olap pair=111.0ms serial=197.1ms gain=86.1ms ratio=0.44 s0=4.2ms s1=192.9ms wait=0.1/48.0ms pred 
gate=device Token # 441: 3.815ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=pair draft=6459 prop=6459 pred gate=device Token # 442: 116.299ms; value: next_token_ids=tensor([6459], device='cuda:0') mtp accept=1 prop=6459 top1=6459 accp=0.788 next=draft=48 prop=48 olap pair=110.9ms serial=196.1ms gain=85.2ms ratio=0.43 s0=4.0ms s1=192.1ms wait=0.1/48.2ms pred gate=device Token # 443: 3.732ms; value: next_token_ids=tensor([48], device='cuda:0') mtp accept=1 prop=48 top1=48 accp=1.000 next=pair draft=1227 prop=1227 pred gate=device Token # 444: 116.110ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=1 prop=1227 top1=1227 accp=1.000 next=draft=389 prop=389 olap pair=110.8ms serial=196.2ms gain=85.3ms ratio=0.44 s0=3.9ms s1=192.3ms wait=0.1/48.4ms pred gate=device Token # 445: 3.769ms; value: next_token_ids=tensor([389], device='cuda:0') mtp accept=1 prop=389 top1=389 accp=0.995 next=pair draft=10756 prop=10756 pred gate=device Token # 446: 116.939ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=1 prop=10756 top1=10756 accp=0.991 next=draft=8975 prop=8975 olap pair=111.7ms serial=198.9ms gain=87.2ms ratio=0.44 s0=3.8ms s1=195.1ms wait=0.1/48.5ms pred gate=device Token # 447: 3.742ms; value: next_token_ids=tensor([2827], device='cuda:0') mtp accept=0 prop=8975 top1=2827 accp=0.001 next=pair draft=303 prop=303 pred gate=device Token # 448: 116.047ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.816 next=draft=9168 prop=9168 olap pair=110.8ms serial=196.9ms gain=86.1ms ratio=0.44 s0=3.9ms s1=193.0ms wait=0.1/48.4ms pred gate=device Token # 449: 3.724ms; value: next_token_ids=tensor([9168], device='cuda:0') mtp accept=1 prop=9168 top1=9168 accp=0.928 next=pair draft=7849 prop=7849 pred gate=device Token # 450: 116.462ms; value: next_token_ids=tensor([4602], device='cuda:0') mtp accept=0 prop=7849 top1=7849 accp=0.721 
next=draft=41992 prop=41992 olap pair=111.2ms serial=196.8ms gain=85.6ms ratio=0.43 s0=7.7ms s1=189.1ms wait=0.2/44.0ms pred gate=device Token # 451: 116.249ms; value: next_token_ids=tensor([19426], device='cuda:0') mtp accept=0 prop=41992 top1=19426 accp=0.037 next=draft=10756 prop=10756 olap pair=110.9ms serial=197.3ms gain=86.3ms ratio=0.44 s0=3.9ms s1=193.4ms wait=0.1/48.3ms pred gate=device Token # 452: 117.234ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=1 prop=10756 top1=10756 accp=0.948 next=draft=1415 prop=1415 olap pair=111.8ms serial=199.0ms gain=87.2ms ratio=0.44 s0=3.9ms s1=195.1ms wait=0.1/48.3ms pred gate=device Token # 453: 3.859ms; value: next_token_ids=tensor([1415], device='cuda:0') mtp accept=1 prop=1415 top1=1415 accp=0.999 next=pair draft=34864 prop=34864 pred gate=device Token # 454: 116.460ms; value: next_token_ids=tensor([34864], device='cuda:0') mtp accept=1 prop=34864 top1=34864 accp=1.000 next=draft=1237 prop=1237 olap pair=111.2ms serial=197.6ms gain=86.4ms ratio=0.44 s0=4.1ms s1=193.5ms wait=0.1/48.0ms pred gate=device Token # 455: 3.778ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=0.602 next=pair draft=8040 prop=8040 pred gate=device Token # 456: 116.562ms; value: next_token_ids=tensor([8040], device='cuda:0') mtp accept=1 prop=8040 top1=8040 accp=1.000 next=draft=303 prop=303 olap pair=111.2ms serial=197.7ms gain=86.5ms ratio=0.44 s0=4.0ms s1=193.7ms wait=0.1/48.0ms pred gate=device Token # 457: 3.731ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=16126 prop=16126 pred gate=device Token # 458: 116.772ms; value: next_token_ids=tensor([53091], device='cuda:0') mtp accept=0 prop=16126 top1=1833 accp=0.287 next=draft=4374 prop=4374 olap pair=111.4ms serial=198.1ms gain=86.7ms ratio=0.44 s0=3.9ms s1=194.2ms wait=0.1/48.4ms pred gate=device Token # 459: 116.980ms; value: next_token_ids=tensor([4374], 
device='cuda:0') mtp accept=1 prop=4374 top1=4374 accp=1.000 next=draft=1465 prop=1465 olap pair=111.6ms serial=198.2ms gain=86.6ms ratio=0.44 s0=4.0ms s1=194.2ms wait=0.1/48.1ms pred gate=device Token # 460: 3.714ms; value: next_token_ids=tensor([1465], device='cuda:0') mtp accept=1 prop=1465 top1=1465 accp=1.000 next=pair draft=13582 prop=13582 pred gate=device Token # 461: 116.875ms; value: next_token_ids=tensor([13582], device='cuda:0') mtp accept=1 prop=13582 top1=13582 accp=1.000 next=draft=21 prop=21 olap pair=111.6ms serial=198.2ms gain=86.6ms ratio=0.44 s0=4.2ms s1=194.0ms wait=0.1/48.0ms pred gate=device Token # 462: 3.694ms; value: next_token_ids=tensor([21], device='cuda:0') mtp accept=1 prop=21 top1=21 accp=1.000 next=pair draft=16 prop=16 pred gate=device Token # 463: 117.505ms; value: next_token_ids=tensor([16], device='cuda:0') mtp accept=1 prop=16 top1=16 accp=1.000 next=draft=20 prop=20 olap pair=112.2ms serial=198.1ms gain=86.0ms ratio=0.43 s0=6.4ms s1=191.7ms wait=0.2/45.5ms pred gate=device Token # 464: 3.794ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=2541 prop=2541 pred gate=device Token # 465: 116.046ms; value: next_token_ids=tensor([2431], device='cuda:0') mtp accept=0 prop=2541 top1=2431 accp=0.022 next=draft=2541 prop=2541 olap pair=110.8ms serial=195.8ms gain=85.0ms ratio=0.43 s0=8.2ms s1=187.6ms wait=0.2/43.6ms pred gate=device Token # 466: 117.166ms; value: next_token_ids=tensor([2541], device='cuda:0') mtp accept=1 prop=2541 top1=2541 accp=0.875 next=draft=16126 prop=16126 olap pair=111.9ms serial=199.1ms gain=87.3ms ratio=0.44 s0=4.1ms s1=195.0ms wait=0.1/48.0ms pred gate=device Token # 467: 3.860ms; value: next_token_ids=tensor([28608], device='cuda:0') mtp accept=0 prop=16126 top1=28608 accp=0.004 next=pair draft=39 prop=39 pred gate=device Token # 468: 116.904ms; value: next_token_ids=tensor([39], device='cuda:0') mtp accept=1 prop=39 top1=39 accp=1.000 
next=draft=303 prop=303 olap pair=111.6ms serial=197.5ms gain=85.9ms ratio=0.44 s0=4.1ms s1=193.4ms wait=0.1/48.1ms pred gate=device Token # 469: 3.684ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.981 next=pair draft=4569 prop=4569 pred gate=device Token # 470: 116.359ms; value: next_token_ids=tensor([1833], device='cuda:0') mtp accept=0 prop=4569 top1=1833 accp=0.021 next=draft=2827 prop=2827 olap pair=111.1ms serial=197.8ms gain=86.7ms ratio=0.44 s0=3.9ms s1=193.9ms wait=0.1/48.4ms pred gate=device Token # 471: 116.368ms; value: next_token_ids=tensor([2827], device='cuda:0') mtp accept=1 prop=2827 top1=2827 accp=1.000 next=draft=450 prop=450 olap pair=111.0ms serial=197.3ms gain=86.3ms ratio=0.44 s0=3.9ms s1=193.4ms wait=0.1/48.4ms pred gate=device Token # 472: 3.761ms; value: next_token_ids=tensor([450], device='cuda:0') mtp accept=1 prop=450 top1=450 accp=0.768 next=pair draft=926 prop=20 pred gate=device Token # 473: 116.371ms; value: next_token_ids=tensor([13097], device='cuda:0') mtp accept=0 prop=20 top1=13097 accp=0.003 next=draft=10756 prop=10756 olap pair=111.0ms serial=196.3ms gain=85.2ms ratio=0.43 s0=4.2ms s1=192.0ms wait=0.1/48.0ms pred gate=device Token # 474: 117.223ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=1 prop=10756 top1=10756 accp=1.000 next=draft=303 prop=303 olap pair=111.8ms serial=198.0ms gain=86.2ms ratio=0.44 s0=4.5ms s1=193.6ms wait=0.1/47.7ms pred gate=device Token # 475: 3.816ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=1207 prop=1207 pred gate=device Token # 476: 116.433ms; value: next_token_ids=tensor([1207], device='cuda:0') mtp accept=1 prop=1207 top1=1207 accp=1.000 next=draft=7849 prop=7849 olap pair=111.1ms serial=197.6ms gain=86.4ms ratio=0.44 s0=4.0ms s1=193.6ms wait=0.1/48.4ms pred gate=device Token # 477: 3.797ms; value: next_token_ids=tensor([7849], device='cuda:0') mtp accept=1 
prop=7849 top1=7849 accp=0.934 next=pair draft=33912 prop=33912 pred gate=device Token # 478: 117.301ms; value: next_token_ids=tensor([33912], device='cuda:0') mtp accept=1 prop=33912 top1=33912 accp=0.968 next=draft=1299 prop=1299 olap pair=112.0ms serial=198.0ms gain=85.9ms ratio=0.43 s0=4.2ms s1=193.7ms wait=0.1/48.1ms pred gate=device Token # 479: 3.785ms; value: next_token_ids=tensor([1299], device='cuda:0') mtp accept=1 prop=1299 top1=1299 accp=0.986 next=pair draft=34864 prop=34864 pred gate=device Token # 480: 116.809ms; value: next_token_ids=tensor([37209], device='cuda:0') mtp accept=0 prop=34864 top1=37209 accp=0.000 next=draft=621 prop=3878 olap pair=110.8ms serial=196.1ms gain=85.3ms ratio=0.44 s0=6.5ms s1=189.6ms wait=0.2/45.3ms pred gate=device Token # 481: 116.265ms; value: next_token_ids=tensor([621], device='cuda:0') mtp accept=0 prop=3878 top1=621 accp=0.768 next=draft=19426 prop=19426 olap pair=110.9ms serial=197.4ms gain=86.4ms ratio=0.44 s0=3.9ms s1=193.4ms wait=0.1/48.4ms pred gate=device Token # 482: 116.408ms; value: next_token_ids=tensor([19426], device='cuda:0') mtp accept=1 prop=19426 top1=19426 accp=0.979 next=draft=10756 prop=10756 olap pair=111.1ms serial=196.8ms gain=85.7ms ratio=0.44 s0=4.2ms s1=192.5ms wait=0.1/47.9ms pred gate=device Token # 483: 3.784ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=1 prop=10756 top1=10756 accp=0.846 next=pair draft=14164 prop=14164 pred gate=device Token # 484: 116.186ms; value: next_token_ids=tensor([14164], device='cuda:0') mtp accept=1 prop=14164 top1=14164 accp=0.938 next=draft=4009 prop=2204 olap pair=110.9ms serial=197.4ms gain=86.5ms ratio=0.44 s0=3.9ms s1=193.5ms wait=0.1/48.4ms pred gate=device Token # 485: 3.841ms; value: next_token_ids=tensor([4009], device='cuda:0') mtp accept=0 prop=2204 top1=4009 accp=0.816 next=pair draft=303 prop=303 pred gate=device Token # 486: 116.489ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 
accp=1.000 next=draft=4339 prop=3660 olap pair=111.2ms serial=197.1ms gain=85.9ms ratio=0.44 s0=6.3ms s1=190.8ms wait=0.2/45.7ms pred gate=device Token # 487: 3.767ms; value: next_token_ids=tensor([3660], device='cuda:0') mtp accept=1 prop=3660 top1=4339 accp=0.918 next=pair draft=7849 prop=7849 pred gate=device Token # 488: 116.915ms; value: next_token_ids=tensor([7849], device='cuda:0') mtp accept=1 prop=7849 top1=7849 accp=1.000 next=draft=33912 prop=33912 olap pair=111.6ms serial=198.4ms gain=86.9ms ratio=0.44 s0=3.9ms s1=194.6ms wait=0.1/48.5ms pred gate=device Token # 489: 3.838ms; value: next_token_ids=tensor([33912], device='cuda:0') mtp accept=1 prop=33912 top1=33912 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 490: 116.826ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=6459 prop=6459 olap pair=111.5ms serial=198.5ms gain=86.9ms ratio=0.44 s0=3.9ms s1=194.6ms wait=0.1/48.4ms pred gate=device Token # 491: 3.783ms; value: next_token_ids=tensor([33298], device='cuda:0') mtp accept=0 prop=6459 top1=33298 accp=0.001 next=pair draft=4339 prop=4339 pred gate=device Token # 492: 116.551ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=1.000 next=draft=3343 prop=3343 olap pair=111.2ms serial=197.8ms gain=86.6ms ratio=0.44 s0=3.9ms s1=193.9ms wait=0.1/48.4ms pred gate=device Token # 493: 3.805ms; value: next_token_ids=tensor([23945], device='cuda:0') mtp accept=0 prop=3343 top1=23945 accp=0.040 next=pair draft=10756 prop=10756 pred gate=device Token # 494: 115.923ms; value: next_token_ids=tensor([10602], device='cuda:0') mtp accept=0 prop=10756 top1=10602 accp=0.000 next=draft=320 prop=320 olap pair=110.5ms serial=196.6ms gain=86.1ms ratio=0.44 s0=3.8ms s1=192.8ms wait=0.1/48.4ms pred gate=device Token # 495: 116.259ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=1.000 next=draft=556 prop=556 
olap pair=111.0ms serial=197.3ms gain=86.3ms ratio=0.44 s0=3.8ms s1=193.4ms wait=0.1/48.5ms pred gate=device Token # 496: 3.754ms; value: next_token_ids=tensor([2204], device='cuda:0') mtp accept=0 prop=556 top1=2204 accp=0.008 next=pair draft=8842 prop=8842 pred gate=device Token # 497: 116.918ms; value: next_token_ids=tensor([100593], device='cuda:0') mtp accept=0 prop=8842 top1=100593 accp=0.110 next=draft=10756 prop=10602 olap pair=111.6ms serial=198.2ms gain=86.6ms ratio=0.44 s0=3.9ms s1=194.3ms wait=0.1/48.5ms pred gate=device Token # 498: 116.546ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=0 prop=10602 top1=10756 accp=0.680 next=draft=25650 prop=25650 olap pair=111.2ms serial=197.8ms gain=86.6ms ratio=0.44 s0=3.9ms s1=193.9ms wait=0.1/48.3ms pred gate=device Token # 499: 115.962ms; value: next_token_ids=tensor([10780], device='cuda:0') mtp accept=0 prop=25650 top1=10780 accp=0.517 next=draft=621 prop=621 olap pair=110.6ms serial=196.3ms gain=85.7ms ratio=0.44 s0=4.3ms s1=192.0ms wait=0.1/48.0ms pred gate=device Token # 500: 116.187ms; value: next_token_ids=tensor([621], device='cuda:0') mtp accept=1 prop=621 top1=621 accp=1.000 next=draft=7557 prop=7557 olap pair=110.8ms serial=196.7ms gain=86.0ms ratio=0.44 s0=4.4ms s1=192.4ms wait=0.1/47.7ms pred gate=device Token # 501: 3.760ms; value: next_token_ids=tensor([7557], device='cuda:0') mtp accept=1 prop=7557 top1=7557 accp=0.968 next=pair draft=15227 prop=15227 pred gate=device Token # 502: 116.259ms; value: next_token_ids=tensor([48], device='cuda:0') mtp accept=0 prop=15227 top1=48 accp=0.498 next=draft=1457 prop=1457 olap pair=110.9ms serial=197.3ms gain=86.4ms ratio=0.44 s0=3.8ms s1=193.5ms wait=0.1/48.5ms pred gate=device Token # 503: 116.528ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=draft=15227 prop=15227 olap pair=111.1ms serial=197.8ms gain=86.6ms ratio=0.44 s0=3.8ms s1=194.0ms wait=0.1/48.6ms pred gate=device 
Token # 504: 3.927ms; value: next_token_ids=tensor([15227], device='cuda:0') mtp accept=1 prop=15227 top1=15227 accp=0.999 next=pair draft=572 prop=572 pred gate=device Token # 505: 116.424ms; value: next_token_ids=tensor([572], device='cuda:0') mtp accept=1 prop=572 top1=572 accp=1.000 next=draft=303 prop=303 olap pair=111.1ms serial=197.6ms gain=86.5ms ratio=0.44 s0=3.8ms s1=193.8ms wait=0.1/48.7ms pred gate=device Token # 506: 3.780ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=3996 prop=3996 pred gate=device Token # 507: 116.374ms; value: next_token_ids=tensor([3996], device='cuda:0') mtp accept=1 prop=3996 top1=3996 accp=0.927 next=draft=7849 prop=7849 olap pair=111.0ms serial=197.3ms gain=86.3ms ratio=0.44 s0=4.5ms s1=192.8ms wait=0.1/47.0ms pred gate=device Token # 508: 3.792ms; value: next_token_ids=tensor([7849], device='cuda:0') mtp accept=1 prop=7849 top1=7849 accp=1.000 next=pair draft=15227 prop=15227 pred gate=device Token # 509: 116.692ms; value: next_token_ids=tensor([15227], device='cuda:0') mtp accept=1 prop=15227 top1=15227 accp=1.000 next=draft=33298 prop=33298 olap pair=111.2ms serial=197.9ms gain=86.6ms ratio=0.44 s0=4.4ms s1=193.5ms wait=0.1/47.1ms pred gate=device Token # 510: 3.779ms; value: next_token_ids=tensor([33298], device='cuda:0') mtp accept=1 prop=33298 top1=33298 accp=1.000 next=pair draft=4339 prop=4339 pred gate=device Token # 511: 116.405ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=0.987 next=draft=1877 prop=23945 olap pair=111.0ms serial=197.3ms gain=86.2ms ratio=0.44 s0=4.5ms s1=192.7ms wait=0.1/46.9ms pred gate=device Token # 512: 3.817ms; value: next_token_ids=tensor([23945], device='cuda:0') mtp accept=1 prop=23945 top1=118322 accp=0.266 next=pair draft=10756 prop=10756 pred gate=device Token # 513: 115.847ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=0 prop=10756 top1=1472 accp=0.002 
next=draft=556 prop=2684 olap pair=110.5ms serial=196.3ms gain=85.8ms ratio=0.44 s0=4.4ms s1=191.9ms wait=0.1/47.1ms pred gate=device Token # 514: 116.378ms; value: next_token_ids=tensor([2803], device='cuda:0') mtp accept=0 prop=2684 top1=2803 accp=0.012 next=draft=303 prop=303 olap pair=111.0ms serial=197.4ms gain=86.4ms ratio=0.44 s0=4.0ms s1=193.4ms wait=0.1/48.1ms pred gate=device Token # 515: 116.532ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=draft=4754 prop=1909 olap pair=111.2ms serial=197.8ms gain=86.7ms ratio=0.44 s0=4.2ms s1=193.6ms wait=0.1/47.5ms pred gate=device Token # 516: 3.754ms; value: next_token_ids=tensor([1909], device='cuda:0') mtp accept=1 prop=1909 top1=4407 accp=0.381 next=pair draft=2386 prop=12223 pred gate=device Token # 517: 117.216ms; value: next_token_ids=tensor([2386], device='cuda:0') mtp accept=0 prop=12223 top1=2386 accp=0.760 next=draft=4398 prop=4398 olap pair=111.9ms serial=198.1ms gain=86.2ms ratio=0.44 s0=4.2ms s1=194.0ms wait=0.1/47.8ms pred gate=device Token # 518: 116.714ms; value: next_token_ids=tensor([545], device='cuda:0') mtp accept=0 prop=4398 top1=1275 accp=0.010 next=draft=7849 prop=7849 olap pair=111.4ms serial=198.1ms gain=86.7ms ratio=0.44 s0=4.0ms s1=194.1ms wait=0.1/48.3ms pred gate=device Token # 519: 116.316ms; value: next_token_ids=tensor([7849], device='cuda:0') mtp accept=1 prop=7849 top1=7849 accp=1.000 next=draft=33912 prop=33912 olap pair=110.8ms serial=197.0ms gain=86.1ms ratio=0.44 s0=4.0ms s1=193.0ms wait=0.1/48.2ms pred gate=device Token # 520: 3.738ms; value: next_token_ids=tensor([33912], device='cuda:0') mtp accept=1 prop=33912 top1=33912 accp=1.000 next=pair draft=22636 prop=22636 pred gate=device Token # 521: 116.367ms; value: next_token_ids=tensor([22636], device='cuda:0') mtp accept=1 prop=22636 top1=22636 accp=0.605 next=draft=34864 prop=34864 olap pair=111.1ms serial=197.6ms gain=86.5ms ratio=0.44 s0=3.9ms s1=193.7ms 
wait=0.1/48.4ms pred gate=device Token # 522: 3.729ms; value: next_token_ids=tensor([34864], device='cuda:0') mtp accept=1 prop=34864 top1=34864 accp=0.514 next=pair draft=2022 prop=32622 pred gate=device Token # 523: 116.265ms; value: next_token_ids=tensor([2022], device='cuda:0') mtp accept=0 prop=32622 top1=2022 accp=0.767 next=draft=303 prop=303 olap pair=111.0ms serial=197.2ms gain=86.2ms ratio=0.44 s0=3.8ms s1=193.4ms wait=0.1/48.5ms pred gate=device Token # 524: 116.668ms; value: next_token_ids=tensor([621], device='cuda:0') mtp accept=0 prop=303 top1=621 accp=0.155 next=draft=2920 prop=2920 olap pair=111.3ms serial=197.4ms gain=86.1ms ratio=0.44 s0=5.5ms s1=192.0ms wait=0.2/46.4ms pred gate=device Token # 525: 117.198ms; value: next_token_ids=tensor([16486], device='cuda:0') mtp accept=0 prop=2920 top1=16486 accp=0.007 next=draft=15227 prop=15227 olap pair=111.8ms serial=198.8ms gain=87.0ms ratio=0.44 s0=4.5ms s1=194.3ms wait=0.1/47.4ms pred gate=device Token # 526: 117.002ms; value: next_token_ids=tensor([10756], device='cuda:0') mtp accept=0 prop=15227 top1=10756 accp=0.039 next=draft=15227 prop=15227 olap pair=111.6ms serial=198.5ms gain=87.0ms ratio=0.44 s0=3.9ms s1=194.6ms wait=0.1/48.3ms pred gate=device Token # 527: 116.278ms; value: next_token_ids=tensor([15227], device='cuda:0') mtp accept=1 prop=15227 top1=15227 accp=1.000 next=draft=303 prop=303 olap pair=110.9ms serial=197.1ms gain=86.2ms ratio=0.44 s0=4.0ms s1=193.1ms wait=0.1/48.2ms pred gate=device Token # 528: 3.787ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=1.000 next=pair draft=4272 prop=4272 pred gate=device Token # 529: 115.868ms; value: next_token_ids=tensor([4272], device='cuda:0') mtp accept=1 prop=4272 top1=4272 accp=0.838 next=draft=13199 prop=13199 olap pair=110.6ms serial=196.3ms gain=85.8ms ratio=0.44 s0=5.2ms s1=191.1ms wait=0.1/46.8ms pred gate=device Token # 530: 3.792ms; value: next_token_ids=tensor([13199], device='cuda:0') 
mtp accept=1 prop=13199 top1=13199 accp=0.903 next=pair draft=4498 prop=4498 pred gate=device Token # 531: 116.052ms; value: next_token_ids=tensor([4498], device='cuda:0') mtp accept=1 prop=4498 top1=4498 accp=1.000 next=draft=320 prop=320 olap pair=110.7ms serial=196.9ms gain=86.1ms ratio=0.44 s0=3.9ms s1=192.9ms wait=0.1/48.2ms pred gate=device Token # 532: 3.876ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.997 next=pair draft=556 prop=556 pred gate=device Token # 533: 116.239ms; value: next_token_ids=tensor([556], device='cuda:0') mtp accept=1 prop=556 top1=556 accp=0.999 next=draft=13709 prop=13709 olap pair=111.0ms serial=197.5ms gain=86.5ms ratio=0.44 s0=4.2ms s1=193.3ms wait=0.1/47.8ms pred gate=device Token # 534: 3.769ms; value: next_token_ids=tensor([124428], device='cuda:0') mtp accept=0 prop=13709 top1=50066 accp=0.004 next=pair draft=5555 prop=5555 pred gate=device Token # 535: 116.534ms; value: next_token_ids=tensor([5555], device='cuda:0') mtp accept=1 prop=5555 top1=5555 accp=1.000 next=draft=16303 prop=16303 olap pair=111.1ms serial=197.5ms gain=86.4ms ratio=0.44 s0=4.4ms s1=193.1ms wait=0.1/47.1ms pred gate=device Token # 536: 3.792ms; value: next_token_ids=tensor([100642], device='cuda:0') mtp accept=0 prop=16303 top1=16303 accp=0.696 next=pair draft=478 prop=478 pred gate=device Token # 537: 116.549ms; value: next_token_ids=tensor([478], device='cuda:0') mtp accept=1 prop=478 top1=478 accp=1.000 next=draft=11537 prop=11537 olap pair=111.2ms serial=197.7ms gain=86.5ms ratio=0.44 s0=4.4ms s1=193.3ms wait=0.1/47.1ms pred gate=device Token # 538: 3.768ms; value: next_token_ids=tensor([1909], device='cuda:0') mtp accept=0 prop=11537 top1=1909 accp=0.150 next=pair draft=68523 prop=68523 pred gate=device Token # 539: 116.875ms; value: next_token_ids=tensor([68523], device='cuda:0') mtp accept=1 prop=68523 top1=68523 accp=0.993 next=draft=44596 prop=44596 olap pair=111.6ms serial=198.2ms gain=86.6ms 
ratio=0.44 s0=4.5ms s1=193.7ms wait=0.1/47.1ms pred gate=device Token # 540: 3.705ms; value: next_token_ids=tensor([44596], device='cuda:0') mtp accept=1 prop=44596 top1=44596 accp=0.968 next=pair draft=428 prop=428 pred gate=device Token # 541: 118.262ms; value: next_token_ids=tensor([54182], device='cuda:0') mtp accept=0 prop=428 top1=54182 accp=0.018 next=draft=27 prop=27 olap pair=112.7ms serial=200.3ms gain=87.5ms ratio=0.44 s0=4.6ms s1=195.7ms wait=0.1/47.0ms pred gate=device Token # 542: 116.960ms; value: next_token_ids=tensor([27], device='cuda:0') mtp accept=1 prop=27 top1=27 accp=1.000 next=draft=16734 prop=16734 olap pair=111.6ms serial=197.9ms gain=86.3ms ratio=0.44 s0=4.3ms s1=193.6ms wait=0.1/47.8ms pred gate=device Token # 543: 3.792ms; value: next_token_ids=tensor([16734], device='cuda:0') mtp accept=1 prop=16734 top1=16734 accp=0.999 next=pair draft=1237 prop=1237 pred gate=device Token # 544: 116.686ms; value: next_token_ids=tensor([1237], device='cuda:0') mtp accept=1 prop=1237 top1=1237 accp=1.000 next=draft=15206 prop=15206 olap pair=111.4ms serial=198.1ms gain=86.7ms ratio=0.44 s0=4.3ms s1=193.8ms wait=0.1/47.5ms pred gate=device Token # 545: 3.754ms; value: next_token_ids=tensor([50986], device='cuda:0') mtp accept=0 prop=15206 top1=50986 accp=0.000 next=pair draft=301 prop=301 pred gate=device Token # 546: 116.287ms; value: next_token_ids=tensor([301], device='cuda:0') mtp accept=1 prop=301 top1=301 accp=0.998 next=draft=27 prop=27 olap pair=110.9ms serial=196.8ms gain=85.9ms ratio=0.44 s0=5.3ms s1=191.5ms wait=0.2/46.8ms pred gate=device Token # 547: 3.761ms; value: next_token_ids=tensor([27], device='cuda:0') mtp accept=1 prop=27 top1=27 accp=1.000 next=pair draft=1190 prop=1190 pred gate=device Token # 548: 116.278ms; value: next_token_ids=tensor([1190], device='cuda:0') mtp accept=1 prop=1190 top1=1190 accp=1.000 next=draft=9678 prop=9678 olap pair=110.9ms serial=196.9ms gain=86.0ms ratio=0.44 s0=4.9ms s1=191.9ms wait=0.1/46.9ms pred 
gate=device Token # 549: 3.771ms; value: next_token_ids=tensor([9678], device='cuda:0') mtp accept=1 prop=9678 top1=9678 accp=1.000 next=pair draft=985 prop=985 pred gate=device Token # 550: 115.706ms; value: next_token_ids=tensor([985], device='cuda:0') mtp accept=1 prop=985 top1=985 accp=0.964 next=draft=16734 prop=16734 olap pair=110.3ms serial=196.1ms gain=85.8ms ratio=0.44 s0=3.9ms s1=192.3ms wait=0.1/48.4ms pred gate=device Token # 551: 3.698ms; value: next_token_ids=tensor([1227], device='cuda:0') mtp accept=0 prop=16734 top1=16734 accp=0.760 next=pair draft=637 prop=637 pred gate=device Token # 552: 116.522ms; value: next_token_ids=tensor([637], device='cuda:0') mtp accept=1 prop=637 top1=637 accp=0.830 next=draft=8009 prop=8009 olap pair=111.2ms serial=197.9ms gain=86.7ms ratio=0.44 s0=3.8ms s1=194.1ms wait=0.1/48.6ms pred gate=device Token # 553: 3.739ms; value: next_token_ids=tensor([8009], device='cuda:0') mtp accept=1 prop=8009 top1=8009 accp=0.917 next=pair draft=28827 prop=28827 pred gate=device Token # 554: 115.338ms; value: next_token_ids=tensor([118302], device='cuda:0') mtp accept=0 prop=28827 top1=118302 accp=0.000 next=draft=548 prop=548 olap pair=110.1ms serial=195.6ms gain=85.5ms ratio=0.44 s0=3.8ms s1=191.8ms wait=0.1/48.5ms pred gate=device Token # 555: 116.771ms; value: next_token_ids=tensor([548], device='cuda:0') mtp accept=1 prop=548 top1=548 accp=0.999 next=draft=4339 prop=4339 olap pair=111.4ms serial=197.9ms gain=86.5ms ratio=0.44 s0=4.0ms s1=193.9ms wait=0.1/48.3ms pred gate=device Token # 556: 3.726ms; value: next_token_ids=tensor([4339], device='cuda:0') mtp accept=1 prop=4339 top1=4339 accp=1.000 next=pair draft=320 prop=320 pred gate=device Token # 557: 115.915ms; value: next_token_ids=tensor([320], device='cuda:0') mtp accept=1 prop=320 top1=320 accp=0.456 next=draft=1909 prop=1909 olap pair=110.7ms serial=196.8ms gain=86.1ms ratio=0.44 s0=4.0ms s1=192.8ms wait=0.1/48.2ms pred gate=device Token # 558: 3.833ms; value: 
next_token_ids=tensor([48], device='cuda:0') mtp accept=0 prop=1909 top1=48 accp=0.188 next=pair draft=1457 prop=1457 pred gate=device Token # 559: 116.624ms; value: next_token_ids=tensor([1457], device='cuda:0') mtp accept=1 prop=1457 top1=1457 accp=1.000 next=draft=4602 prop=4602 olap pair=111.3ms serial=197.8ms gain=86.5ms ratio=0.44 s0=4.0ms s1=193.9ms wait=0.1/48.4ms pred gate=device Token # 560: 3.824ms; value: next_token_ids=tensor([1299], device='cuda:0') mtp accept=0 prop=4602 top1=1299 accp=0.342 next=pair draft=5852 prop=5852 pred gate=device Token # 561: 116.343ms; value: next_token_ids=tensor([5852], device='cuda:0') mtp accept=1 prop=5852 top1=5852 accp=1.000 next=draft=12875 prop=12875 olap pair=110.9ms serial=197.3ms gain=86.4ms ratio=0.44 s0=3.9ms s1=193.4ms wait=0.1/48.4ms pred gate=device Token # 562: 3.846ms; value: next_token_ids=tensor([12875], device='cuda:0') mtp accept=1 prop=12875 top1=12875 accp=1.000 next=pair draft=58 prop=58 pred gate=device Token # 563: 116.860ms; value: next_token_ids=tensor([58], device='cuda:0') mtp accept=1 prop=58 top1=58 accp=1.000 next=draft=20 prop=20 olap pair=111.5ms serial=198.4ms gain=86.9ms ratio=0.44 s0=3.9ms s1=194.5ms wait=0.1/48.5ms pred gate=device Token # 564: 3.739ms; value: next_token_ids=tensor([20], device='cuda:0') mtp accept=1 prop=20 top1=20 accp=1.000 next=pair draft=303 prop=303 pred gate=device Token # 565: 116.253ms; value: next_token_ids=tensor([303], device='cuda:0') mtp accept=1 prop=303 top1=303 accp=0.980 next=draft=112659 prop=112659 olap pair=111.0ms serial=197.4ms gain=86.4ms ratio=0.44 s0=3.8ms s1=193.5ms wait=0.1/48.6ms pred gate=device Token # 566: 3.735ms; value: next_token_ids=tensor([4009], device='cuda:0') mtp accept=0 prop=112659 top1=4009 accp=0.134 next=pair draft=7174 prop=7174 pred gate=device