>>21725 I have it on my THINKPAD wwww seems ok, prefer cwfs for the big fucking server though >>21726 whoa, GUMI where's that fucking nerd GUMI cosplayer
hey vip I got a programming question for you. I'm struggling to make a parallel code base run well and I'm not sure how to debug it, so here we go:
1. I have an algorithm that needs about 20 MB of RAM to run on ~2 MB of data.
2. I have 112 CPU cores, and the 2 MB chunks of data are basically infinite. Like, I'll have 2000 of these chunks and I need to run the algorithm on them 2000 times.
The issue I get is: parallelizing by copying the 20 MB onto each core and running it on the 2 MB of data has massive overhead and is incredibly slow. If I put the algorithm in shared memory, that cleans up the overhead a bit, but the chunks of data aren't perfectly equal, so some finish earlier than others, and then there's a join() call where one core is just chugging along. This is exacerbated when I split the algorithm onto its own sets of cores, for instance copying the algorithm to 1 core and running 4 sets of data on it, doing that over all 112 cores.
Basically -- do you have other ideas on what I can try when working with a hefty algorithm + constant stream of data? I have plenty of CPUs, I just don't know how to utilize them properly.
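One pattern that usually helps with the straggler problem is to let a pool of workers pull chunks off a shared queue instead of pre-splitting the data. A minimal sketch with multiprocessing, assuming the filter bank can be built once per worker; FilterBank, load_chunks and handle are stand-ins for whatever the real code does:

# Sketch only: FilterBank, load_chunks and handle are placeholders.
# The idea: workers pull chunks dynamically, so fast workers just grab
# the next chunk instead of everything waiting at a join() for the slowest split.
from multiprocessing import Pool

_bank = None  # one filter bank per worker process, built once at startup

def _init_worker():
    global _bank
    _bank = FilterBank()            # the expensive 20 MB setup, once per worker

def _classify(chunk):
    return _bank.classify(chunk)    # ~2 MB of data per call

if __name__ == "__main__":
    chunks = load_chunks()          # generator/iterable of the ~2000 chunks
    with Pool(processes=112, initializer=_init_worker) as pool:
        # imap_unordered hands out chunks as workers free up and yields
        # results as they finish, so there is no long straggler tail
        for result in pool.imap_unordered(_classify, chunks, chunksize=4):
            handle(result)

imap_unordered schedules work dynamically, so nothing sits at a join() waiting for the one slow split; chunksize trades scheduling overhead against load balance.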
Anonymous
It's a class that contains a bank of filters. The 20 MB of filters get applied to the data to try to classify it. So I was copying this class across all my different processes, but when the processes were spread across all the CPUs, I guess the context switching and copying the data onto each CPU was too much, and it went from a 2 minute execution to a 6 hour execution. When I limit it to only 2 CPUs via taskset -c $1 and run it a bunch of times, it goes super fast.
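If this is on Linux, one cheap option is to build the 20 MB class once in the parent and fork the workers: with the fork start method the children inherit the object copy-on-write instead of each getting a pickled copy. A rough sketch, with FilterBank and load_chunks again standing in for the real thing:

# Sketch: build the big object once in the parent, then fork the pool.
# With the "fork" start method the children inherit BANK copy-on-write,
# so the 20 MB isn't pickled and shipped to every worker.
import multiprocessing as mp

BANK = None  # module-level so forked workers see the parent's object

def classify(chunk):
    return BANK.classify(chunk)     # read-only use keeps the pages shared

if __name__ == "__main__":
    mp.set_start_method("fork")     # the Linux default, spelled out for clarity
    BANK = FilterBank()             # hypothetical 20 MB filter bank, built ONCE
    with mp.Pool(processes=112) as pool:
        results = pool.map(classify, load_chunks())   # load_chunks is a stand-in

One caveat: CPython's reference counting writes to object headers, so some of the shared pages still get dirtied; big numpy arrays inside the class mostly stay shared, but piles of small Python objects won't.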
Would it be that, because I have, say, 28 processes, it's shoving 20*28 MB into cache? Sorry, what is MPI? Well, most likely, no. Ah, no. I mean I don't wanna say what it is cuz you'll immediately just say "well that's your problem" and end it there lol. But it's Python, and honestly I've just been doing "run code & run code1 & run code2 & run code3 &" in bash so it has different processes. Er: taskset -c 1,2,3,4 python main.py & taskset -c 5,6,7,8 python main1.py & taskset -c 9,10,11,12 python main2.py & etc. When I look at the system monitor, it does seem to be using the CPUs I've set affinity for.
Oh! Speaking of that, the thing I'm trying to do is basically set CPU affinity in my program itself. So if I spawn a process and immediately set its CPU affinity to just 1 CPU, but it started with, say, 112 CPUs available.. does it immediately copy every single instance of my class over to the 112 CPUs? I was worried I'm copying data over when I shouldn't be. I do have it --- okay.. uh, would you be able to determine if -- ah hold on
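For setting the affinity from inside the program rather than via taskset, os.sched_setaffinity does the same thing (Linux-only); a tiny sketch:

# Sketch: what `taskset -c` does, but from inside the process.
# os.sched_setaffinity is Linux-only; pid 0 means "this process".
import os

def pin_to_cpus(cpus):
    os.sched_setaffinity(0, set(cpus))   # e.g. pin_to_cpus([1, 2, 3, 4])
    return os.sched_getaffinity(0)       # read back what we actually got

if __name__ == "__main__":
    print("pinned to CPUs:", sorted(pin_to_cpus(range(4))))

Setting the affinity by itself doesn't copy anything; the class only gets duplicated when a new process is spawned and the object is pickled over (or rebuilt) in it, regardless of how many CPUs the process is allowed to run on.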
Is there a way for me to track whether I'm loading 20 MB into each core or whether it's properly pulling from shared memory? I'm just struggling with what I need to look up on the internet for these things. www ah alright alright. I do have about half of my code written in C and then I call the compiled C via Python, so maybe I can just slap a memory util in there as well. Mmmm you're smart, thanks, I gotta do that
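For checking whether each process actually holds its own 20 MB or is sharing it, Linux exposes a per-process breakdown in /proc/<pid>/smaps_rollup (kernel 4.14+); a small sketch that could be dropped into the Python side:

# Sketch: /proc/<pid>/smaps_rollup splits resident memory into shared vs
# private, which answers "did every worker get its own copy of the 20 MB".
def memory_breakdown(pid="self"):
    fields = {}
    with open(f"/proc/{pid}/smaps_rollup") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2 and parts[1].isdigit():
                fields[parts[0].rstrip(":")] = int(parts[1])   # values are kB
    return fields

if __name__ == "__main__":
    stats = memory_breakdown()
    for key in ("Rss", "Pss", "Shared_Clean", "Shared_Dirty",
                "Private_Clean", "Private_Dirty"):
        print(key, stats.get(key, 0), "kB")

If each worker shows ~20 MB under Private_*, every process has its own copy; if it shows up under Shared_* and the Pss stays low, the pages are genuinely shared.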
For Python <--> C, Python has a ctypes library where you point it at a shared library, declare the data types for the inputs and outputs, and then you can call the shared library's functions from Python.
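A bare-bones version of that pattern, with made-up names (libfilters.so / classify_chunk) in place of the real library:

# Sketch of the ctypes pattern described above; "libfilters.so" and
# "classify_chunk" are placeholders for whatever the C side really exports.
import ctypes
import numpy as np

lib = ctypes.CDLL("./libfilters.so")

# Declare the C signature once, e.g.  int classify_chunk(const double *data, size_t n);
lib.classify_chunk.argtypes = [
    np.ctypeslib.ndpointer(dtype=np.float64, flags="C_CONTIGUOUS"),
    ctypes.c_size_t,
]
lib.classify_chunk.restype = ctypes.c_int

chunk = np.zeros(262_144)                        # ~2 MB of float64
label = lib.classify_chunk(chunk, chunk.size)    # passes a pointer, no copy

np.ctypeslib.ndpointer lets the numpy array go straight through as a pointer, so the 2 MB chunk isn't copied at the language boundary.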
Oh fuck your message is gone, I refreshed the tab, dammit >>21803 huh
VIPPER
wwwwww could always bake the filters into the C part's rodata if you want to have FUN with the good old ar(1). your executable will be fuckhueg but the data gets copied in at link time and the lookups are free
Anonymous
I am glad I have something to push forward with now though, cuz I was absolutely stumped. Thanks!
VIPPER
wow, zero-copy access in python looks like a PITA. amazing, i love programming
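For what it's worth, one way to get zero-copy reads in pure Python is multiprocessing.shared_memory (Python 3.8+) with a numpy view over the buffer; the sizes and the producer/consumer split below are just illustrative:

# Sketch: zero-copy reads via a named shared-memory segment.
import numpy as np
from multiprocessing import shared_memory

# producer side: put ~2 MB of float64 into a named shared segment
src = np.random.rand(262_144)
shm = shared_memory.SharedMemory(create=True, size=src.nbytes)
np.ndarray(src.shape, dtype=src.dtype, buffer=shm.buf)[:] = src   # one copy in

# consumer side (could be another process): attach by name, no copy at all
shm2 = shared_memory.SharedMemory(name=shm.name)
view = np.ndarray(src.shape, dtype=np.float64, buffer=shm2.buf)
print(view[:5])             # reads straight out of the shared segment

shm2.close()
shm.close()
shm.unlink()                # release the segment when everyone is done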