I'm writing a program that lets the user manually manipulate data on screen. I plot a histogram for a data set, and then allow the user to graphically normalize it. This boils down to a for loop that executes on every pixel-by-pixel movement of the mouse, transforming an array of about 6000-10000 data points after each move. This has proven to be fairly CPU intensive, but only ever on a single core, on every machine I've tested it on. The machines I intend to run it on range from 2 to 8 cores.
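For reference, the loop has roughly this shape (a simplified C++ sketch; `scale`, `raw`, and `transformed` are stand-ins for my real variables):

```cpp
#include <vector>

// Called once per pixel of mouse movement. `scale` is a factor
// derived from the current mouse position; the loop rewrites the
// whole ~6000-10000 point array each time.
void onMouseMove(double scale,
                 const std::vector<double>& raw,
                 std::vector<double>& transformed)
{
    for (std::size_t i = 0; i < raw.size(); ++i)
        transformed[i] = raw[i] * scale; // placeholder normalization
}
```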
What options do I have for speeding up my program and spreading this load across all available processors/cores?