I wrote a sample rate conversion test program with CUDA, and it seems a little better than Saracon in 44.1->48 kHz mode

cxn4689 · Mar 5, 2012 at 10:00 PM

I was bored recently, so I read some research papers and the Stanford web site on SRC, and decided to write a test program for fun in C and CUDA. I use the classic sinc function as the interpolation function and a Blackman window to truncate it. The program is GPU-accelerated with CUDA, and each window contains 256 samples. Testing is done with RMAA: the 44.1 kHz test signal file is generated by RMAA, and the converted file is analyzed by RMAA too. I have only tested 44.1 kHz to 48 kHz conversion. The THD and IMD figures are a little better than Weiss Saracon's. The only problem is that my program takes 7 seconds while Saracon takes only 2 seconds. The result is shown below:

Roller · Mar 5, 2012 at 10:41 PM

Very interesting. How would one implement it on an audio player or on system wide output?

cxn4689 · Mar 5, 2012 at 11:39 PM

I am still studying foobar2000 plug-in programming...

Roller · Mar 5, 2012 at 11:41 PM

Oh, you're making it for foobar2000? That's simply fantastic

Out of curiosity, does the number of compute units only affect SRC performance or also SRC output quality?

cxn4689 · Mar 6, 2012 at 1:54 AM

In theory, a sound signal is band-limited, so it can be reconstructed exactly by convolving its samples with the sinc function. Unfortunately, this convolution cannot be implemented directly because it needs an infinite number of input samples. So we need a window function to truncate the sinc and make it computable. Floating-point precision aside, the more samples we use per output sample, the higher the quality. With the GPU's help, I can use more samples while the computing time stays the same. I want to implement this GPU method for foobar2000 to do real-time SRC, and I am still studying foobar2000 DSP plug-in programming.
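To make the windowed-sinc idea concrete, here is a minimal CPU sketch in Python. It is not the CUDA program from this thread; the function names (`blackman`, `sinc`, `interpolate_at`, `max_error`) and the 15 kHz test tone are assumptions chosen for illustration. It estimates a signal value between samples and measures how the error changes with the number of neighboring samples used.

```python
import math

def blackman(u):
    """Blackman window centred on 0, supported on -1 < u < 1."""
    if abs(u) >= 1.0:
        return 0.0
    return 0.42 + 0.5 * math.cos(math.pi * u) + 0.08 * math.cos(2.0 * math.pi * u)

def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x)."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def interpolate_at(samples, t, taps=256):
    """Estimate the signal at fractional index t from `taps` neighbouring samples."""
    half = taps // 2
    center = math.floor(t)
    acc = 0.0
    for n in range(center - half + 1, center + half + 1):
        if 0 <= n < len(samples):
            d = t - n  # distance from input sample n to the target position
            acc += samples[n] * sinc(d) * blackman(d / half)
    return acc

def max_error(freq_hz, taps, rate=44100.0, length=4096):
    """Worst interpolation error for a sine, probed halfway between samples."""
    samples = [math.sin(2.0 * math.pi * freq_hz * i / rate) for i in range(length)]
    worst = 0.0
    for k in range(50):
        t = 1000.5 + k
        exact = math.sin(2.0 * math.pi * freq_hz * t / rate)
        worst = max(worst, abs(interpolate_at(samples, t, taps) - exact))
    return worst

if __name__ == "__main__":
    for taps in (8, 256):
        print(taps, "taps: max error", max_error(15000.0, taps))
```

Running it compares an 8-sample window against a 256-sample window on the same tone, which is the "more samples, higher quality" trade-off described above. The sinc here cuts off at the input Nyquist frequency, which suits upsampling such as 44.1 kHz to 48 kHz; downsampling would need a lower cutoff.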

Roller · Mar 6, 2012 at 2:05 AM

Hum, so in the event of a massive sample input, wouldn't that introduce resource spikes? But even so, those wouldn't be exactly noticeable since CPU usage would still be minimal with only the GPU loading up. Although I'm still not grasping the concept of how large the sample input can be.

But this is definitely a thread I'll be keeping a watchful eye on. Kudos on such an idea

BTW, do you think it would be viable to do a SRC based on OpenCL? I'm just asking out of curiosity, since I wouldn't exactly need it due to having CUDA hardware with me, but besides OpenCL opening it up for just about any capable GPU, it could also be used through CPUs with the aid of respective libraries.

cxn4689 · Mar 6, 2012 at 2:21 AM

In my current implementation, I compute each new sample from 256 neighboring input samples, so the input length is not very long and the result is satisfactory. Actually, any CUDA program can be rewritten as an equivalent OpenCL version. The work is parallelized over the new sampling points, and the thread dimensions are divided accordingly. The program is not optimized yet. If you like, I can email you the CUDA source code, developed under Visual Studio 2010 + CUDA 4.0.
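Because each new sample depends only on its own 256 input neighbors, the index arithmetic for one output sample can be written as a pure function of the output index, which is what makes a per-sample thread mapping possible. The Python sketch below shows that arithmetic for 44.1 kHz to 48 kHz; the names `ratio`, `source_position` and `input_window` are hypothetical, and the actual CUDA/OpenCL thread layout used in the program is not shown in this thread.

```python
from math import gcd

def ratio(in_rate, out_rate):
    """Reduce in_rate/out_rate to lowest terms, e.g. 44100/48000 -> (147, 160)."""
    g = gcd(in_rate, out_rate)
    return in_rate // g, out_rate // g

def source_position(m, in_rate=44100, out_rate=48000):
    """Map output index m to (input index, offset numerator, offset denominator).

    The exact input-time position of output sample m is
    input_index + numerator / denominator, computed with integers only.
    """
    num, den = ratio(in_rate, out_rate)
    pos = m * num
    return pos // den, pos % den, den

def input_window(m, taps=256, in_rate=44100, out_rate=48000):
    """Inclusive range of input indices that output sample m would read."""
    center, _, _ = source_position(m, in_rate, out_rate)
    half = taps // 2
    return center - half + 1, center + half

if __name__ == "__main__":
    for m in (0, 1, 160):
        print(m, source_position(m), input_window(m))
```

Since 147 and 160 share no common factor, the fractional offset cycles through exactly 160 distinct values, so one possible optimization (an assumption, not something stated in this thread) is to precompute the 160 sets of filter coefficients once instead of evaluating sinc and the window for every output sample.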

leeperry · Mar 6, 2012 at 8:45 AM

Sounds like a plan! Any chance of getting it to work with Reclock please? More info at http://forum.slysoft.com/showthread.php?t=37493

I can run comparisons using WaveSpectra later on if you like: http://forum.slysoft.com/showpost.php?p=227357&postcount=3996

Roller · Mar 6, 2012 at 11:56 AM

Sorry for taking so long to reply. I could take a look at it, although it might be too much for me. Check your PM.

cxn4689 · Mar 27, 2012 at 7:12 AM

Quote:

leeperry said:
Sounds like a plan! Any chance of getting it to work with Reclock please? More info at http://forum.slysoft.com/showthread.php?t=37493

I can run comparisons using WaveSpectra later on if you like: http://forum.slysoft.com/showpost.php?p=227357&postcount=3996

Maybe it's possible. My design works in stream mode, and its interface is independent of foobar2000. The latest OpenCL version for foobar2000 can be downloaded here: http://sharesend.com/bf93z (supports AMD or NVIDIA OpenCL-compatible GPUs)
If you want to see the complete source code (developed under VS2010), please contact me.

leeperry · Mar 28, 2012 at 8:52 AM

I'm not a coder ^^

Roller · Mar 28, 2012 at 11:08 AM

Currently doing testing on a range of hardware.

cxn4689 · Mar 30, 2012 at 9:30 AM

I have optimized the OpenCL thread mapping for NVIDIA GPUs. This 003 version can be downloaded here: http://sharesend.com/t4js6

leeperry · Mar 30, 2012 at 1:47 PM

If you could ever get it working in Reclock, this would warrant (hundreds of) thousands of users in the snap of a finger

You could also ask for info on how to get it working in this thread: http://forum.slysoft.com/showthread.php?t=37493

Thanks!

Roller · Mar 30, 2012 at 2:21 PM

Quote:

leeperry said:
If you could ever get it working in Reclock, this would warrant (hundreds of) thousands of users in the snap of a finger

You could also ask for info on how to get it working in this thread: http://forum.slysoft.com/showthread.php?t=37493

Thanks!

Having a foobar2000 component already warrants thousands of users, but I see where you're going

The wider the userbase, the better. I just think the component should reach a stable status before being ported to other platforms, and there are still some kinks to work out.
