I wrote a sample rate conversion test program with CUDA, and it seems a little better than Saracon in 44.1->48 kHz mode

Mar 5, 2012 at 10:00 PM Thread Starter Post #1 of 15

cxn4689

I was bored recently, so I read some research papers and the Stanford web site on SRC (sample rate conversion), and I decided to write a test program for fun in C and CUDA. I use the classic sinc function as my interpolation function and a Blackman window as the truncating function. The program is GPU-accelerated with CUDA. Each window contains 256 samples. Testing was done with RMAA: the 44.1 kHz test signal file is generated by RMAA, and the converted signal file is analyzed by RMAA too. I have only tested the 44.1 kHz to 48 kHz conversion. The THD and IMD figures are a little better than Weiss Saracon. The only problem is that my program takes 7 seconds while Saracon takes only 2 seconds. The result is shown below:

 
Mar 5, 2012 at 10:41 PM Post #2 of 15
Very interesting. How would one implement it on an audio player or on system wide output?
 
Mar 5, 2012 at 11:41 PM Post #4 of 15
Oh, you're making it for foobar2000? That's simply fantastic :)
 
Out of curiosity, does the number of compute units only affect SRC performance or also SRC output quality?
 
Mar 6, 2012 at 1:54 AM Post #5 of 15
In theory, an audio signal is band-limited, so it can be reconstructed exactly by convolving it with a sinc function. Unfortunately, this convolution cannot be implemented directly because it needs an infinite number of input samples, so we need a window function to truncate the input and make it computable. Floating-point precision aside, the more samples we use in the computation, the higher the quality we get. With the GPU's help, I can compute with more samples while the computing time stays the same. I want to implement this GPU method for foobar2000 to do real-time SRC, and I am still studying foobar2000 DSP plug-in programming.
 
Mar 6, 2012 at 2:05 AM Post #6 of 15
Hmm, so in the event of a massive sample input, wouldn't that introduce resource spikes? Even so, those wouldn't be very noticeable, since CPU usage would still be minimal with only the GPU loading up. Although I'm still not grasping how large the sample base input can be.
 
But this is definitely a thread I'll be keeping a watchful eye on. Kudos on such an idea :)
 
BTW, do you think it would be viable to do an SRC based on OpenCL? I'm just asking out of curiosity, since I wouldn't exactly need it, having CUDA hardware myself, but besides opening it up to just about any capable GPU, OpenCL could also be used on CPUs with the aid of the respective libraries.
 
Mar 6, 2012 at 2:21 AM Post #7 of 15
In my current implementation, I compute each new sample from 256 neighboring input samples, so the input length is not very long and the result is satisfactory. Actually, any CUDA program can be rewritten as an equivalent OpenCL version. The parallelism I use is based on each new sampling point, and the thread dimensions are divided on that basis. The program is not optimized yet. If you like, I can email you the CUDA source code, which was developed under Visual Studio 2010 + CUDA 4.0.
 
Mar 6, 2012 at 8:45 AM Post #8 of 15
Sounds like a plan! Any chance of getting it to work with Reclock, please? More info at http://forum.slysoft.com/showthread.php?t=37493
 
I can run comparisons using WaveSpectra later on if you like: http://forum.slysoft.com/showpost.php?p=227357&postcount=3996
 
Mar 6, 2012 at 11:56 AM Post #9 of 15
Sorry for taking so long to reply. I could take a look at it, although it might be too much for me. Check your PM.
 
Mar 27, 2012 at 7:12 AM Post #10 of 15


Quote:
Sounds like a plan! Any chance of getting it to work with Reclock, please? More info at http://forum.slysoft.com/showthread.php?t=37493
 
I can run comparisons using WaveSpectra later on if you like: http://forum.slysoft.com/showpost.php?p=227357&postcount=3996



Maybe it's possible. My design works in stream mode, and its interface is independent of foobar2000. The latest OpenCL version for foobar2000 can be downloaded here: http://sharesend.com/bf93z (supports AMD or NVIDIA OpenCL-compatible GPUs)
If you want to see the complete source code (developed under VS2010), please contact me.
 
 
Mar 30, 2012 at 1:47 PM Post #14 of 15
If you could ever get it working in Reclock, this would warrant (hundreds of) thousands of users in the snap of a finger :)
 
You could also ask for info on how to get it working in this thread: http://forum.slysoft.com/showthread.php?t=37493
 
Thanks!
 
Mar 30, 2012 at 2:21 PM Post #15 of 15


Quote:
If you could ever get it working in Reclock, this would warrant (hundreds of) thousands of users in the snap of a finger :)
 
You could also ask for info on how to get it working in this thread: http://forum.slysoft.com/showthread.php?t=37493
 
Thanks!



Having a foobar2000 component already warrants thousands of users, but I see where you're going :) The wider the userbase, the better. I just think the component should reach a stable status before being ported to other platforms, and there are still some kinks to work out.
 
