Re-training for alternate samplerate #83

Open
mrdeveloperdude opened this issue May 27, 2022 · 3 comments

@mrdeveloperdude

mrdeveloperdude commented May 27, 2022

I noticed that CREPE seems to be trained on audio sampled at 16 kHz. I want to use it in open source karaoke software I am working on, which is designed to run on limited embedded hardware, so on-the-fly sample rate conversion is an expense I would like to avoid. I thought it would be easy to fix by simply retraining CREPE at my native sample rate (44.1 kHz) instead, but I could not find any code to train the models.

Was this left out on purpose? Is it available someplace?

Thanks!

@jongwook
Member

Hi, the training code is available at https://github.com/jongwook/crepe, but it is even less maintained than this one.

One hack you could try is to keep only every 3rd (or 2nd) sample, which effectively gives 14.7 kHz (or 22.05 kHz) audio, run it through one of the pretrained CREPE models as if it were 16 kHz, and scale the frequency estimate by 14.7/16 (or 22.05/16) to get the actual frequencies. This assumes that there is negligible energy above the Nyquist frequency of the reduced rate (which is generally true and worked okay for the web demo), and also that the accuracy is not terribly impacted by the frequency scaling.
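
A minimal sketch of that hack, assuming NumPy and 44.1 kHz input. The helper names (`subsample`, `correct_frequency`, `fft_peak_pitch`) are made up for illustration, and `fft_peak_pitch` is only a stand-in for a real CREPE call so the example runs without the model. Note the direction of the correction: audio that is really 14.7 kHz but is interpreted as 16 kHz reads high by a factor of 16/14.7, so the estimate is multiplied by the effective rate divided by 16 kHz.

```python
import numpy as np

MODEL_SR = 16000  # sample rate the pretrained models assume


def subsample(audio, factor):
    """Keep every `factor`-th sample (no anti-alias filter, as in the hack)."""
    return np.asarray(audio)[::factor]


def correct_frequency(f_est, native_sr, factor, model_sr=MODEL_SR):
    """Map an estimate made under the 16 kHz assumption back to true Hz."""
    effective_sr = native_sr / factor  # e.g. 44100 / 3 = 14700
    return f_est * effective_sr / model_sr


def fft_peak_pitch(audio, sr):
    """Hypothetical stand-in estimator (NOT CREPE): strongest FFT bin."""
    spectrum = np.abs(np.fft.rfft(audio * np.hanning(len(audio))))
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sr)
    return freqs[np.argmax(spectrum)]


# One second of a 220 Hz tone at the native 44.1 kHz rate.
native_sr = 44100
t = np.arange(native_sr) / native_sr
tone = np.sin(2 * np.pi * 220.0 * t)

reduced = subsample(tone, 3)             # effectively 14.7 kHz audio
raw = fft_peak_pitch(reduced, MODEL_SR)  # estimator believes it is 16 kHz
true_hz = correct_frequency(raw, native_sr, 3)
```

In real use the stand-in estimator would be replaced by whichever pretrained model is being run; the stride and the final multiplication are the only parts specific to the hack.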

@PratikStar

Isn't the original model trained for 16k sample rate and NOT 44.1kHz?

@Laubeee

Laubeee commented Sep 27, 2024

Thanks for sharing your training code.
Could you say something about:

  • the use of the NSynth dataset: was the full dataset used, or only the notes in the range? Were only parts of the files used, or was any other filtering applied (the files are often just silence towards the end)?
  • I see there is an option to use noise and pitch-shift augmentations. Were they used in the final model?
