dmk_meow · @dmk_meow
175 followers · 245 posts · Server pawoo.net
arXiv Sound & Audio🔊 · @arxiv_sd
39 followers · 621 posts · Server creative.ai

📝 Large-Scale Automatic Audiobook Creation 🔊👾🧠

"Leverages recent advances in neural text-to-speech and text summarization and allows users to customize an audiobook's speaking style and speed using a small amount of speech samples." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.03926v1

#sd #ai #dc #DL #lg #arxiv

Last updated 1 year ago

char · @char
67 followers · 973 posts · Server ioc.exchange

After migrating some code from libudev to sd-device, its more modern replacement API in systemd, I wrote a simple example and a post about it, in case you are interested.

dev.to/carvilsi/linux-monitor-

-device

#linux #usb #systemd #sd #c

Last updated 1 year ago

arXiv Sound & Audio🔊 · @arxiv_sd
39 followers · 620 posts · Server creative.ai

📝 Multiple Representation Transfer From Large Language Models to End-to-End ASR Systems 📚🔊

"Transferring multiple representations of large language models improves the end-to-end ASR performance by up to 15% relative CER compared to transferring only a single representation." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.04031v1

#cl #sd #arxiv

Last updated 1 year ago

福龍 · @qing_fulong
0 followers · 1 posts · Server pawoo.net

[Blow Up]
Your lips are like a fuse.
I can't control it anymore.


[BL]
#SD腐

pixiv.net/novel/show.php?id=20

#SDBL #sd #リョ三

Last updated 1 year ago

arXiv Sound & Audio🔊 · @arxiv_sd
39 followers · 619 posts · Server creative.ai

📝 Implicit Design Choices and Their Impact on Emotion Recognition Model Development and Evaluation 🧠📚🔊

"Emotion recognition from visual and audio data is accomplished by using a convolutional neural network (CNN) and a recurrent neural network (RNN), respectively, to encode the two modalities." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.03238v1

#lg #cl #sd #arxiv

Last updated 1 year ago

arXiv Sound & Audio🔊 · @arxiv_sd
39 followers · 618 posts · Server creative.ai

📝 Zero-Shot Audio Captioning via Audibility Guidance 🔊📚

"A caption is generated by combining a large pre-trained language model, such as GPT-2, with a multimodal matching model, which scores how well a text matches the input audio, and a text classifier, which provides the guidance for audibility." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.03884v1

#sd #cl #arxiv

Last updated 1 year ago

arXiv Sound & Audio🔊 · @arxiv_sd
39 followers · 618 posts · Server creative.ai

📝 Spiking Structured State Space Model for Monaural Speech Enhancement 🔊🔭

"Spiking-S4 merges energy efficiency of Spiking Neural Networks (SNN) and long-range sequence modeling capabilities of Structured State Space Models (S4)." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.03641v1

#sd #cv #arxiv

Last updated 1 year ago

arXiv Sound & Audio🔊 · @arxiv_sd
39 followers · 615 posts · Server creative.ai

📝 Highly Controllable Diffusion-Based Any-to-Any Voice Conversion Model with Frame-Level Prosody Feature 🔊

"Utilizes a prosody conditioning module to transfer frame-level prosody and a post-processing step which allows improved controllability of speaking rate in any-to-any voice conversion." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.03364v1

#sd #arxiv

Last updated 1 year ago

arXiv Sound & Audio🔊 · @arxiv_sd
39 followers · 615 posts · Server creative.ai

📝 Presenting the SWTC: A Symbolic Corpus of Themes From John Williams' Star Wars Episodes I-IX 🔊

"It’s a symbolic corpus based on music scores, and it's made of musical themes from the complete Star Wars trilogies (Episodes I-IX) by John Williams." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.03298v1

#sd #sc #arxiv

Last updated 1 year ago

arXiv Sound & Audio🔊 · @arxiv_sd
38 followers · 613 posts · Server creative.ai

📝 RoDia: A New Dataset for Romanian Dialect Identification From Speech 📚🔊

"Introduces RoDia, a Romanian dialect identification dataset consisting of 2 hours of transcribed spoken data covering five dialects, including both urban and rural environments, and proposes multiple deep learning models to be used as baselines." [gal30b+] 🤖

⚙️ github.com/codrut2/RoDia
🔗 arxiv.org/abs/2309.03378v1

#cl #sd #arxiv

Last updated 1 year ago

arXiv Sound & Audio🔊 · @arxiv_sd
38 followers · 613 posts · Server creative.ai

📝 Parameter Efficient Audio Captioning with Faithful Guidance Using Audio-Text Shared Latent Representation 📚🔊

"We first present a data augmentation technique for generating audio captions that are not only relevant to the audio but also semantically consistent with ground truth captions." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.03340v1

#cl #mm #sd #arxiv

Last updated 1 year ago

arXiv Sound & Audio🔊 · @arxiv_sd
38 followers · 609 posts · Server creative.ai

📝 BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network 🔊🧠

"By finding the optimal projection for discriminating between real and fake data in the feature space, it can improve the performance of GAN-based vocoders with small modifications, such as BigVGAN." [gal30b+] 🤖

⚙️ github.com/sony/bigvsan
🔗 arxiv.org/abs/2309.02836v1

#sd #lg #arxiv

Last updated 1 year ago

arXiv Sound & Audio🔊 · @arxiv_sd
38 followers · 608 posts · Server creative.ai

📝 Self-Supervised Disentanglement of Harmonic and Rhythmic Features in Music Audio Signals 🔊

"A variational autoencoder that generates an audio mel-spectrogram from two latent features representing the rhythmic and harmonic content, respectively, and is trained to reconstruct the input mel-spectrogram given its pitch-shifted version." [gal30b+] 🤖

⚙️ github.com/WuYiming6526/HARD-D
🔗 arxiv.org/abs/2309.02796v1

#sd #arxiv

Last updated 1 year ago