fffiloni (Sylvain Filoni)

upvoted an article 2 days ago

Article

Exploring the Daily Papers Page on Hugging Face

2 days ago

• 16

upvoted an article 5 days ago

Article

Introducing Community Tools on HuggingChat

9 days ago

• 26

upvoted 10 papers 5 days ago

Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

Paper • 2409.09214 • Published 11 days ago • 44

SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer

Paper • 2409.08425 • Published 12 days ago • 9

StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation

Paper • 2409.12576 • Published 6 days ago • 14

FlexiTex: Enhancing Texture Generation with Visual Guidance

Paper • 2409.12431 • Published 6 days ago • 9

3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

Paper • 2409.12957 • Published 6 days ago • 16

LVCD: Reference-based Lineart Video Colorization with Diffusion Models

Paper • 2409.12960 • Published 6 days ago • 20

upvoted an article 12 days ago

Article

"Diffusers Image Fill" guide

12 days ago

• 22

upvoted 2 papers 12 days ago

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Paper • 2409.06666 • Published 15 days ago • 52

Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models

Paper • 2409.07452 • Published 14 days ago • 19

upvoted a paper 13 days ago

VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos

Paper • 2409.07450 • Published 14 days ago • 10

upvoted 6 papers 16 days ago

LinFusion: 1 GPU, 1 Minute, 16K Image

Paper • 2409.02097 • Published 22 days ago • 31

FLUX that Plays Music

Paper • 2409.00587 • Published 24 days ago • 31

VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges

Paper • 2409.01071 • Published 23 days ago • 26

FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation

Paper • 2409.02245 • Published 22 days ago • 9

Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation

Paper • 2409.03718 • Published 20 days ago • 25

Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing

Paper • 2409.01322 • Published 23 days ago • 94

upvoted a paper 20 days ago

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

Paper • 2409.02634 • Published 21 days ago • 85

upvoted a paper 21 days ago

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

Paper • 2409.02095 • Published 22 days ago • 33

upvoted 3 papers 27 days ago

Kalman-Inspired Feature Propagation for Video Face Super-Resolution

Paper • 2408.05205 • Published Aug 9 • 8

Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation

Paper • 2408.15239 • Published 29 days ago • 27

MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement

Paper • 2408.14211 • Published 30 days ago • 8

upvoted a paper 28 days ago

Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published 29 days ago • 120

upvoted 9 papers about 2 months ago

Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion

Paper • 2408.00458 • Published Aug 1 • 10

TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models

Paper • 2408.00735 • Published Aug 1 • 15

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1 • 104

MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models

Paper • 2408.01337 • Published Aug 2 • 10

TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling

Paper • 2408.01291 • Published Aug 2 • 11

ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer

Paper • 2408.03284 • Published Aug 6 • 9

Facing the Music: Tackling Singing Voice Separation in Cinematic Audio Source Separation

Paper • 2408.03588 • Published Aug 7 • 6

Fast Sprite Decomposition from Animated Graphics

Paper • 2408.03923 • Published Aug 7 • 7

Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches

Paper • 2408.04567 • Published Aug 8 • 23

upvoted an article about 2 months ago

Article

A Complete Guide to Audio Datasets

Dec 15, 2022

• 16

upvoted 13 papers about 2 months ago

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Paper • 2403.14610 • Published Mar 21 • 3

Animate3D: Animating Any 3D Model with Multi-view Video Diffusion

Paper • 2407.11398 • Published Jul 16 • 8

Kinetic Typography Diffusion Model

Paper • 2407.10476 • Published Jul 15 • 1

Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle

Paper • 2407.19548 • Published Jul 28 • 22

Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models

Paper • 2407.19474 • Published Jul 28 • 22

Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture

Paper • 2407.19593 • Published Jul 28 • 12

Artist: Aesthetically Controllable Text-Driven Stylization without Training

Paper • 2407.15842 • Published Jul 22 • 13

AccDiffusion: An Accurate Method for Higher-Resolution Image Generation

Paper • 2407.10738 • Published Jul 15 • 3

DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors

Paper • 2407.16260 • Published Jul 23 • 1

SHIC: Shape-Image Correspondences with no Keypoint Supervision

Paper • 2407.18907 • Published Jul 26 • 38

Text2Place: Affordance-aware Text Guided Human Placement

Paper • 2407.15446 • Published Jul 22 • 2

BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation

Paper • 2407.17952 • Published Jul 25 • 27

Floating No More: Object-Ground Reconstruction from a Single Image

Paper • 2407.18914 • Published Jul 26 • 18

upvoted 9 papers 2 months ago

EVLM: An Efficient Vision-Language Model for Visual Understanding

Paper • 2407.14177 • Published Jul 19 • 42

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds

Paper • 2407.01494 • Published Jul 1 • 13

PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

Paper • 2407.02869 • Published Jul 3 • 18

Video-to-Audio Generation with Hidden Alignment

Paper • 2407.07464 • Published Jul 10 • 16

Still-Moving: Customized Video Generation without Customized Video Data

Paper • 2407.08674 • Published Jul 11 • 11

Video Diffusion Alignment via Reward Gradients

Paper • 2407.08737 • Published Jul 11 • 47

Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity

Paper • 2407.10387 • Published Jul 15 • 6

IMAGDressing-v1: Customizable Virtual Dressing

Paper • 2407.12705 • Published Jul 17 • 12

The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation

Paper • 2407.12579 • Published Jul 17 • 1

Articles

Breaking Barriers: The Critical Role of Art and Design in Advancing AI Capabilities
