XetSharp

C# client for downloading files from Hugging Face using the Xet protocol via Rust FFI.

Quick Start

using System.Net.Http.Json;
using System.Text.Json.Serialization;
using XetSharp;

var repo = "dawidope/testrepo";
using var http = new HttpClient();
using var cts = new CancellationTokenSource();

// 1. Get file list (recursive) — files with xetHash use Xet protocol
var tree = await http.GetFromJsonAsync<List<HfTreeItem>>(
    $"https://huggingface.co/api/models/{repo}/tree/main?recursive=true");

var xetFiles = tree!.Where(f => f.XetHash != null).ToList();

// 2. Get temporary CAS access token (no HF account needed for public repos)
var token = await http.GetFromJsonAsync<XetTokenResponse>(
    $"https://huggingface.co/api/models/{repo}/xet-read-token/main");

// 3. Download via Xet protocol
var options = new XetClientOptions
{
    Endpoint = token!.Endpoint,
    Token = token.AccessToken,
    TokenExpiry = token.Expiration,
    // Optional: tune memory usage (default ~6GB) and progress granularity
    // MaxConcurrentDownloads = 4,
    // DownloadBufferSize = 512_000_000,        // 512 MB base buffer
    // DownloadBufferLimit = 1_500_000_000,      // 1.5 GB hard cap
    // ReconstructionFetchSize = 32_000_000,     // 32 MB blocks for smoother progress
};

using var client = new XetClient(options);

var downloads = xetFiles.Select(f => new XetFileDownload
{
    Hash = f.XetHash!,
    FileSize = f.Size,
    DestinationPath = f.Path!,
}).ToList();

var progress = new Progress<XetProgress>(p =>
{
    double pct = p.TotalBytes > 0 ? (double)p.BytesCompleted / p.TotalBytes * 100 : 0;
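    // Divide by 1024.0 so the MB value keeps its fractional part (integer division would truncate it).
    Console.Write($"\r[{pct:F1}%] {p.BytesCompleted / 1024.0 / 1024.0:F1} MB");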
});

try
{
    await client.DownloadAsync(downloads, progress, cts.Token);
}
catch (OperationCanceledException)
{
    Console.WriteLine("\nDownload cancelled.");
}

// --- HF API models ---
record HfTreeItem
{
    [JsonPropertyName("type")]  public string? Type { get; init; }
    [JsonPropertyName("path")]  public string? Path { get; init; }
    [JsonPropertyName("size")]  public long Size { get; init; }
    [JsonPropertyName("xetHash")] public string? XetHash { get; init; }
}

record XetTokenResponse
{
    [JsonPropertyName("casUrl")]      public string? Endpoint { get; init; }
    [JsonPropertyName("accessToken")] public string? AccessToken { get; init; }
    [JsonPropertyName("exp")]         public long Expiration { get; init; }
}

Note: The accessToken above is a temporary CAS token returned by the HF API — it's not your personal HF token. Public repos work without authentication. For private repos, add your HF token as a Bearer header on the HTTP requests.
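
For a private repo, the Bearer header mentioned in the note could be attached once on the HttpClient, so that both the tree request and the xet-read-token request carry it. The following is a minimal sketch, not part of XetSharp. The helper name CreateHfHttpClient and the HF_TOKEN environment variable are assumptions chosen for illustration.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper (not part of XetSharp): returns an HttpClient that sends
// the personal HF token, when one is supplied, as a Bearer header on every request.
static HttpClient CreateHfHttpClient(string? hfToken)
{
    var client = new HttpClient();
    if (!string.IsNullOrWhiteSpace(hfToken))
    {
        client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", hfToken);
    }
    return client;
}

// Assumption: the personal token is stored in an environment variable named HF_TOKEN.
// With no token set, the client stays anonymous, which is enough for public repos.
using var http = CreateHfHttpClient(Environment.GetEnvironmentVariable("HF_TOKEN"));
```

The rest of Quick Start can then use this http instance unchanged. The CAS token passed to XetClientOptions is still the one returned by the xet-read-token call, not the personal token.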

Features

  • Parallel downloads — multiple files downloaded concurrently via Xet protocol
  • Aggregated progress — IProgress<XetProgress> reports combined progress across all files with per-file breakdown
  • Cancellation — CancellationToken support, download aborts within ~100ms
  • Token refresh — callback for expired HF tokens
  • Memory tuning — configurable buffer sizes and concurrency via XetClientOptions
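
As an illustration of the cancellation feature, the CancellationTokenSource from Quick Start could be wired to Ctrl+C and to an overall time limit. This sketch uses only standard .NET APIs. The 30-minute limit is an arbitrary value picked for the example.

```csharp
using System;
using System.Threading;

using var cts = new CancellationTokenSource();

// Turn Ctrl+C into a cooperative cancellation instead of killing the process.
Console.CancelKeyPress += (_, e) =>
{
    e.Cancel = true;  // keep the process alive so the catch block can run
    cts.Cancel();     // signals every operation observing cts.Token
};

// Optional overall time limit (arbitrary illustrative value).
cts.CancelAfter(TimeSpan.FromMinutes(30));

// Pass cts.Token as the last argument of client.DownloadAsync(...), as in Quick Start.
```

When the token is cancelled, the Quick Start code reaches its OperationCanceledException handler.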

Memory & Performance Tuning

xet-core uses aggressive buffering by default for maximum throughput. You can tune it via XetClientOptions:

Option                      Default   Effect
MaxConcurrentDownloads      8         Parallel file downloads
DownloadBufferSize          ~2GB      Base memory buffer
DownloadBufferPerFileSize   ~512MB    Additional buffer per file
DownloadBufferLimit         ~8GB      Hard memory cap
ReconstructionFetchSize     ~256MB    Fetch block size (smaller = smoother progress)
PrefetchBufferSize          ~1GB      Prefetch lookahead

Default memory usage is roughly DownloadBufferSize + MaxConcurrentDownloads × DownloadBufferPerFileSize: 2GB + 8 × 512MB = 6GB. To stay within a ~1.5GB footprint:

var options = new XetClientOptions
{
    MaxConcurrentDownloads = 4,
    DownloadBufferSize = 512_000_000,
    DownloadBufferPerFileSize = 128_000_000,
    DownloadBufferLimit = 1_500_000_000,
    ReconstructionFetchSize = 32_000_000,
    PrefetchBufferSize = 64_000_000,
};

These options are process-global — the first XetClient instance sets them.
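
The memory arithmetic above can be sketched as a small helper for trying out other settings. It is not part of XetSharp. It assumes the estimate is simply the base buffer plus one per-file buffer for each concurrent download, and that the approximate defaults are exact decimal values (2GB as 2_000_000_000). Whether xet-core counts the prefetch buffer separately is not covered here.

```csharp
using System;

// Hypothetical helper (not part of XetSharp): applies the arithmetic used above,
// base buffer + concurrent downloads × per-file buffer.
static long EstimateBufferBytes(long baseBuffer, int concurrentDownloads, long perFileBuffer)
    => baseBuffer + concurrentDownloads * perFileBuffer;

long defaults = EstimateBufferBytes(2_000_000_000, 8, 512_000_000);
long tuned    = EstimateBufferBytes(512_000_000, 4, 128_000_000);
long tunedCap = 1_500_000_000; // DownloadBufferLimit from the tuned example

Console.WriteLine($"defaults: {defaults / 1e9:F1} GB");
Console.WriteLine($"tuned:    {tuned / 1e9:F1} GB (cap {tunedCap / 1e9:F1} GB)");
```

Under these assumptions the defaults come to 6,096,000,000 bytes and the tuned settings to 1,024,000,000 bytes, which sits below the 1.5GB DownloadBufferLimit.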

Building

Prerequisites

  • Rust toolchain (cargo)
  • .NET SDK
  • PowerShell (to run build.ps1)

Build everything (Rust + C#)

./build.ps1

Options:

./build.ps1                        # Full Release build
./build.ps1 -Configuration Debug   # Debug build
./build.ps1 -SkipRust              # C# only (reuse existing native DLL)

Manual build

# 1. Rust native library
cd native/hf_xet_ffi
cargo build --release

# 2. C# solution (run from the repository root)
cd ../..
dotnet build

Run example

cd XetSharp.Example
# Arguments: <repo> [file] [output directory]
dotnet run -- dawidope/testrepo
dotnet run -- dawidope/testrepo model-00002-of-00002.safetensors
dotnet run -- dawidope/testrepo "" C:\Downloads

Project Structure

XetSharp/
├── XetSharp.slnx                 # Solution
├── build.ps1                     # Build script (Rust + C#)
├── XetSharp/                     # C# library
│   ├── XetSharp.csproj
│   ├── XetClient.cs              # High-level API
│   ├── XetFileDownload.cs        # File info for download
│   ├── XetProgress.cs            # Progress data
│   └── Native/
│       ├── NativeMethods.cs      # P/Invoke declarations
│       └── NativeTypes.cs        # Marshaling structs
├── XetSharp.Example/             # Example console app
│   ├── XetSharp.Example.csproj
│   └── Program.cs
└── native/hf_xet_ffi/            # Rust FFI crate
    ├── Cargo.toml
    └── src/
        ├── lib.rs                # extern "C" exports
        ├── runtime.rs            # Tokio runtime
        ├── progress.rs           # Progress aggregation bridge
        ├── token.rs              # Token refresh bridge
        └── error.rs              # Error handling

Architecture

XetSharp is a thin wrapper — it provides:

  • XetClient.DownloadAsync() — download files via Xet protocol (chunked, dedup, parallel)
  • Progress reporting — via IProgress<XetProgress> with per-file aggregation
  • Cancellation — via CancellationToken with native FFI cancellation flag
  • Token refresh — callback for expired HF tokens

Everything else (HF API calls, file caching, HTTP downloads for small files) is your responsibility.
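
As one example of that responsibility, files in the tree listing that have no xetHash could be fetched over plain HTTP. The sketch below is not part of XetSharp. The helper names are hypothetical, and the /resolve/{revision}/{path} URL layout is an assumption about the Hugging Face Hub that does not come from this README.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper. Assumption: the Hub serves raw file contents at
// https://huggingface.co/{repo}/resolve/{revision}/{path}.
static Uri BuildResolveUri(string repo, string path, string revision = "main")
{
    // Escape each path segment separately so the '/' separators are preserved.
    var escaped = string.Join("/", path.Split('/').Select(Uri.EscapeDataString));
    return new Uri($"https://huggingface.co/{repo}/resolve/{revision}/{escaped}");
}

// Hypothetical helper: streams one file to disk without buffering it in memory.
static async Task DownloadOverHttpAsync(
    HttpClient http, string repo, string path, string destination, CancellationToken ct)
{
    using var response = await http.GetAsync(
        BuildResolveUri(repo, path), HttpCompletionOption.ResponseHeadersRead, ct);
    response.EnsureSuccessStatusCode();

    // Create the destination directory if the path contains one.
    var dir = Path.GetDirectoryName(Path.GetFullPath(destination));
    if (!string.IsNullOrEmpty(dir)) Directory.CreateDirectory(dir);

    await using var file = File.Create(destination);
    await response.Content.CopyToAsync(file, ct);
}
```

In the Quick Start code, this could be called for each tree item whose XetHash is null and whose Type is "file" (the type value is also an assumption about the API response), passing f.Path as both the path and the destination.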
