Learning to Rank (LtR) with Infer.NET

A Bayesian learning to rank implementation using Microsoft's Infer.NET probabilistic programming framework. This project implements pairwise preference learning using a linear model based on the TrueSkill/Thurstonian ranking approach.

⚠️ SCALABILITY WARNING: Prediction uses an exponential-time recursive computation (O(2^n) in the number of items n per query), which becomes intractable beyond roughly 10-15 items per query. For queries with 40 or more items, which are common in real datasets, prediction may stall or take hours to complete. Use this implementation only for small-scale experiments or datasets with few items per query.

Overview

This solution provides two command-line applications for learning to rank:

TrainLtR: Trains a Bayesian ranking model from pairwise preference data
PredictLtR: Generates rank distribution predictions for new queries

The model learns pairwise preferences with a linear model and does not support ties. It is best suited to scenarios where you have explicit pairwise comparisons between items within a query.

Prerequisites

.NET 8.0 SDK or later
Infer.NET framework (v0.4.2504.701), open source on GitHub under the MIT license

Platform Support

✅ Cross-platform: Windows, Linux, and macOS
✅ Modern tooling: Visual Studio 2022+, VS Code, or JetBrains Rider

Note: This solution has been modernized from .NET Framework 4.6.1 to .NET 8 and Infer.NET 0.4.2504.701.

Algorithm

Graphical Model

The implementation uses a modified TrueSkill/Thurstonian ranking model with observed feature vectors. The model learns:

Feature weights (w): Linear combination weights for ranking features
Noise parameters: Uncertainty in pairwise comparisons

For K training examples, each with n items, the model generates m = n-1 pairwise preference observations.
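The count m = n-1 is consistent with comparing each item only against its neighbour in rank order, rather than against every other item. The Python sketch below illustrates that reading. It is an assumption about how the pairs are formed, and the function name adjacent_pairs is hypothetical, not taken from the project's C# code.

```python
def adjacent_pairs(ranks):
    """Return (better, worse) index pairs between rank-adjacent items.

    Assumption: ranks are distinct (the model does not support ties) and
    each item is compared only with the next one in rank order, which
    gives n - 1 pairs for n items.
    """
    order = sorted(range(len(ranks)), key=lambda i: ranks[i])  # lower = better
    return [(order[k], order[k + 1]) for k in range(len(order) - 1)]


pairs = adjacent_pairs([2, 1, 3, 4])
print(pairs)       # [(1, 0), (0, 2), (2, 3)]
print(len(pairs))  # 3, i.e. n - 1 for n = 4
```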

Learning Process

Feature extraction: Convert items to feature vectors
Pairwise comparison: Generate the pairwise preference observations within each query (m = n-1 for a query with n items)
Bayesian inference: Learn feature weights using variational message passing
Ranking prediction: Compute probability distributions over possible rankings
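To make the pairwise step concrete, here is a minimal Python sketch of a Thurstonian preference probability with a linear score. It assumes each item's latent score is the weighted feature sum plus independent Gaussian noise, and it uses fixed point weights for illustration. The actual model keeps a posterior distribution over the weights, and the names preference_probability and noise_std are hypothetical.

```python
import math


def preference_probability(w, x_i, x_j, noise_std=1.0):
    """P(item i is preferred to item j) under a Thurstonian linear model.

    Assumption: score = w . x + Gaussian noise with std noise_std, drawn
    independently per item, so the score difference has variance
    2 * noise_std**2.
    """
    diff = sum(wk * (a - b) for wk, a, b in zip(w, x_i, x_j))
    z = diff / (math.sqrt(2.0) * noise_std)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


w = [1.0, 0.5, -0.2]    # illustrative weights, not learned values
x_a = [0.5, 1.0, 0.2]
x_b = [0.3, 0.8, 0.9]
p = preference_probability(w, x_a, x_b)
print(p > 0.5)          # True: item a has the higher mean score
```

Swapping the two items gives the complementary probability, and identical feature vectors give exactly 0.5.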

Quick Start

Building

git clone <repository-url>
cd LearningToRank
dotnet build

Running

# Train a model
dotnet run --project TrainLtR -- data/train.small.ltr model.json

# Generate predictions
dotnet run --project PredictLtR -- model.json data/predict.ltr predictions.csv

IDE Support

Visual Studio 2022+: Open LearningToRank.sln
VS Code: Install C# extension and open the folder
JetBrains Rider: Open the solution file

Data Format

Input Format (SVM-Light)

Training and prediction data must be in SVM-Light format:

<rank> qid:<query_id> <feature_id>:<feature_value> ... <feature_id>:<feature_value>

Example:

1 qid:1 1:0.5 2:1.0 3:0.2
2 qid:1 1:0.3 2:0.8 3:0.9
1 qid:2 1:0.7 2:0.1 3:0.4

Requirements:

rank: Integer ranking (lower = better)
qid: Query identifier (groups items for comparison)
feature_id: Must start from 1 (not 0)
feature_value: Floating-point feature value
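A quick way to validate input files before training is to parse each line yourself. The Python sketch below follows the format and requirements listed above. The function name parse_ltr_line is hypothetical, and stripping a trailing "#" comment is this sketch's own choice, not confirmed behaviour of TrainLtR or PredictLtR.

```python
def parse_ltr_line(line):
    """Parse '<rank> qid:<id> <fid>:<val> ...' into (rank, qid, features)."""
    tokens = line.split("#", 1)[0].split()  # assumption: drop trailing comment
    rank = int(tokens[0])
    key, qid = tokens[1].split(":", 1)
    if key != "qid":
        raise ValueError(f"expected qid:<query_id>, got {tokens[1]!r}")
    features = {}
    for token in tokens[2:]:
        fid, value = token.split(":", 1)
        if int(fid) < 1:
            raise ValueError("feature ids must start from 1")
        features[int(fid)] = float(value)
    return rank, qid, features


print(parse_ltr_line("1 qid:1 1:0.5 2:1.0 3:0.2"))
# (1, '1', {1: 0.5, 2: 1.0, 3: 0.2})
```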

Output Format (CSV)

Predictions are written to CSV with rank distribution probabilities:

QueryIndex,ItemIndex,Rank0,Rank1,Rank2,Rank3,Rank4,Rank5,Rank6,Rank7,Rank8,Rank9
0,0,0.500222,0.499778,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
0,1,0.499778,0.500222,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000

Usage

Training

dotnet run --project TrainLtR -- <training_data.ltr> <output_model.json>

Example:

dotnet run --project TrainLtR -- data/train.small.ltr model.json

Prediction

dotnet run --project PredictLtR -- <model.json> <prediction_data.ltr> [output.csv]

Examples:

# Use default output filename (predictions.csv)
dotnet run --project PredictLtR -- model.json data/predict.ltr

# Specify custom output filename
dotnet run --project PredictLtR -- model.json data/predict.ltr my_results.csv

Data Preparation Tips

For 2-rank queries: Shuffle items within each query for better pairwise learning
For multi-rank queries: Sort items by rank within each query
Feature IDs: Must start from 1 (not 0)
Query grouping: Items with the same qid are compared pairwise
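The sorting tip can be applied with a small preprocessing script. The Python sketch below groups rows by qid and sorts each group by rank. It assumes rows have already been parsed into (rank, qid, payload) tuples, and the function name sort_within_queries is hypothetical.

```python
from collections import defaultdict


def sort_within_queries(rows):
    """Group (rank, qid, payload) rows by qid and sort each group by rank.

    Queries keep their first-seen order. Python's sort is stable, so rows
    with equal rank keep their original relative order.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    out = []
    for qid in groups:  # dicts preserve insertion order
        out.extend(sorted(groups[qid], key=lambda r: r[0]))
    return out


rows = [(2, "1", "a"), (1, "1", "b"), (1, "2", "c"), (3, "1", "d")]
print(sort_within_queries(rows))
# [(1, '1', 'b'), (2, '1', 'a'), (3, '1', 'd'), (1, '2', 'c')]
```

For 2-rank queries, the same grouping loop could call random.shuffle on each group instead of sorting it.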

Output Interpretation

CSV format: Each row represents one item's rank distribution
Probabilities: Sum to 1.0 for each item across all possible ranks
Ranking: Lower rank numbers indicate better positions (rank 0 = best)
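As an illustration of these points, the Python sketch below reads prediction rows, checks that each row sums to approximately 1, and reports the most likely and the expected rank per item. The sample text reuses the two rows from the output example, truncated to three rank columns for brevity. The function name summarize is hypothetical.

```python
import csv
import io

SAMPLE = """QueryIndex,ItemIndex,Rank0,Rank1,Rank2
0,0,0.500222,0.499778,0.000000
0,1,0.499778,0.500222,0.000000
"""


def summarize(csv_text):
    """Yield (query, item, most_likely_rank, expected_rank) per CSV row."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        rank_cols = sorted((c for c in row if c.startswith("Rank")),
                           key=lambda c: int(c[4:]))
        probs = [float(row[c]) for c in rank_cols]
        assert abs(sum(probs) - 1.0) < 1e-4, "row should sum to ~1"
        best = max(range(len(probs)), key=probs.__getitem__)
        expected = sum(r * p for r, p in enumerate(probs))
        yield int(row["QueryIndex"]), int(row["ItemIndex"]), best, expected


for q, i, best, expected in summarize(SAMPLE):
    print(q, i, best, round(expected, 6))
# 0 0 0 0.499778
# 0 1 1 0.500222
```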

Sample Data

The /data folder contains example datasets:

train.small.ltr - Small training set for testing
predict.ltr - Prediction dataset (LETOR MQ2008)
test.small.ltr - Small test set
test.sorted.ltr - Sorted test data

Note: Most datasets are from the LETOR MQ2008 benchmark collection.

Performance Considerations

Scalability Limitations

Query size: Optimal for 2-10 items per query
Maximum recommended: 15 items per query
Avoid: Queries with 40+ items (exponential slowdown)
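The thresholds above follow from how quickly 2^n grows. As a rough illustration in Python, taking the O(2^n) bound quoted in the scalability warning at face value and ignoring constant factors:

```python
def subset_count(n_items):
    """Number of subsets of n items, the 2**n factor in an O(2^n) recursion."""
    return 2 ** n_items


for n in (10, 15, 40):
    print(f"{n:>2} items -> {subset_count(n):,} subsets")
# 10 items -> 1,024 subsets
# 15 items -> 32,768 subsets
# 40 items -> 1,099,511,627,776 subsets
```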

Optimization Tips

Use smaller query sizes when possible
Consider data preprocessing to reduce query complexity
Monitor memory usage for large datasets
Use progress reporting to track long-running predictions
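One possible preprocessing step is to cap the number of items per query before prediction. The Python sketch below keeps the first max_items rows of each query. It assumes rows are (rank, qid, features) tuples and that dropping surplus items is acceptable for your experiment. Which items to keep is a modelling decision this sketch does not make, and the function name cap_query_sizes is hypothetical.

```python
def cap_query_sizes(rows, max_items=15):
    """Keep at most max_items rows per qid, preserving the original order."""
    seen = {}
    kept = []
    for row in rows:  # row = (rank, qid, features)
        qid = row[1]
        seen[qid] = seen.get(qid, 0) + 1
        if seen[qid] <= max_items:
            kept.append(row)
    return kept


rows = [(1, "q1", {})] * 20 + [(1, "q2", {})] * 3
print(len(cap_query_sizes(rows)))  # 18: 15 rows from q1 plus 3 from q2
```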

Technical Details

Modernization (v2.0)

This solution has been modernized from .NET Framework 4.6.1:

Component	Old Version	New Version
Framework	.NET Framework 4.6.1	.NET 8
Infer.NET	0.3.1810.501	0.4.2504.701
Project Format	Legacy .csproj	SDK-style
Serialization	BinaryFormatter	JSON
Dependencies	packages.config	PackageReference

Breaking Changes

Model format: Now uses JSON instead of binary
Command line: Applications are now run through the .NET CLI (dotnet run)
Requirements: .NET 8 SDK required

The core Bayesian learning to rank algorithm is unchanged from the original implementation; only the tooling and the model file format differ.
