Get Started
This guide will help you get started with Fine-Tuning models. Our Fine-Tuning feature enables you to train popular LLMs on any publicly available dataset hosted on Hugging Face as well as custom datasets. Follow the steps below to launch and manage a fine-tuning job.
Step 1: Select Base Model
You can choose from a predefined set of open-source base models to fine-tune.
Step 2: Add Dataset
You can either specify the full Hugging Face dataset path you’d like to use for fine‑tuning, or upload your own dataset under the Custom tab. There are two types of datasets:
- Training: The dataset used to train your model.
- Validation (optional): The dataset used to evaluate the model on unseen data held outside the training loop.
Requirements
| Type | Description |
|---|---|
| File Type | .jsonl (JSON Lines) or .csv |
| Columns | Two columns: prompt and completion (see schema) |
Source
- Hugging Face dataset: Use the full Hugging Face dataset path.
- Custom Dataset: Choose a dataset you have uploaded under the Custom tab.
For details on the required dataset schema, see the Datasets guide.
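For illustration, here is a minimal sketch of how a .jsonl training file with the required prompt and completion columns might be produced. The records below are placeholders; replace them with your own data and follow the Datasets guide for the exact schema.

```python
import json

# Placeholder prompt/completion pairs; substitute your own data.
records = [
    {"prompt": "Summarise: The quick brown fox jumps over the lazy dog.",
     "completion": "A fox jumps over a dog."},
    {"prompt": "Translate to French: Good morning.",
     "completion": "Bonjour."},
]

# JSON Lines format: one JSON object per line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```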
Step 3: Output Job Details
Provide the following:
- Suffix: A custom identifier added to your Job ID for easier tracking.
- Registry: Choose the model registry where the fine-tuned model will be saved for future deployment.
- Seed: Set a random seed for reproducibility (defaults to 42).
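For context on why the seed matters: fixing a seed makes reruns reproducible. A minimal sketch of what a fixed seed typically controls in a training stack (assuming PyTorch and NumPy; the platform handles this internally) looks like this:

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    # Fix the relevant random number generators so repeated runs
    # with the same data and configuration produce the same results.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


set_seed(42)
```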
Step 4: Configure Hyperparameters
Each hyperparameter comes with a default value, which you can adjust:
| Hyperparameter | Description |
|---|---|
| Batch size | Number of training samples used in one forward/backward pass. |
| Learning Rate | Controls the step size during optimisation. |
| Number of Epochs | Total number of times the model will iterate over the full training set. |
| Warmup Ratio | Proportion of training steps to gradually increase the learning rate. |
| Weight Decay | Regularisation to prevent overfitting by penalising large weights. |
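As a rough illustration only (the values shown are assumptions, not the platform's actual defaults), these hyperparameters map onto a training configuration along the lines of the following sketch:

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    # Illustrative placeholder values; adjust for your model and dataset.
    batch_size: int = 8          # samples per forward/backward pass
    learning_rate: float = 2e-4  # step size during optimisation
    num_epochs: int = 3          # full passes over the training set
    warmup_ratio: float = 0.1    # fraction of steps spent ramping up the learning rate
    weight_decay: float = 0.01   # penalty on large weights to reduce overfitting


config = TrainingConfig()
print(config)
```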
Step 5: Set LoRA Parameters
| Parameter | Description |
|---|---|
| LoRA Rank | Dimension of the low-rank decomposition. |
| LoRA Alpha | Scaling factor for LoRA updates. |
| LoRA Dropout | Dropout rate applied to LoRA layers. |
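If you were reproducing the same setup locally with the Hugging Face peft library (an assumption for illustration; the platform applies these settings for you), the three parameters correspond to the following fields:

```python
from peft import LoraConfig

# Illustrative values only; tune rank, alpha, and dropout for your task.
lora_config = LoraConfig(
    r=16,              # LoRA Rank: dimension of the low-rank decomposition
    lora_alpha=32,     # LoRA Alpha: scaling factor for LoRA updates
    lora_dropout=0.05, # LoRA Dropout: dropout rate applied to LoRA layers
)
```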
Step 6: Launch the Job
Once a job is launched, the training run starts automatically. Run times vary depending on the configuration you have set.
You can track the progress of your job by monitoring its status as it transitions from Initializing to Running to Completed.
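As a purely hypothetical sketch (the client object, method name, and polling interval below are assumptions for illustration, not a documented SDK), monitoring a job until it finishes could look like this:

```python
import time


def wait_for_completion(client, job_id, poll_seconds=60):
    """Poll a fine-tuning job until it leaves the Initializing/Running states.

    `client.get_job_status` is a hypothetical call; substitute whatever
    status lookup your own tooling provides.
    """
    while True:
        status = client.get_job_status(job_id)  # hypothetical API call
        print(f"Job {job_id}: {status}")
        if status not in ("Initializing", "Running"):
            return status
        time.sleep(poll_seconds)
```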
Step 7: Post‑job Artefacts
Once the job is completed:
- Navigate to the job's details view to see the list of Checkpoints created at each epoch.
- For each checkpoint, you’ll see:
  - Training Loss: The error (or loss) the model incurred while learning from the training dataset. A decreasing training loss generally indicates the model is learning, but if it is very low compared to the validation loss, the model may be overfitting.
  - Validation Loss: The error computed on the validation dataset, which the model never sees during training. It indicates how well the model is likely to perform on real-world or unseen data (see the sketch at the end of this section).
- From the selected checkpoint, you can deploy the fine-tuned model by either:
  - Navigating to the Model Registry, where the model weights become available in your chosen Registry and location, or
  - Deploying the model directly to an Endpoint.
Registered model weights appear under Model Registry > Model > Versions. Learn more about Model Registry. From here, you can deploy the fine‑tuned model to Endpoints for inference.
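To make the loss comparison concrete, here is a small illustrative sketch (the checkpoint numbers are made up) that picks the checkpoint with the lowest validation loss and flags a large gap between training and validation loss as a possible sign of overfitting:

```python
# Illustrative checkpoint metrics; in practice, read these from the job's details view.
checkpoints = [
    {"epoch": 1, "train_loss": 1.20, "val_loss": 1.25},
    {"epoch": 2, "train_loss": 0.80, "val_loss": 0.95},
    {"epoch": 3, "train_loss": 0.40, "val_loss": 1.10},  # likely overfitting
]

# Prefer the checkpoint that generalises best, i.e. the lowest validation loss.
best = min(checkpoints, key=lambda c: c["val_loss"])
print(f"Best checkpoint: epoch {best['epoch']} (val_loss={best['val_loss']})")

for c in checkpoints:
    if c["val_loss"] - c["train_loss"] > 0.3:  # arbitrary threshold for illustration
        print(f"Epoch {c['epoch']}: validation loss well above training loss; possible overfitting.")
```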