Get Started
This guide will help you get started with Fine-Tuning models. Our Fine-Tuning feature enables you to train popular LLMs on any publicly available dataset hosted on Hugging Face. Follow the steps below to launch and manage a fine-tuning job.
Step 1: Select Base Model
You can choose from a number of Open Source base models to fine-tune.
Step 2: Add Dataset
Specify the Hugging Face dataset path you’d like to use for fine-tuning. There are two types of datasets:
- Training: The dataset used to train your model.
- Validation (optional): A dataset of unseen data, held out of the training loop, used to validate the model.
Requirements
| Type | Description |
|---|---|
| File Type | `.jsonl` (JSON Lines) or `.csv` |
| Columns | Must include a column named `text` (case-sensitive) containing the training data |
Source
- Hugging Face dataset: the full dataset path (for example, `imdatta0/ultrachat_1k`)
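As a quick sanity check before launching a job, you can load the dataset locally with the Hugging Face `datasets` library and confirm it has the required `text` column. This is a minimal sketch using the example path from above:

```python
from datasets import load_dataset

# Load the example dataset from the Hugging Face Hub.
ds = load_dataset("imdatta0/ultrachat_1k")

# Fine-tuning requires a case-sensitive `text` column in each split.
for split_name, split in ds.items():
    assert "text" in split.column_names, (
        f"Split '{split_name}' is missing the required 'text' column; "
        f"found: {split.column_names}"
    )
    print(f"{split_name}: {split.num_rows} rows, columns: {split.column_names}")
```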
Step 3: Output Job Details
Provide the following:
- Suffix: A custom identifier added to your Job ID for easier tracking.
- Registry: Choose the model registry where the fine-tuned model will be saved for future deployment.
- Seed: Set a random seed for reproducibility (defaults to 42).
Step 4: Configure Hyperparameters
Each hyperparameter comes with a default value, which you can adjust:
| Hyperparameter | Description |
|---|---|
| Batch Size | Number of training samples processed in one forward/backward pass. |
| Learning Rate | Controls the step size during optimization. |
| Number of Epochs | Total number of passes the model makes over the full training set. |
| Warmup Ratio | Proportion of training steps during which the learning rate is gradually increased. |
| Weight Decay | Regularization that discourages overfitting by penalizing large weights. |
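For intuition, these hyperparameters correspond closely to fields of `TrainingArguments` in Hugging Face `transformers`. The sketch below is illustrative only; the values shown are common starting points, not this platform's defaults:

```python
from transformers import TrainingArguments

# Illustrative values; tune them for your model and dataset.
args = TrainingArguments(
    output_dir="finetune-output",
    per_device_train_batch_size=8,   # Batch size
    learning_rate=2e-4,              # Learning rate
    num_train_epochs=3,              # Number of epochs
    warmup_ratio=0.1,                # Fraction of steps spent warming up the LR
    weight_decay=0.01,               # Penalizes large weights to curb overfitting
    seed=42,                         # Matches the job's default seed
)
```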
Step 5: Set LoRA Parameters
| Parameter | Description |
|---|---|
| LoRA Rank | Dimension of the low-rank decomposition. |
| LoRA Alpha | Scaling factor for LoRA updates. |
| LoRA Dropout | Dropout rate applied to LoRA layers. |
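If you want to see how these map onto code, they correspond to the `r`, `lora_alpha`, and `lora_dropout` fields of `LoraConfig` in Hugging Face's `peft` library. A minimal sketch with illustrative values:

```python
from peft import LoraConfig

# Illustrative values; the effective LoRA scale is lora_alpha / r.
lora_config = LoraConfig(
    r=16,              # LoRA rank: dimension of the low-rank matrices
    lora_alpha=32,     # Scaling factor applied to the LoRA updates
    lora_dropout=0.05, # Dropout applied to the LoRA layers during training
)
```

A common rule of thumb is to set `lora_alpha` to roughly twice the rank, since the LoRA update is scaled by `lora_alpha / r`.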
Step 6: Launch the Job
Once a job is launched, it automatically starts the training run. Run time varies depending on the configuration you set.
You can track the job's progress by monitoring its status as it transitions from `Initializing` to `Running` to `Completed`.
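If the platform exposes a jobs API, polling the status might look like the sketch below. Note that this is a hypothetical example: the base URL, endpoint path, and response shape are assumptions, not a documented API.

```python
import time

import requests

# Hypothetical endpoint; substitute your platform's actual jobs API.
JOB_STATUS_URL = "https://api.example.com/v1/fine-tuning/jobs/{job_id}"


def wait_for_completion(job_id: str, poll_seconds: int = 30) -> str:
    """Poll the (assumed) job status endpoint until the job finishes."""
    while True:
        resp = requests.get(JOB_STATUS_URL.format(job_id=job_id))
        resp.raise_for_status()
        status = resp.json()["status"]  # e.g. Initializing / Running / Completed
        print(f"Job {job_id}: {status}")
        if status in ("Completed", "Failed"):
            return status
        time.sleep(poll_seconds)
```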
Step 7: Post-Job Artifacts
Once the job is completed:
- Navigate to the job's details view to see the list of checkpoints created for each epoch.
- For each checkpoint, you’ll see:
- Training Loss: The error (loss) the model incurred while learning from the training dataset. A decreasing training loss generally indicates the model is learning, but if it is very low compared to the validation loss, the model may be overfitting.
- Validation Loss: Visible only if a validation dataset is provided. This is the error computed on the validation dataset (unseen during training), and it indicates how well the model is likely to perform on real-world or unseen data (see the checkpoint-selection sketch at the end of this section).
- From the selected checkpoint, you can deploy the fine-tuned model in one of two ways:
  - Navigate to the Model Registry, where the model weights become `Available` in your chosen Registry and location, or
  - Deploy the model directly to an Endpoint.
Registered model weights appear under Model Registry > Model > Versions. Learn more about the Model Registry. From here, you can deploy the fine-tuned model to Endpoints for inference.
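When a validation dataset was provided, a simple way to choose which checkpoint to deploy is to pick the one with the lowest validation loss. A minimal sketch, assuming you have copied the per-checkpoint losses from the job's details view (the numbers below are hypothetical):

```python
# Hypothetical per-checkpoint metrics copied from the job's details view.
checkpoints = [
    {"epoch": 1, "train_loss": 1.42, "val_loss": 1.51},
    {"epoch": 2, "train_loss": 0.98, "val_loss": 1.12},
    {"epoch": 3, "train_loss": 0.61, "val_loss": 1.18},  # val loss rising: likely overfitting
]

# Prefer the checkpoint that generalizes best, i.e. the lowest validation loss.
best = min(checkpoints, key=lambda c: c["val_loss"])
print(f"Deploy the epoch-{best['epoch']} checkpoint (val_loss={best['val_loss']:.2f})")
```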