Unsloth + Llama3.2:3b Fine-Tuning Test#
Basic Setup#
Mount Google Drive#
[1]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
Install Packages#
[2]:
%%capture
import os, re
if "COLAB_" not in "".join(os.environ.keys()):
!pip install unsloth
else:
# Do this only in Colab notebooks! Otherwise use pip install unsloth
import torch; v = re.match(r"[0-9\.]{3,}", str(torch.__version__)).group(0)
xformers = "xformers==" + ("0.0.32.post2" if v == "2.8.0" else "0.0.29.post3")
!pip install --no-deps bitsandbytes accelerate {xformers} peft trl triton cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf "datasets>=3.4.1,<4.0.0" "huggingface_hub>=0.34.0" hf_transfer
!pip install --no-deps unsloth
!pip install transformers==4.55.4
Load and Initialize a 4-bit Model with Unsloth#
This code loads the LLM through the Unsloth framework
FastLanguageModel.from_pretrained() downloads and initializes the model and tokenizer
dtype=None automatically picks the best precision for the GPU (FP16 / BF16); a quick check of this is sketched after the loading output below
load_in_4bit=True enables 4-bit quantization, which lowers VRAM usage and speeds up loading and inference
fourbit_models lists the officially pre-quantized models, which download faster and use fewer resources
[3]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 2x faster
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # 4bit for 405b!
    "unsloth/Mistral-Small-Instruct-2409",     # Mistral 22b 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!
    "unsloth/Llama-3.2-1B-bnb-4bit",           # NEW! Llama 3.2 models
    "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "unsloth/Llama-3.2-3B-bnb-4bit",
    "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    "unsloth/Llama-3.3-70B-Instruct-bnb-4bit"  # NEW! Llama 3.3 70B!
] # More models at https://huggingface.co/unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-Instruct", # or choose "unsloth/Llama-3.2-1B-Instruct"
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))== Unsloth 2025.9.1: Fast Llama patching. Transformers: 4.55.4.
\\ /| Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \ Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\ / Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
"-____-" Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
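As a quick check of the dtype=None auto-detection mentioned above, you can ask PyTorch whether the current GPU supports bfloat16; on the Tesla T4 used here it does not, which is why the banner reports Bfloat16 = FALSE and FP16 is used instead. A minimal sketch:
import torch
# bf16 needs an Ampere-or-newer GPU (compute capability >= 8.0); a T4 is 7.5
print(torch.cuda.is_bf16_supported())       # False on a T4 -> Unsloth falls back to FP16
print(torch.cuda.get_device_capability(0))  # (7, 5) on a Tesla T4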
Load PEFT#
FastLanguageModel.get_peft_model: this wraps the base model with a LoRA adapter
r: rank (higher is more expressive but uses more memory); lora_alpha: scaling; lora_dropout: helps prevent overfitting;
target_modules: which Linear layers in the Attention/MLP blocks receive LoRA (depends on the model architecture)
[4]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
Unsloth 2025.9.1 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
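As an optional check, you can print how many parameters the LoRA adapter actually trains; this sketch assumes the object returned by get_peft_model exposes the standard PEFT helper:
# Optional: report trainable vs. total parameters of the LoRA-wrapped model
model.print_trainable_parameters()
# Expected to match the training banner later: ~24,313,856 of 3,237,063,680 (0.75%)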
Dataset Preparation#
[ ]:
!wget https://hsiangjenli.github.io/2025-it-help-ironman/_static/data/data.json -O /content/data.json
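Before building the conversation format, it can help to peek at the downloaded file. This sketch assumes data.json is a JSON array of records with "eng" and "zh-tw" keys, which is the structure used by to_conversations below:
import json
# Quick peek at the raw file (assumption: a JSON array of {"eng": ..., "zh-tw": ...} records)
with open("/content/data.json") as f:
    records = json.load(f)
print(len(records))
print(records[0])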
[5]:
from unsloth.chat_templates import get_chat_template
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)
def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text" : texts, }
pass
[6]:
from datasets import load_dataset
from unsloth.chat_templates import standardize_sharegpt
dataset = load_dataset(
    "json",
    data_files={"train": "/content/data.json"},
    split="train"
)
def to_conversations(example):
    return {
        "conversations": [
            {"from": "human", "value": example["eng"]},
            {"from": "gpt", "value": example["zh-tw"]},
        ]
    }
dataset = dataset.map(to_conversations)
# Convert to the standardize_sharegpt format, i.e. ("role", "content") pairs
dataset = standardize_sharegpt(dataset)
# Apply the chat template; as in the official tutorial, produce the text field from conversations
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3.2-3b-bnb-4bit")
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")  # adjust to match your model
def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [
        tokenizer.apply_chat_template(
            convo, tokenize=False, add_generation_prompt=False
        )
        for convo in convos
    ]
    return {"text": texts}
dataset = dataset.map(formatting_prompts_func, batched=True, remove_columns=dataset.column_names)
[7]:
dataset[5]["text"]
[7]:
'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n**Iterators terminating on the shortest input sequence:**<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n**在最短輸入序列 (shortest input sequence) 處終止的疊代器:**<|eot_id|>'
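Since max_seq_length was set to 2048 above, a quick length check over the formatted examples confirms nothing will be truncated during training; a minimal sketch:
# Rough length check against max_seq_length = 2048
lengths = [len(tokenizer(t, add_special_tokens=False)["input_ids"]) for t in dataset["text"]]
print(max(lengths), "tokens in the longest formatted example")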
Train the Model#
Training Configuration#
[8]:
from trl import SFTConfig, SFTTrainer
from transformers import DataCollatorForSeq2Seq
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    packing = False, # Can make training 5x faster for short sequences.
    args = SFTConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 60,
        learning_rate = 2e-4,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)
[9]:
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n",
    response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
)
[10]:
# Inspect the result after the conversion above
tokenizer.decode(trainer.train_dataset[5]["input_ids"])
[10]:
'<|begin_of_text|><|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n**Iterators terminating on the shortest input sequence:**<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n**在最短輸入序列 (shortest input sequence) 處終止的疊代器:**<|eot_id|>'
[11]:
space = tokenizer(" ", add_special_tokens = False).input_ids[0]
tokenizer.decode([space if x == -100 else x for x in trainer.train_dataset[5]["labels"]])
[11]:
' **在最短輸入序列 (shortest input sequence) 處終止的疊代器:**<|eot_id|>'
Resource Check#
[12]:
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")
GPU = Tesla T4. Max memory = 14.741 GB.
3.07 GB of memory reserved.
Start Fine-Tuning#
[13]:
trainer_stats = trainer.train()
==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1
\\ /| Num examples = 74 | Num Epochs = 6 | Total steps = 60
O^O/ \_/ \ Batch size per device = 2 | Gradient accumulation steps = 4
\ / Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
"-____-" Trainable parameters = 24,313,856 of 3,237,063,680 (0.75% trained)
Unsloth: Will smartly offload gradients to save VRAM!
[60/60 02:35, Epoch 6/6]
Step | Training Loss |
---|---|
1 | 2.099300 |
2 | 2.319000 |
3 | 2.232800 |
4 | 2.140000 |
5 | 2.085500 |
6 | 2.246200 |
7 | 1.538600 |
8 | 1.617900 |
9 | 2.000900 |
10 | 2.164600 |
11 | 1.288900 |
12 | 1.229000 |
13 | 1.114200 |
14 | 1.087900 |
15 | 0.970900 |
16 | 1.049800 |
17 | 1.039700 |
18 | 0.754300 |
19 | 0.766500 |
20 | 1.161000 |
21 | 0.624900 |
22 | 0.547300 |
23 | 0.836900 |
24 | 0.599400 |
25 | 0.602800 |
26 | 0.555900 |
27 | 0.619000 |
28 | 0.496500 |
29 | 0.567800 |
30 | 0.258900 |
31 | 0.287000 |
32 | 0.321900 |
33 | 0.481500 |
34 | 0.334100 |
35 | 0.389700 |
36 | 0.337600 |
37 | 0.357400 |
38 | 0.308100 |
39 | 0.365500 |
40 | 0.395700 |
41 | 0.161800 |
42 | 0.222900 |
43 | 0.202700 |
44 | 0.240600 |
45 | 0.274600 |
46 | 0.201500 |
47 | 0.323600 |
48 | 0.203300 |
49 | 0.204800 |
50 | 0.101700 |
51 | 0.157600 |
52 | 0.115400 |
53 | 0.143100 |
54 | 0.217000 |
55 | 0.141400 |
56 | 0.202100 |
57 | 0.078900 |
58 | 0.156400 |
59 | 0.126000 |
60 | 0.220900 |
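The step-wise losses printed above are also kept in trainer.state.log_history (a standard Trainer attribute), so they can be inspected or plotted programmatically; a small sketch:
# Pull the logged losses back out of the trainer state
losses = [log["loss"] for log in trainer.state.log_history if "loss" in log]
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}, min loss: {min(losses):.4f}")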
Check Elapsed Time and Memory Usage#
[14]:
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")
175.4575 seconds used for training.
2.92 minutes used for training.
Peak reserved memory = 3.119 GB.
Peak reserved memory for training = 0.049 GB.
Peak reserved memory % of max memory = 21.159 %.
Peak reserved memory for training % of max memory = 0.332 %.
Run the Fine-Tuned Model#
[15]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
messages = [
    {"role": "user", "content": "timer file descriptor HOWTO"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 1.5, min_p = 0.1)
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
定時器文件 descriptor教材<|eot_id|>
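The warning above appears because generate() received only input_ids. A hedged variant that also passes the attention mask, assuming the installed transformers version supports return_dict=True in apply_chat_template:
# Variant that silences the attention-mask warning (return_dict=True is an assumption
# about the installed transformers version; it returns input_ids and attention_mask)
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,
    return_dict = True,
    return_tensors = "pt",
).to("cuda")
_ = model.generate(input_ids = inputs["input_ids"], attention_mask = inputs["attention_mask"],
                   streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 1.5, min_p = 0.1)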
Save the Fine-Tuned Model#
[16]:
model.save_pretrained("/content/drive/MyDrive/lora_model")
tokenizer.save_pretrained("/content/drive/MyDrive/lora_model")
[16]:
('/content/drive/MyDrive/lora_model/tokenizer_config.json',
'/content/drive/MyDrive/lora_model/special_tokens_map.json',
'/content/drive/MyDrive/lora_model/chat_template.jinja',
'/content/drive/MyDrive/lora_model/tokenizer.json')
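Note that only the LoRA adapter and tokenizer are saved here, not the merged base model. A sketch of how the adapter could be reloaded in a later session (paths and settings assumed unchanged):
# Reload the saved adapter later; Unsloth resolves the LoRA directory and its base model
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "/content/drive/MyDrive/lora_model",  # the adapter directory saved above
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)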
Convert to GGUF#
[17]:
!git clone --recursive https://github.com/ggerganov/llama.cpp
!(cd llama.cpp; cmake -B build;cmake --build build --config Release)
Cloning into 'llama.cpp'...
remote: Enumerating objects: 61208, done.
remote: Counting objects: 100% (153/153), done.
remote: Compressing objects: 100% (88/88), done.
remote: Total 61208 (delta 99), reused 72 (delta 65), pack-reused 61055 (from 3)
Receiving objects: 100% (61208/61208), 151.79 MiB | 17.08 MiB/s, done.
Resolving deltas: 100% (44461/44461), done.
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMAKE_BUILD_TYPE=Release
-- Found Git: /usr/bin/git (found version "2.34.1")
-- The ASM compiler identification is GNU
-- Found assembler: /usr/bin/cc
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- GGML_SYSTEM_ARCH: x86
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native
-- ggml version: 0.0.6403
-- ggml commit: 3b15924d
-- Found CURL: /usr/lib/x86_64-linux-gnu/libcurl.so (found version "7.81.0")
-- Configuring done (1.3s)
-- Generating done (0.2s)
-- Build files have been written to: /content/llama.cpp/build
[ 1%] Building C object ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o
[ 1%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml.cpp.o
[ 2%] Building C object ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o
[ 2%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o
[ 2%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml-opt.cpp.o
[ 3%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o
[ 3%] Building C object ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o
[ 4%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/gguf.cpp.o
[ 4%] Linking CXX shared library ../../bin/libggml-base.so
[ 4%] Built target ggml-base
[ 4%] Building C object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o
[ 5%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.cpp.o
[ 5%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/repack.cpp.o
[ 6%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/hbm.cpp.o
[ 6%] Building C object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/quants.c.o
[ 6%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/traits.cpp.o
[ 7%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/amx.cpp.o
[ 7%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o
[ 8%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/binary-ops.cpp.o
[ 8%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/unary-ops.cpp.o
[ 8%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/vec.cpp.o
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ops.cpp.o
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/llamafile/sgemm.cpp.o
[ 10%] Building C object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/arch/x86/quants.c.o
[ 10%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/arch/x86/repack.cpp.o
[ 10%] Linking CXX shared library ../../bin/libggml-cpu.so
[ 10%] Built target ggml-cpu
[ 10%] Building CXX object ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o
[ 10%] Linking CXX shared library ../../bin/libggml.so
[ 10%] Built target ggml
[ 11%] Building CXX object src/CMakeFiles/llama.dir/llama.cpp.o
[ 11%] Building CXX object src/CMakeFiles/llama.dir/llama-adapter.cpp.o
[ 12%] Building CXX object src/CMakeFiles/llama.dir/llama-arch.cpp.o
[ 12%] Building CXX object src/CMakeFiles/llama.dir/llama-batch.cpp.o
[ 12%] Building CXX object src/CMakeFiles/llama.dir/llama-chat.cpp.o
[ 13%] Building CXX object src/CMakeFiles/llama.dir/llama-context.cpp.o
[ 13%] Building CXX object src/CMakeFiles/llama.dir/llama-cparams.cpp.o
[ 14%] Building CXX object src/CMakeFiles/llama.dir/llama-grammar.cpp.o
[ 14%] Building CXX object src/CMakeFiles/llama.dir/llama-graph.cpp.o
[ 14%] Building CXX object src/CMakeFiles/llama.dir/llama-hparams.cpp.o
[ 15%] Building CXX object src/CMakeFiles/llama.dir/llama-impl.cpp.o
[ 15%] Building CXX object src/CMakeFiles/llama.dir/llama-io.cpp.o
[ 16%] Building CXX object src/CMakeFiles/llama.dir/llama-kv-cache.cpp.o
[ 16%] Building CXX object src/CMakeFiles/llama.dir/llama-kv-cache-iswa.cpp.o
[ 16%] Building CXX object src/CMakeFiles/llama.dir/llama-memory.cpp.o
[ 17%] Building CXX object src/CMakeFiles/llama.dir/llama-memory-hybrid.cpp.o
[ 17%] Building CXX object src/CMakeFiles/llama.dir/llama-memory-recurrent.cpp.o
[ 18%] Building CXX object src/CMakeFiles/llama.dir/llama-mmap.cpp.o
[ 18%] Building CXX object src/CMakeFiles/llama.dir/llama-model-loader.cpp.o
[ 18%] Building CXX object src/CMakeFiles/llama.dir/llama-model-saver.cpp.o
[ 19%] Building CXX object src/CMakeFiles/llama.dir/llama-model.cpp.o
[ 19%] Building CXX object src/CMakeFiles/llama.dir/llama-quant.cpp.o
[ 20%] Building CXX object src/CMakeFiles/llama.dir/llama-sampling.cpp.o
[ 20%] Building CXX object src/CMakeFiles/llama.dir/llama-vocab.cpp.o
[ 20%] Building CXX object src/CMakeFiles/llama.dir/unicode-data.cpp.o
[ 21%] Building CXX object src/CMakeFiles/llama.dir/unicode.cpp.o
[ 21%] Linking CXX shared library ../bin/libllama.so
[ 21%] Built target llama
[ 21%] Building CXX object common/CMakeFiles/build_info.dir/build-info.cpp.o
[ 21%] Built target build_info
[ 21%] Building CXX object common/CMakeFiles/common.dir/arg.cpp.o
[ 22%] Building CXX object common/CMakeFiles/common.dir/chat-parser.cpp.o
[ 22%] Building CXX object common/CMakeFiles/common.dir/chat.cpp.o
[ 23%] Building CXX object common/CMakeFiles/common.dir/common.cpp.o
[ 23%] Building CXX object common/CMakeFiles/common.dir/console.cpp.o
[ 23%] Building CXX object common/CMakeFiles/common.dir/json-partial.cpp.o
[ 24%] Building CXX object common/CMakeFiles/common.dir/json-schema-to-grammar.cpp.o
[ 24%] Building CXX object common/CMakeFiles/common.dir/llguidance.cpp.o
[ 25%] Building CXX object common/CMakeFiles/common.dir/log.cpp.o
[ 25%] Building CXX object common/CMakeFiles/common.dir/ngram-cache.cpp.o
[ 25%] Building CXX object common/CMakeFiles/common.dir/regex-partial.cpp.o
[ 26%] Building CXX object common/CMakeFiles/common.dir/sampling.cpp.o
[ 26%] Building CXX object common/CMakeFiles/common.dir/speculative.cpp.o
[ 27%] Linking CXX static library libcommon.a
[ 27%] Built target common
[ 27%] Building CXX object tests/CMakeFiles/test-tokenizer-0.dir/test-tokenizer-0.cpp.o
[ 27%] Linking CXX executable ../bin/test-tokenizer-0
[ 27%] Built target test-tokenizer-0
[ 28%] Building CXX object tests/CMakeFiles/test-sampling.dir/test-sampling.cpp.o
[ 28%] Building CXX object tests/CMakeFiles/test-sampling.dir/get-model.cpp.o
[ 28%] Linking CXX executable ../bin/test-sampling
[ 28%] Built target test-sampling
[ 28%] Building CXX object tests/CMakeFiles/test-grammar-parser.dir/test-grammar-parser.cpp.o
[ 29%] Building CXX object tests/CMakeFiles/test-grammar-parser.dir/get-model.cpp.o
[ 29%] Linking CXX executable ../bin/test-grammar-parser
[ 29%] Built target test-grammar-parser
[ 29%] Building CXX object tests/CMakeFiles/test-grammar-integration.dir/test-grammar-integration.cpp.o
[ 30%] Building CXX object tests/CMakeFiles/test-grammar-integration.dir/get-model.cpp.o
[ 30%] Linking CXX executable ../bin/test-grammar-integration
[ 30%] Built target test-grammar-integration
[ 30%] Building CXX object tests/CMakeFiles/test-llama-grammar.dir/test-llama-grammar.cpp.o
[ 30%] Building CXX object tests/CMakeFiles/test-llama-grammar.dir/get-model.cpp.o
[ 31%] Linking CXX executable ../bin/test-llama-grammar
[ 31%] Built target test-llama-grammar
[ 32%] Building CXX object tests/CMakeFiles/test-chat.dir/test-chat.cpp.o
[ 32%] Building CXX object tests/CMakeFiles/test-chat.dir/get-model.cpp.o
[ 32%] Linking CXX executable ../bin/test-chat
[ 32%] Built target test-chat
[ 33%] Building CXX object tests/CMakeFiles/test-json-schema-to-grammar.dir/test-json-schema-to-grammar.cpp.o
[ 33%] Building CXX object tests/CMakeFiles/test-json-schema-to-grammar.dir/get-model.cpp.o
[ 34%] Linking CXX executable ../bin/test-json-schema-to-grammar
[ 34%] Built target test-json-schema-to-grammar
[ 34%] Building CXX object tests/CMakeFiles/test-quantize-stats.dir/test-quantize-stats.cpp.o
[ 35%] Linking CXX executable ../bin/test-quantize-stats
[ 35%] Built target test-quantize-stats
[ 35%] Building CXX object tests/CMakeFiles/test-gbnf-validator.dir/test-gbnf-validator.cpp.o
[ 36%] Linking CXX executable ../bin/test-gbnf-validator
[ 36%] Built target test-gbnf-validator
[ 37%] Building CXX object tests/CMakeFiles/test-tokenizer-1-bpe.dir/test-tokenizer-1-bpe.cpp.o
[ 37%] Linking CXX executable ../bin/test-tokenizer-1-bpe
[ 37%] Built target test-tokenizer-1-bpe
[ 38%] Building CXX object tests/CMakeFiles/test-tokenizer-1-spm.dir/test-tokenizer-1-spm.cpp.o
[ 38%] Linking CXX executable ../bin/test-tokenizer-1-spm
[ 38%] Built target test-tokenizer-1-spm
[ 39%] Building CXX object tests/CMakeFiles/test-chat-parser.dir/test-chat-parser.cpp.o
[ 39%] Building CXX object tests/CMakeFiles/test-chat-parser.dir/get-model.cpp.o
[ 40%] Linking CXX executable ../bin/test-chat-parser
[ 40%] Built target test-chat-parser
[ 40%] Building CXX object tests/CMakeFiles/test-chat-template.dir/test-chat-template.cpp.o
[ 40%] Building CXX object tests/CMakeFiles/test-chat-template.dir/get-model.cpp.o
[ 41%] Linking CXX executable ../bin/test-chat-template
[ 41%] Built target test-chat-template
[ 42%] Building CXX object tests/CMakeFiles/test-json-partial.dir/test-json-partial.cpp.o
[ 42%] Building CXX object tests/CMakeFiles/test-json-partial.dir/get-model.cpp.o
[ 42%] Linking CXX executable ../bin/test-json-partial
[ 42%] Built target test-json-partial
[ 42%] Building CXX object tests/CMakeFiles/test-log.dir/test-log.cpp.o
[ 43%] Building CXX object tests/CMakeFiles/test-log.dir/get-model.cpp.o
[ 43%] Linking CXX executable ../bin/test-log
[ 43%] Built target test-log
[ 43%] Building CXX object tests/CMakeFiles/test-regex-partial.dir/test-regex-partial.cpp.o
[ 44%] Building CXX object tests/CMakeFiles/test-regex-partial.dir/get-model.cpp.o
[ 44%] Linking CXX executable ../bin/test-regex-partial
[ 44%] Built target test-regex-partial
[ 45%] Building CXX object tests/CMakeFiles/test-thread-safety.dir/test-thread-safety.cpp.o
[ 45%] Building CXX object tests/CMakeFiles/test-thread-safety.dir/get-model.cpp.o
[ 46%] Linking CXX executable ../bin/test-thread-safety
[ 46%] Built target test-thread-safety
[ 46%] Building CXX object tests/CMakeFiles/test-arg-parser.dir/test-arg-parser.cpp.o
[ 46%] Building CXX object tests/CMakeFiles/test-arg-parser.dir/get-model.cpp.o
[ 47%] Linking CXX executable ../bin/test-arg-parser
[ 47%] Built target test-arg-parser
[ 48%] Building CXX object tests/CMakeFiles/test-opt.dir/test-opt.cpp.o
[ 48%] Building CXX object tests/CMakeFiles/test-opt.dir/get-model.cpp.o
[ 49%] Linking CXX executable ../bin/test-opt
[ 49%] Built target test-opt
[ 49%] Building CXX object tests/CMakeFiles/test-gguf.dir/test-gguf.cpp.o
[ 49%] Building CXX object tests/CMakeFiles/test-gguf.dir/get-model.cpp.o
[ 50%] Linking CXX executable ../bin/test-gguf
[ 50%] Built target test-gguf
[ 50%] Building CXX object tests/CMakeFiles/test-backend-ops.dir/test-backend-ops.cpp.o
[ 51%] Building CXX object tests/CMakeFiles/test-backend-ops.dir/get-model.cpp.o
[ 51%] Linking CXX executable ../bin/test-backend-ops
[ 51%] Built target test-backend-ops
[ 51%] Building CXX object tests/CMakeFiles/test-model-load-cancel.dir/test-model-load-cancel.cpp.o
[ 52%] Building CXX object tests/CMakeFiles/test-model-load-cancel.dir/get-model.cpp.o
[ 52%] Linking CXX executable ../bin/test-model-load-cancel
[ 52%] Built target test-model-load-cancel
[ 52%] Building CXX object tests/CMakeFiles/test-autorelease.dir/test-autorelease.cpp.o
[ 53%] Building CXX object tests/CMakeFiles/test-autorelease.dir/get-model.cpp.o
[ 53%] Linking CXX executable ../bin/test-autorelease
[ 53%] Built target test-autorelease
[ 54%] Building CXX object tests/CMakeFiles/test-barrier.dir/test-barrier.cpp.o
[ 54%] Building CXX object tests/CMakeFiles/test-barrier.dir/get-model.cpp.o
[ 54%] Linking CXX executable ../bin/test-barrier
[ 54%] Built target test-barrier
[ 54%] Building CXX object tests/CMakeFiles/test-quantize-fns.dir/test-quantize-fns.cpp.o
[ 54%] Building CXX object tests/CMakeFiles/test-quantize-fns.dir/get-model.cpp.o
[ 55%] Linking CXX executable ../bin/test-quantize-fns
[ 55%] Built target test-quantize-fns
[ 55%] Building CXX object tests/CMakeFiles/test-quantize-perf.dir/test-quantize-perf.cpp.o
[ 56%] Building CXX object tests/CMakeFiles/test-quantize-perf.dir/get-model.cpp.o
[ 56%] Linking CXX executable ../bin/test-quantize-perf
[ 56%] Built target test-quantize-perf
[ 56%] Building CXX object tests/CMakeFiles/test-rope.dir/test-rope.cpp.o
[ 57%] Building CXX object tests/CMakeFiles/test-rope.dir/get-model.cpp.o
[ 57%] Linking CXX executable ../bin/test-rope
[ 57%] Built target test-rope
[ 57%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/mtmd.cpp.o
[ 58%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/mtmd-audio.cpp.o
[ 58%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/clip.cpp.o
[ 58%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/mtmd-helper.cpp.o
[ 59%] Linking CXX shared library ../../bin/libmtmd.so
[ 59%] Built target mtmd
[ 60%] Building C object tests/CMakeFiles/test-mtmd-c-api.dir/test-mtmd-c-api.c.o
[ 60%] Building CXX object tests/CMakeFiles/test-mtmd-c-api.dir/get-model.cpp.o
[ 60%] Linking CXX executable ../bin/test-mtmd-c-api
[ 60%] Built target test-mtmd-c-api
[ 61%] Building C object tests/CMakeFiles/test-c.dir/test-c.c.o
[ 61%] Linking C executable ../bin/test-c
[ 61%] Built target test-c
[ 62%] Building CXX object examples/batched/CMakeFiles/llama-batched.dir/batched.cpp.o
[ 62%] Linking CXX executable ../../bin/llama-batched
[ 62%] Built target llama-batched
[ 62%] Building CXX object examples/embedding/CMakeFiles/llama-embedding.dir/embedding.cpp.o
[ 63%] Linking CXX executable ../../bin/llama-embedding
[ 63%] Built target llama-embedding
[ 63%] Building CXX object examples/eval-callback/CMakeFiles/llama-eval-callback.dir/eval-callback.cpp.o
[ 63%] Linking CXX executable ../../bin/llama-eval-callback
[ 63%] Built target llama-eval-callback
[ 64%] Building C object examples/gguf-hash/CMakeFiles/sha256.dir/deps/sha256/sha256.c.o
[ 64%] Built target sha256
[ 65%] Building C object examples/gguf-hash/CMakeFiles/xxhash.dir/deps/xxhash/xxhash.c.o
[ 65%] Built target xxhash
[ 65%] Building C object examples/gguf-hash/CMakeFiles/sha1.dir/deps/sha1/sha1.c.o
[ 65%] Built target sha1
[ 66%] Building CXX object examples/gguf-hash/CMakeFiles/llama-gguf-hash.dir/gguf-hash.cpp.o
[ 66%] Linking CXX executable ../../bin/llama-gguf-hash
[ 66%] Built target llama-gguf-hash
[ 66%] Building CXX object examples/gguf/CMakeFiles/llama-gguf.dir/gguf.cpp.o
[ 66%] Linking CXX executable ../../bin/llama-gguf
[ 66%] Built target llama-gguf
[ 66%] Building CXX object examples/gritlm/CMakeFiles/llama-gritlm.dir/gritlm.cpp.o
[ 67%] Linking CXX executable ../../bin/llama-gritlm
[ 67%] Built target llama-gritlm
[ 68%] Building CXX object examples/lookahead/CMakeFiles/llama-lookahead.dir/lookahead.cpp.o
[ 68%] Linking CXX executable ../../bin/llama-lookahead
[ 68%] Built target llama-lookahead
[ 68%] Building CXX object examples/lookup/CMakeFiles/llama-lookup.dir/lookup.cpp.o
[ 69%] Linking CXX executable ../../bin/llama-lookup
[ 69%] Built target llama-lookup
[ 69%] Building CXX object examples/lookup/CMakeFiles/llama-lookup-create.dir/lookup-create.cpp.o
[ 70%] Linking CXX executable ../../bin/llama-lookup-create
[ 70%] Built target llama-lookup-create
[ 70%] Building CXX object examples/lookup/CMakeFiles/llama-lookup-merge.dir/lookup-merge.cpp.o
[ 70%] Linking CXX executable ../../bin/llama-lookup-merge
[ 70%] Built target llama-lookup-merge
[ 71%] Building CXX object examples/lookup/CMakeFiles/llama-lookup-stats.dir/lookup-stats.cpp.o
[ 71%] Linking CXX executable ../../bin/llama-lookup-stats
[ 71%] Built target llama-lookup-stats
[ 71%] Building CXX object examples/parallel/CMakeFiles/llama-parallel.dir/parallel.cpp.o
[ 72%] Linking CXX executable ../../bin/llama-parallel
[ 72%] Built target llama-parallel
[ 72%] Building CXX object examples/passkey/CMakeFiles/llama-passkey.dir/passkey.cpp.o
[ 73%] Linking CXX executable ../../bin/llama-passkey
[ 73%] Built target llama-passkey
[ 73%] Building CXX object examples/retrieval/CMakeFiles/llama-retrieval.dir/retrieval.cpp.o
[ 74%] Linking CXX executable ../../bin/llama-retrieval
[ 74%] Built target llama-retrieval
[ 74%] Building CXX object examples/save-load-state/CMakeFiles/llama-save-load-state.dir/save-load-state.cpp.o
[ 75%] Linking CXX executable ../../bin/llama-save-load-state
[ 75%] Built target llama-save-load-state
[ 76%] Building CXX object examples/simple/CMakeFiles/llama-simple.dir/simple.cpp.o
[ 76%] Linking CXX executable ../../bin/llama-simple
[ 76%] Built target llama-simple
[ 76%] Building CXX object examples/simple-chat/CMakeFiles/llama-simple-chat.dir/simple-chat.cpp.o
[ 77%] Linking CXX executable ../../bin/llama-simple-chat
[ 77%] Built target llama-simple-chat
[ 77%] Building CXX object examples/speculative/CMakeFiles/llama-speculative.dir/speculative.cpp.o
[ 78%] Linking CXX executable ../../bin/llama-speculative
[ 78%] Built target llama-speculative
[ 78%] Building CXX object examples/speculative-simple/CMakeFiles/llama-speculative-simple.dir/speculative-simple.cpp.o
[ 78%] Linking CXX executable ../../bin/llama-speculative-simple
[ 78%] Built target llama-speculative-simple
[ 78%] Building CXX object examples/gen-docs/CMakeFiles/llama-gen-docs.dir/gen-docs.cpp.o
[ 79%] Linking CXX executable ../../bin/llama-gen-docs
[ 79%] Built target llama-gen-docs
[ 80%] Building CXX object examples/training/CMakeFiles/llama-finetune.dir/finetune.cpp.o
[ 80%] Linking CXX executable ../../bin/llama-finetune
[ 80%] Built target llama-finetune
[ 80%] Building CXX object examples/diffusion/CMakeFiles/llama-diffusion-cli.dir/diffusion-cli.cpp.o
[ 81%] Linking CXX executable ../../bin/llama-diffusion-cli
[ 81%] Built target llama-diffusion-cli
[ 82%] Building CXX object examples/model-conversion/CMakeFiles/llama-logits.dir/logits.cpp.o
[ 82%] Linking CXX executable ../../bin/llama-logits
[ 82%] Built target llama-logits
[ 83%] Building CXX object examples/convert-llama2c-to-ggml/CMakeFiles/llama-convert-llama2c-to-ggml.dir/convert-llama2c-to-ggml.cpp.o
[ 83%] Linking CXX executable ../../bin/llama-convert-llama2c-to-ggml
[ 83%] Built target llama-convert-llama2c-to-ggml
[ 83%] Building CXX object pocs/vdot/CMakeFiles/llama-vdot.dir/vdot.cpp.o
[ 84%] Linking CXX executable ../../bin/llama-vdot
[ 84%] Built target llama-vdot
[ 85%] Building CXX object pocs/vdot/CMakeFiles/llama-q8dot.dir/q8dot.cpp.o
[ 85%] Linking CXX executable ../../bin/llama-q8dot
[ 85%] Built target llama-q8dot
[ 85%] Building CXX object tools/batched-bench/CMakeFiles/llama-batched-bench.dir/batched-bench.cpp.o
[ 86%] Linking CXX executable ../../bin/llama-batched-bench
[ 86%] Built target llama-batched-bench
[ 87%] Building CXX object tools/gguf-split/CMakeFiles/llama-gguf-split.dir/gguf-split.cpp.o
[ 87%] Linking CXX executable ../../bin/llama-gguf-split
[ 87%] Built target llama-gguf-split
[ 87%] Building CXX object tools/imatrix/CMakeFiles/llama-imatrix.dir/imatrix.cpp.o
[ 88%] Linking CXX executable ../../bin/llama-imatrix
[ 88%] Built target llama-imatrix
[ 88%] Building CXX object tools/llama-bench/CMakeFiles/llama-bench.dir/llama-bench.cpp.o
[ 89%] Linking CXX executable ../../bin/llama-bench
[ 89%] Built target llama-bench
[ 89%] Building CXX object tools/main/CMakeFiles/llama-cli.dir/main.cpp.o
[ 89%] Linking CXX executable ../../bin/llama-cli
[ 89%] Built target llama-cli
[ 89%] Building CXX object tools/perplexity/CMakeFiles/llama-perplexity.dir/perplexity.cpp.o
[ 89%] Linking CXX executable ../../bin/llama-perplexity
[ 89%] Built target llama-perplexity
[ 90%] Building CXX object tools/quantize/CMakeFiles/llama-quantize.dir/quantize.cpp.o
[ 90%] Linking CXX executable ../../bin/llama-quantize
[ 90%] Built target llama-quantize
[ 90%] Generating loading.html.hpp
[ 90%] Generating index.html.gz.hpp
[ 91%] Building CXX object tools/server/CMakeFiles/llama-server.dir/server.cpp.o
[ 91%] Linking CXX executable ../../bin/llama-server
[ 91%] Built target llama-server
[ 91%] Building CXX object tools/run/CMakeFiles/llama-run.dir/run.cpp.o
[ 91%] Building CXX object tools/run/CMakeFiles/llama-run.dir/linenoise.cpp/linenoise.cpp.o
[ 92%] Linking CXX executable ../../bin/llama-run
[ 92%] Built target llama-run
[ 93%] Building CXX object tools/tokenize/CMakeFiles/llama-tokenize.dir/tokenize.cpp.o
[ 93%] Linking CXX executable ../../bin/llama-tokenize
[ 93%] Built target llama-tokenize
[ 94%] Building CXX object tools/tts/CMakeFiles/llama-tts.dir/tts.cpp.o
[ 94%] Linking CXX executable ../../bin/llama-tts
[ 94%] Built target llama-tts
[ 94%] Building CXX object tools/mtmd/CMakeFiles/llama-llava-cli.dir/deprecation-warning.cpp.o
[ 94%] Linking CXX executable ../../bin/llama-llava-cli
[ 94%] Built target llama-llava-cli
[ 94%] Building CXX object tools/mtmd/CMakeFiles/llama-gemma3-cli.dir/deprecation-warning.cpp.o
[ 95%] Linking CXX executable ../../bin/llama-gemma3-cli
[ 95%] Built target llama-gemma3-cli
[ 96%] Building CXX object tools/mtmd/CMakeFiles/llama-minicpmv-cli.dir/deprecation-warning.cpp.o
[ 96%] Linking CXX executable ../../bin/llama-minicpmv-cli
[ 96%] Built target llama-minicpmv-cli
[ 96%] Building CXX object tools/mtmd/CMakeFiles/llama-qwen2vl-cli.dir/deprecation-warning.cpp.o
[ 97%] Linking CXX executable ../../bin/llama-qwen2vl-cli
[ 97%] Built target llama-qwen2vl-cli
[ 97%] Building CXX object tools/mtmd/CMakeFiles/llama-mtmd-cli.dir/mtmd-cli.cpp.o
[ 98%] Linking CXX executable ../../bin/llama-mtmd-cli
[ 98%] Built target llama-mtmd-cli
[ 99%] Building CXX object tools/cvector-generator/CMakeFiles/llama-cvector-generator.dir/cvector-generator.cpp.o
[ 99%] Linking CXX executable ../../bin/llama-cvector-generator
[ 99%] Built target llama-cvector-generator
[100%] Building CXX object tools/export-lora/CMakeFiles/llama-export-lora.dir/export-lora.cpp.o
[100%] Linking CXX executable ../../bin/llama-export-lora
[100%] Built target llama-export-lora
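After the build finishes, the binaries live under llama.cpp/build/bin. A quick optional check that the toolchain works, plus a commented example of how the quantized file produced below could be tried locally (the exact q4_k_m file name is an assumption inferred from the F16 name unsloth.F16.gguf reported later):
!./llama.cpp/build/bin/llama-cli --version
# After the conversion cell below finishes, the quantized model could be run directly, e.g.:
# (hypothetical file name, based on the F16 output unsloth.F16.gguf)
# !./llama.cpp/build/bin/llama-cli -m /content/drive/MyDrive/lora_model/q4_k_m/unsloth.Q4_K_M.gguf -p "timer file descriptor HOWTO" -n 64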
[18]:
!wget https://raw.githubusercontent.com/ggml-org/llama.cpp/refs/heads/master/convert_hf_to_gguf.py -O llama.cpp/convert_hf_to_gguf.py
if True: model.save_pretrained_gguf("/content/drive/MyDrive/lora_model/q4_k_m", tokenizer, quantization_method = "q4_k_m")
--2025-09-07 16:06:57-- https://raw.githubusercontent.com/ggml-org/llama.cpp/refs/heads/master/convert_hf_to_gguf.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 419460 (410K) [text/plain]
Saving to: ‘llama.cpp/convert_hf_to_gguf.py’
llama.cpp/convert_h 100%[===================>] 409.63K --.-KB/s in 0.003s
2025-09-07 16:06:57 (121 MB/s) - ‘llama.cpp/convert_hf_to_gguf.py’ saved [419460/419460]
Unsloth: You have 1 CPUs. Using `safe_serialization` is 10x slower.
We shall switch to Pytorch saving, which might take 3 minutes and not 30 minutes.
To force `safe_serialization`, set it to `None` instead.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 2.4G
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 3.59 out of 12.67 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...
100%|██████████| 28/28 [00:00<00:00, 28.42it/s]
Unsloth: Saving tokenizer... Done.
Unsloth: Saving /content/drive/MyDrive/lora_model/q4_k_m/pytorch_model-00001-of-00002.bin...
Unsloth: Saving /content/drive/MyDrive/lora_model/q4_k_m/pytorch_model-00002-of-00002.bin...
Done.
Unsloth: Converting llama model. Can use fast conversion = False.
==((====))== Unsloth: Conversion from QLoRA to GGUF information
\\ /| [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \ [1] Converting HF to GGUF 16bits might take 3 minutes.
\ / [2] Converting GGUF 16bits to ['q4_k_m'] might take 10 minutes each.
"-____-" In total, you will have to wait at least 16 minutes.
Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: [1] Converting model at /content/drive/MyDrive/lora_model/q4_k_m into f16 GGUF format.
The output location will be /content/drive/MyDrive/lora_model/q4_k_m/unsloth.F16.gguf
This might take 3 minutes...
INFO:hf-to-gguf:Loading model: q4_k_m
INFO:hf-to-gguf:Model architecture: LlamaForCausalLM
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:rope_freqs.weight, torch.float32 --> F32, shape = {64}
INFO:hf-to-gguf:gguf: loading model weight map from 'pytorch_model.bin.index.json'
INFO:hf-to-gguf:gguf: loading model part 'pytorch_model-00001-of-00002.bin'
INFO:hf-to-gguf:token_embd.weight, torch.float16 --> F16, shape = {3072, 128256}
INFO:hf-to-gguf:blk.0.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.0.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.0.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.0.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.1.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.1.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.1.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.1.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.2.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.2.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.2.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.2.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.3.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.3.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.3.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.3.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.4.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.4.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.4.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.4.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.5.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.5.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.5.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.5.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.6.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.6.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.6.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.6.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.7.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.7.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.7.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.7.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.8.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.8.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.8.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.8.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.8.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.8.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.8.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.8.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.8.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.9.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.9.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.9.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.9.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.9.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.9.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.9.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.9.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.9.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.10.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.10.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.10.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.10.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.10.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.10.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.10.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.11.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.11.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.11.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.11.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.11.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.11.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.11.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.12.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.12.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.12.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.12.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.12.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.12.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.12.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.13.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.13.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.13.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.13.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.13.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.13.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.13.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.14.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.14.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.14.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.14.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.14.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.14.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.14.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.14.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.15.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.15.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.15.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.15.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.15.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.15.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.15.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.15.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.15.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.16.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.16.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.16.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.16.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.16.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.16.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.16.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.16.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.16.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.17.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.17.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.17.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.17.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.17.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.17.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.17.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.17.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.17.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.18.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.18.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.18.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.18.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.18.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.18.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.18.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.18.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.18.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.19.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.19.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.19.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.19.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.19.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.19.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.19.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.19.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.19.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.20.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.20.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.20.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.20.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.20.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.20.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:gguf: loading model part 'pytorch_model-00002-of-00002.bin'
INFO:hf-to-gguf:blk.20.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.20.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.20.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.21.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.21.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.21.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.21.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.21.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.21.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.21.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.21.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.21.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.22.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.22.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.22.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.22.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.22.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.22.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.22.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.22.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.22.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.23.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.23.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.23.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.23.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.23.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.23.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.23.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.23.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.23.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.24.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.24.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.24.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.24.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.24.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.24.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.24.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.24.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.24.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.25.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.25.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.25.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.25.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.25.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.25.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.25.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.25.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.25.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.26.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.26.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.26.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.26.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.26.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.26.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.26.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.26.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.26.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.27.attn_q.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.27.attn_k.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.27.attn_v.weight, torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.27.attn_output.weight, torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.27.ffn_gate.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.27.ffn_up.weight, torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.27.ffn_down.weight, torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.27.attn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.27.ffn_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:output_norm.weight, torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 131072
INFO:hf-to-gguf:gguf: embedding length = 3072
INFO:hf-to-gguf:gguf: feed forward length = 8192
INFO:hf-to-gguf:gguf: head count = 24
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 500000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
WARNING:gguf.vocab:Unknown separator token '<|begin_of_text|>' in TemplateProcessing<pair>
INFO:gguf.vocab:Adding 280147 merge(s).
INFO:gguf.vocab:Setting special token type bos to 128000
INFO:gguf.vocab:Setting special token type eos to 128001
INFO:gguf.vocab:Setting special token type pad to 128004
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_sep_token to False
INFO:gguf.vocab:Setting chat_template to {{- bos_token }}
{%- if custom_tools is defined %}
{%- set tools = custom_tools %}
{%- endif %}
{%- if not tools_in_user_message is defined %}
{%- set tools_in_user_message = true %}
{%- endif %}
{%- if not date_string is defined %}
{%- set date_string = "26 July 2024" %}
{%- endif %}
{%- if not tools is defined %}
{%- set tools = none %}
{%- endif %}
{#- This block extracts the system message, so we can slot it into the right place. #}
{%- if messages[0]['role'] == 'system' %}
{%- set system_message = messages[0]['content'] %}
{%- set messages = messages[1:] %}
{%- else %}
{%- set system_message = "" %}
{%- endif %}
{#- System message + builtin tools #}
{{- "<|start_header_id|>system<|end_header_id|>
" }}
{%- if builtin_tools is defined or tools is not none %}
{{- "Environment: ipython
" }}
{%- endif %}
{%- if builtin_tools is defined %}
{{- "Tools: " + builtin_tools | reject('equalto', 'code_interpreter') | join(", ") + "
"}}
{%- endif %}
{{- "Cutting Knowledge Date: December 2023
" }}
{{- "Today Date: " + date_string + "
" }}
{%- if tools is not none and not tools_in_user_message %}
{{- "You have access to the following functions. To call a function, please respond with JSON for a function call." }}
{{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
{{- "Do not use variables.
" }}
{%- for t in tools %}
{{- t | tojson(indent=4) }}
{{- "
" }}
{%- endfor %}
{%- endif %}
{{- system_message }}
{{- "<|eot_id|>" }}
{#- Custom tools are passed in a user message with some extra guidance #}
{%- if tools_in_user_message and not tools is none %}
{#- Extract the first user message so we can plug it in here #}
{%- if messages | length != 0 %}
{%- set first_user_message = messages[0]['content'] %}
{%- set messages = messages[1:] %}
{%- else %}
{{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
{%- endif %}
{{- '<|start_header_id|>user<|end_header_id|>
' -}}
{{- "Given the following functions, please respond with a JSON for a function call " }}
{{- "with its proper arguments that best answers the given prompt.
" }}
{{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
{{- "Do not use variables.
" }}
{%- for t in tools %}
{{- t | tojson(indent=4) }}
{{- "
" }}
{%- endfor %}
{{- first_user_message + "<|eot_id|>"}}
{%- endif %}
{%- for message in messages %}
{%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
{{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>
'+ message['content'] + '<|eot_id|>' }}
{%- elif 'tool_calls' in message %}
{%- if not message.tool_calls|length == 1 %}
{{- raise_exception("This model only supports single tool-calls at once!") }}
{%- endif %}
{%- set tool_call = message.tool_calls[0].function %}
{%- if builtin_tools is defined and tool_call.name in builtin_tools %}
{{- '<|start_header_id|>assistant<|end_header_id|>
' -}}
{{- "<|python_tag|>" + tool_call.name + ".call(" }}
{%- for arg_name, arg_val in tool_call.arguments | items %}
{{- arg_name + '="' + arg_val + '"' }}
{%- if not loop.last %}
{{- ", " }}
{%- endif %}
{%- endfor %}
{{- ")" }}
{%- else %}
{{- '<|start_header_id|>assistant<|end_header_id|>
' -}}
{{- '{"name": "' + tool_call.name + '", ' }}
{{- '"parameters": ' }}
{{- tool_call.arguments | tojson }}
{{- "}" }}
{%- endif %}
{%- if builtin_tools is defined %}
{#- This means we're in ipython mode #}
{{- "<|eom_id|>" }}
{%- else %}
{{- "<|eot_id|>" }}
{%- endif %}
{%- elif message.role == "tool" or message.role == "ipython" %}
{{- "<|start_header_id|>ipython<|end_header_id|>
" }}
{%- if message.content is mapping or message.content is iterable %}
{{- message.content | tojson }}
{%- else %}
{{- message.content }}
{%- endif %}
{{- "<|eot_id|>" }}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|start_header_id|>assistant<|end_header_id|>
' }}
{%- endif %}
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/content/drive/MyDrive/lora_model/q4_k_m/unsloth.F16.gguf: n_tensors = 255, total_size = 6.4G
Writing: 100%|██████████| 6.43G/6.43G [02:40<00:00, 40.1Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to /content/drive/MyDrive/lora_model/q4_k_m/unsloth.F16.gguf
Unsloth: Conversion completed! Output location: /content/drive/MyDrive/lora_model/q4_k_m/unsloth.F16.gguf
Unsloth: [2] Converting GGUF 16bit into q4_k_m. This might take 20 minutes...
main: build = 6403 (3b15924d)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu
main: quantizing '/content/drive/MyDrive/lora_model/q4_k_m/unsloth.F16.gguf' to '/content/drive/MyDrive/lora_model/q4_k_m/unsloth.Q4_K_M.gguf' as Q4_K_M using 4 threads
llama_model_loader: loaded meta data with 32 key-value pairs and 255 tensors from /content/drive/MyDrive/lora_model/q4_k_m/unsloth.F16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Q4_K_M
llama_model_loader: - kv 3: general.quantized_by str = Unsloth
llama_model_loader: - kv 4: general.size_label str = 3.2B
llama_model_loader: - kv 5: general.repo_url str = https://huggingface.co/unsloth
llama_model_loader: - kv 6: general.tags arr[str,2] = ["unsloth", "llama.cpp"]
llama_model_loader: - kv 7: llama.block_count u32 = 28
llama_model_loader: - kv 8: llama.context_length u32 = 131072
llama_model_loader: - kv 9: llama.embedding_length u32 = 3072
llama_model_loader: - kv 10: llama.feed_forward_length u32 = 8192
llama_model_loader: - kv 11: llama.attention.head_count u32 = 24
llama_model_loader: - kv 12: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 13: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 14: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 15: llama.attention.key_length u32 = 128
llama_model_loader: - kv 16: llama.attention.value_length u32 = 128
llama_model_loader: - kv 17: general.file_type u32 = 1
llama_model_loader: - kv 18: llama.vocab_size u32 = 128256
llama_model_loader: - kv 19: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 20: general.quantization_version u32 = 2
llama_model_loader: - kv 21: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 22: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 23: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 24: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 25: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 26: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 27: tokenizer.ggml.eos_token_id u32 = 128001
llama_model_loader: - kv 28: tokenizer.ggml.padding_token_id u32 = 128004
llama_model_loader: - kv 29: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 30: tokenizer.ggml.add_sep_token bool = false
llama_model_loader: - kv 31: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - type f32: 58 tensors
llama_model_loader: - type f16: 197 tensors
[ 1/ 255] output_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 2/ 255] rope_freqs.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MB
[ 3/ 255] token_embd.weight - [ 3072, 128256, 1, 1], type = f16, converting to q6_K .. size = 751.50 MiB -> 308.23 MiB
[ 4/ 255] blk.0.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 5/ 255] blk.0.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 6/ 255] blk.0.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 7/ 255] blk.0.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 8/ 255] blk.0.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q6_K .. size = 6.00 MiB -> 2.46 MiB
[ 9/ 255] blk.0.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q6_K .. size = 48.00 MiB -> 19.69 MiB
[ 10/ 255] blk.0.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 11/ 255] blk.0.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 12/ 255] blk.0.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 13/ 255] blk.1.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 14/ 255] blk.1.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 15/ 255] blk.1.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 16/ 255] blk.1.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 17/ 255] blk.1.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q6_K .. size = 6.00 MiB -> 2.46 MiB
[ 18/ 255] blk.1.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q6_K .. size = 48.00 MiB -> 19.69 MiB
[ 19/ 255] blk.1.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 20/ 255] blk.1.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 21/ 255] blk.1.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 22/ 255] blk.2.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 23/ 255] blk.2.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 24/ 255] blk.2.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 25/ 255] blk.2.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 26/ 255] blk.2.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q6_K .. size = 6.00 MiB -> 2.46 MiB
[ 27/ 255] blk.2.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q6_K .. size = 48.00 MiB -> 19.69 MiB
[ 28/ 255] blk.2.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 29/ 255] blk.2.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 30/ 255] blk.2.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 31/ 255] blk.3.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 32/ 255] blk.3.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 33/ 255] blk.3.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 34/ 255] blk.3.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 35/ 255] blk.3.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 36/ 255] blk.3.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 37/ 255] blk.3.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 38/ 255] blk.3.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 39/ 255] blk.3.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 40/ 255] blk.4.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 41/ 255] blk.4.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 42/ 255] blk.4.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 43/ 255] blk.4.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 44/ 255] blk.4.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 45/ 255] blk.4.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 46/ 255] blk.4.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 47/ 255] blk.4.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 48/ 255] blk.4.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 49/ 255] blk.5.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 50/ 255] blk.5.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 51/ 255] blk.5.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 52/ 255] blk.5.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 53/ 255] blk.5.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q6_K .. size = 6.00 MiB -> 2.46 MiB
[ 54/ 255] blk.5.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q6_K .. size = 48.00 MiB -> 19.69 MiB
[ 55/ 255] blk.5.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 56/ 255] blk.5.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 57/ 255] blk.5.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 58/ 255] blk.6.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 59/ 255] blk.6.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 60/ 255] blk.6.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 61/ 255] blk.6.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 62/ 255] blk.6.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 63/ 255] blk.6.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 64/ 255] blk.6.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 65/ 255] blk.6.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 66/ 255] blk.6.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 67/ 255] blk.7.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 68/ 255] blk.7.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 69/ 255] blk.7.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 70/ 255] blk.7.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 71/ 255] blk.7.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 72/ 255] blk.7.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 73/ 255] blk.7.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 74/ 255] blk.7.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 75/ 255] blk.7.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 76/ 255] blk.8.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 77/ 255] blk.8.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 78/ 255] blk.8.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 79/ 255] blk.8.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 80/ 255] blk.8.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q6_K .. size = 6.00 MiB -> 2.46 MiB
[ 81/ 255] blk.8.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q6_K .. size = 48.00 MiB -> 19.69 MiB
[ 82/ 255] blk.8.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 83/ 255] blk.8.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 84/ 255] blk.8.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 85/ 255] blk.9.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 86/ 255] blk.9.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 87/ 255] blk.9.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 88/ 255] blk.9.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 89/ 255] blk.9.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 90/ 255] blk.9.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 91/ 255] blk.9.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 92/ 255] blk.9.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 93/ 255] blk.9.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 94/ 255] blk.10.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 95/ 255] blk.10.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 96/ 255] blk.10.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 97/ 255] blk.10.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 98/ 255] blk.10.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 99/ 255] blk.10.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 100/ 255] blk.10.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 101/ 255] blk.10.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 102/ 255] blk.10.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 103/ 255] blk.11.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 104/ 255] blk.11.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 105/ 255] blk.11.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 106/ 255] blk.11.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 107/ 255] blk.11.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q6_K .. size = 6.00 MiB -> 2.46 MiB
[ 108/ 255] blk.11.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q6_K .. size = 48.00 MiB -> 19.69 MiB
[ 109/ 255] blk.11.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 110/ 255] blk.11.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 111/ 255] blk.11.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 112/ 255] blk.12.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 113/ 255] blk.12.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 114/ 255] blk.12.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 115/ 255] blk.12.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 116/ 255] blk.12.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 117/ 255] blk.12.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 118/ 255] blk.12.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 119/ 255] blk.12.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 120/ 255] blk.12.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 121/ 255] blk.13.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 122/ 255] blk.13.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 123/ 255] blk.13.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 124/ 255] blk.13.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 125/ 255] blk.13.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 126/ 255] blk.13.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 127/ 255] blk.13.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 128/ 255] blk.13.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 129/ 255] blk.13.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 130/ 255] blk.14.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 131/ 255] blk.14.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 132/ 255] blk.14.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 133/ 255] blk.14.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 134/ 255] blk.14.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q6_K .. size = 6.00 MiB -> 2.46 MiB
[ 135/ 255] blk.14.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q6_K .. size = 48.00 MiB -> 19.69 MiB
[ 136/ 255] blk.14.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 137/ 255] blk.14.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 138/ 255] blk.14.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 139/ 255] blk.15.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 140/ 255] blk.15.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 141/ 255] blk.15.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 142/ 255] blk.15.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 143/ 255] blk.15.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 144/ 255] blk.15.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 145/ 255] blk.15.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 146/ 255] blk.15.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 147/ 255] blk.15.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 148/ 255] blk.16.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 149/ 255] blk.16.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 150/ 255] blk.16.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 151/ 255] blk.16.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 152/ 255] blk.16.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 153/ 255] blk.16.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 154/ 255] blk.16.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 155/ 255] blk.16.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 156/ 255] blk.16.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 157/ 255] blk.17.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 158/ 255] blk.17.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 159/ 255] blk.17.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 160/ 255] blk.17.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 161/ 255] blk.17.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q6_K .. size = 6.00 MiB -> 2.46 MiB
[ 162/ 255] blk.17.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q6_K .. size = 48.00 MiB -> 19.69 MiB
[ 163/ 255] blk.17.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 164/ 255] blk.17.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 165/ 255] blk.17.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 166/ 255] blk.18.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 167/ 255] blk.18.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 168/ 255] blk.18.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 169/ 255] blk.18.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 170/ 255] blk.18.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 171/ 255] blk.18.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 172/ 255] blk.18.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 173/ 255] blk.18.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 174/ 255] blk.18.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 175/ 255] blk.19.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 176/ 255] blk.19.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 177/ 255] blk.19.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 178/ 255] blk.19.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 179/ 255] blk.19.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 180/ 255] blk.19.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 181/ 255] blk.19.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 182/ 255] blk.19.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 183/ 255] blk.19.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 184/ 255] blk.20.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 185/ 255] blk.20.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 186/ 255] blk.20.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 187/ 255] blk.20.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 188/ 255] blk.20.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q6_K .. size = 6.00 MiB -> 2.46 MiB
[ 189/ 255] blk.20.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q6_K .. size = 48.00 MiB -> 19.69 MiB
[ 190/ 255] blk.20.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 191/ 255] blk.20.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 192/ 255] blk.20.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 193/ 255] blk.21.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 194/ 255] blk.21.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 195/ 255] blk.21.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 196/ 255] blk.21.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 197/ 255] blk.21.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 198/ 255] blk.21.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 199/ 255] blk.21.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 200/ 255] blk.21.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 201/ 255] blk.21.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 202/ 255] blk.22.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 203/ 255] blk.22.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 204/ 255] blk.22.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 205/ 255] blk.22.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 206/ 255] blk.22.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 207/ 255] blk.22.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 208/ 255] blk.22.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 209/ 255] blk.22.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 210/ 255] blk.22.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 211/ 255] blk.23.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 212/ 255] blk.23.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 213/ 255] blk.23.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 214/ 255] blk.23.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 215/ 255] blk.23.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q6_K .. size = 6.00 MiB -> 2.46 MiB
[ 216/ 255] blk.23.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q6_K .. size = 48.00 MiB -> 19.69 MiB
[ 217/ 255] blk.23.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 218/ 255] blk.23.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 219/ 255] blk.23.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 220/ 255] blk.24.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 221/ 255] blk.24.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 222/ 255] blk.24.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 223/ 255] blk.24.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 224/ 255] blk.24.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q6_K .. size = 6.00 MiB -> 2.46 MiB
[ 225/ 255] blk.24.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q6_K .. size = 48.00 MiB -> 19.69 MiB
[ 226/ 255] blk.24.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 227/ 255] blk.24.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 228/ 255] blk.24.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 229/ 255] blk.25.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 230/ 255] blk.25.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 231/ 255] blk.25.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 232/ 255] blk.25.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 233/ 255] blk.25.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q6_K .. size = 6.00 MiB -> 2.46 MiB
[ 234/ 255] blk.25.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q6_K .. size = 48.00 MiB -> 19.69 MiB
[ 235/ 255] blk.25.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 236/ 255] blk.25.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 237/ 255] blk.25.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 238/ 255] blk.26.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 239/ 255] blk.26.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 240/ 255] blk.26.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 241/ 255] blk.26.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 242/ 255] blk.26.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q6_K .. size = 6.00 MiB -> 2.46 MiB
[ 243/ 255] blk.26.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q6_K .. size = 48.00 MiB -> 19.69 MiB
[ 244/ 255] blk.26.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 245/ 255] blk.26.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 246/ 255] blk.26.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 247/ 255] blk.27.attn_k.weight - [ 3072, 1024, 1, 1], type = f16, converting to q4_K .. size = 6.00 MiB -> 1.69 MiB
[ 248/ 255] blk.27.attn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 249/ 255] blk.27.attn_output.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 250/ 255] blk.27.attn_q.weight - [ 3072, 3072, 1, 1], type = f16, converting to q4_K .. size = 18.00 MiB -> 5.06 MiB
[ 251/ 255] blk.27.attn_v.weight - [ 3072, 1024, 1, 1], type = f16, converting to q6_K .. size = 6.00 MiB -> 2.46 MiB
[ 252/ 255] blk.27.ffn_down.weight - [ 8192, 3072, 1, 1], type = f16, converting to q6_K .. size = 48.00 MiB -> 19.69 MiB
[ 253/ 255] blk.27.ffn_gate.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
[ 254/ 255] blk.27.ffn_norm.weight - [ 3072, 1, 1, 1], type = f32, size = 0.012 MB
[ 255/ 255] blk.27.ffn_up.weight - [ 3072, 8192, 1, 1], type = f16, converting to q4_K .. size = 48.00 MiB -> 13.50 MiB
llama_model_quantize_impl: model size = 6128.17 MB
llama_model_quantize_impl: quant size = 1918.35 MB
main: quantize time = 376145.88 ms
main: total time = 376145.88 ms
Unsloth: Conversion completed! Output location: /content/drive/MyDrive/lora_model/q4_k_m/unsloth.Q4_K_M.gguf
Unsloth: Saved Ollama Modelfile to /content/drive/MyDrive/lora_model/q4_k_m/Modelfile
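The conversion leaves two artifacts in /content/drive/MyDrive/lora_model/q4_k_m/: the quantized unsloth.Q4_K_M.gguf and an Ollama Modelfile. As a minimal sketch of what one might do next (an assumption, not something executed in this run): on a machine where Ollama is installed, the saved Modelfile can be used to register and run the fine-tuned model. The model tag llama3.2-3b-ft below is a hypothetical name, and the paths assume the q4_k_m folder is still laid out as saved above; if the folder is downloaded from Google Drive to another machine, adjust the path (and the FROM line inside the Modelfile) accordingly. The `!` prefix matches the notebook convention; drop it when running in a plain terminal.

# Assumptions: Ollama is installed and its daemon is running; "llama3.2-3b-ft" is an arbitrary tag.
# Register the quantized GGUF using the Modelfile written by Unsloth above.
!ollama create llama3.2-3b-ft -f /content/drive/MyDrive/lora_model/q4_k_m/Modelfile
# Quick smoke test of the registered model.
!ollama run llama3.2-3b-ft "Introduce yourself in Traditional Chinese."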