Unsloth + Llama3.2:3b Fine-tuning Test#

Basic Setup#

Mount Google Drive#

[1]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive

Install Packages#

[2]:
%%capture
import os, re
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    import torch; v = re.match(r"[0-9\.]{3,}", str(torch.__version__)).group(0)
    xformers = "xformers==" + ("0.0.32.post2" if v == "2.8.0" else "0.0.29.post3")
    !pip install --no-deps bitsandbytes accelerate {xformers} peft trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1,<4.0.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth
!pip install transformers==4.55.4

Load and Initialize a 4-bit Model with Unsloth#

  • This block uses the Unsloth framework to load the LLM

  • FastLanguageModel.from_pretrained() downloads and initializes the model and tokenizer

  • dtype=None automatically selects the best precision for the GPU (FP16 / BF16); a minimal sketch of choosing it by hand follows this list

  • load_in_4bit=True enables 4-bit quantization, which lowers VRAM usage and speeds up loading and inference

  • fourbit_models lists the officially pre-quantized models, which download faster and use fewer resources
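
If you prefer to set dtype explicitly rather than relying on auto-detection, a minimal sketch (using torch.cuda.is_bf16_supported(); this is not part of the original cell) could look like:

import torch
# Minimal sketch: pick BF16 on GPUs that support it (Ampere and newer), otherwise FP16
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16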

[3]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 2x faster
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # 4bit for 405b!
    "unsloth/Mistral-Small-Instruct-2409",     # Mistral 22b 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!

    "unsloth/Llama-3.2-1B-bnb-4bit",           # NEW! Llama 3.2 models
    "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "unsloth/Llama-3.2-3B-bnb-4bit",
    "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",

    "unsloth/Llama-3.3-70B-Instruct-bnb-4bit" # NEW! Llama 3.3 70B!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-Instruct", # or choose "unsloth/Llama-3.2-1B-Instruct"
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.9.1: Fast Llama patching. Transformers: 4.55.4.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!

Load PEFT#

  • FastLanguageModel.get_peft_model: attaches a LoRA adapter to the base model

  • r: rank (higher is more accurate but uses more memory); lora_alpha: scaling factor (see the sketch after this list); lora_dropout: helps prevent overfitting

  • target_modules: which Linear layers in the Attention/MLP blocks receive LoRA (depends on the model architecture)
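
For intuition on how r and lora_alpha interact: LoRA scales its low-rank update by lora_alpha / r before adding it to the frozen weight. A rough illustration (hypothetical tensors, not Unsloth's internal code):

import torch

d, r, lora_alpha = 3072, 16, 16
W = torch.randn(d, d)          # frozen base weight
A = torch.randn(r, d) * 0.01   # trainable low-rank factor
B = torch.zeros(d, r)          # trainable low-rank factor, initialized to zero
scaling = lora_alpha / r       # = 1.0 with the values used in the next cell
W_effective = W + scaling * (B @ A)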

[4]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
Unsloth 2025.9.1 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.

Dataset Preparation#

[ ]:
!wget https://hsiangjenli.github.io/2025-it-help-ironman/_static/data/data.json -O /content/data.json
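
Each record in data.json is expected to provide an eng source string and a zh-tw translation, since those are the fields to_conversations reads below. For example, the record that later shows up as dataset[5] looks roughly like this (reconstructed from the output further down, so treat it as illustrative):

{"eng": "**Iterators terminating on the shortest input sequence:**",
 "zh-tw": "**在最短輸入序列 (shortest input sequence) 處終止的疊代器:**"}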
[5]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)

def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text" : texts, }
pass
[6]:
from datasets import load_dataset
from unsloth.chat_templates import standardize_sharegpt

dataset = load_dataset(
    "json",
    data_files={"train": "/content/data.json"},
    split="train"
)

def to_conversations(example):
    return {
        "conversations": [
            {"from": "human", "value": example["eng"]},
            {"from": "gpt",   "value": example["zh-tw"]},
        ]
    }

dataset = dataset.map(to_conversations)

# Convert with standardize_sharegpt to the ("role", "content") format
dataset = standardize_sharegpt(dataset)

# Set up the chat template; as in the official tutorial, use conversations to produce the text column
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3.2-3b-bnb-4bit")
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")  # adjust to match your model

def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [
        tokenizer.apply_chat_template(
            convo, tokenize=False, add_generation_prompt=False
        )
        for convo in convos
    ]
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched=True, remove_columns=dataset.column_names)
[7]:
dataset[5]["text"]
[7]:
'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n**Iterators terminating on the shortest input sequence:**<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n**在最短輸入序列 (shortest input sequence) 處終止的疊代器:**<|eot_id|>'

Train the Model#

Training Configuration#

[8]:
from trl import SFTConfig, SFTTrainer
from transformers import DataCollatorForSeq2Seq
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    packing = False, # Can make training 5x faster for short sequences.
    args = SFTConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 60,
        learning_rate = 2e-4,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)
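
Note that with per_device_train_batch_size = 2 and gradient_accumulation_steps = 4 on a single GPU, the effective batch size is 2 × 4 = 8; with the 74-example dataset that is about 10 optimizer steps per epoch, so max_steps = 60 amounts to roughly 6 epochs, matching the training log below.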
[9]:
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n",
    response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
)
[10]:
# Inspect the result after masking
tokenizer.decode(trainer.train_dataset[5]["input_ids"])
[10]:
'<|begin_of_text|><|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n**Iterators terminating on the shortest input sequence:**<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n**在最短輸入序列 (shortest input sequence) 處終止的疊代器:**<|eot_id|>'
[11]:
# Tokens labeled -100 are masked out of the loss; decoding them as spaces shows
# that only the assistant response (plus its <|eot_id|>) is actually trained on.
space = tokenizer(" ", add_special_tokens = False).input_ids[0]
tokenizer.decode([space if x == -100 else x for x in trainer.train_dataset[5]["labels"]])
[11]:
'                                              **在最短輸入序列 (shortest input sequence) 處終止的疊代器:**<|eot_id|>'

Check GPU Resources#

[12]:
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")
GPU = Tesla T4. Max memory = 14.741 GB.
3.07 GB of memory reserved.

Start Fine-tuning#

[13]:
trainer_stats = trainer.train()
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 74 | Num Epochs = 6 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 24,313,856 of 3,237,063,680 (0.75% trained)
Unsloth: Will smartly offload gradients to save VRAM!
[60/60 02:35, Epoch 6/6]
Step Training Loss
1 2.099300
2 2.319000
3 2.232800
4 2.140000
5 2.085500
6 2.246200
7 1.538600
8 1.617900
9 2.000900
10 2.164600
11 1.288900
12 1.229000
13 1.114200
14 1.087900
15 0.970900
16 1.049800
17 1.039700
18 0.754300
19 0.766500
20 1.161000
21 0.624900
22 0.547300
23 0.836900
24 0.599400
25 0.602800
26 0.555900
27 0.619000
28 0.496500
29 0.567800
30 0.258900
31 0.287000
32 0.321900
33 0.481500
34 0.334100
35 0.389700
36 0.337600
37 0.357400
38 0.308100
39 0.365500
40 0.395700
41 0.161800
42 0.222900
43 0.202700
44 0.240600
45 0.274600
46 0.201500
47 0.323600
48 0.203300
49 0.204800
50 0.101700
51 0.157600
52 0.115400
53 0.143100
54 0.217000
55 0.141400
56 0.202100
57 0.078900
58 0.156400
59 0.126000
60 0.220900

Check Training Time and Memory#

[14]:
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")
175.4575 seconds used for training.
2.92 minutes used for training.
Peak reserved memory = 3.119 GB.
Peak reserved memory for training = 0.049 GB.
Peak reserved memory % of max memory = 21.159 %.
Peak reserved memory for training % of max memory = 0.332 %.

Run the Fine-tuned Model#

[15]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "timer file descriptor HOWTO"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 1.5, min_p = 0.1)
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
定時器文件 descriptor教材<|eot_id|>
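
The attention-mask warning above can be avoided by asking apply_chat_template to also return the mask (return_dict=True, supported by recent transformers versions) and passing it to generate; a minimal sketch reusing the same messages:

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,   # also returns attention_mask
).to("cuda")

_ = model.generate(
    input_ids = inputs["input_ids"],
    attention_mask = inputs["attention_mask"],
    streamer = text_streamer,
    max_new_tokens = 128, use_cache = True,
    temperature = 1.5, min_p = 0.1,
)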

Save the Fine-tuned Model#

[16]:
model.save_pretrained("/content/drive/MyDrive/lora_model")
tokenizer.save_pretrained("/content/drive/MyDrive/lora_model")
[16]:
('/content/drive/MyDrive/lora_model/tokenizer_config.json',
 '/content/drive/MyDrive/lora_model/special_tokens_map.json',
 '/content/drive/MyDrive/lora_model/chat_template.jinja',
 '/content/drive/MyDrive/lora_model/tokenizer.json')
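
To reuse the adapter later (for example in a fresh Colab session), the saved folder can be loaded back the same way the base model was loaded; a minimal sketch, assuming the Drive path used above:

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "/content/drive/MyDrive/lora_model",  # folder saved above
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # enable inference mode again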

Convert to GGUF#

[17]:
!git clone --recursive https://github.com/ggerganov/llama.cpp
!(cd llama.cpp; cmake -B build;cmake --build build --config Release)
Cloning into 'llama.cpp'...
remote: Enumerating objects: 61208, done.
remote: Counting objects: 100% (153/153), done.
remote: Compressing objects: 100% (88/88), done.
remote: Total 61208 (delta 99), reused 72 (delta 65), pack-reused 61055 (from 3)
Receiving objects: 100% (61208/61208), 151.79 MiB | 17.08 MiB/s, done.
Resolving deltas: 100% (44461/44461), done.
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMAKE_BUILD_TYPE=Release
-- Found Git: /usr/bin/git (found version "2.34.1")
-- The ASM compiler identification is GNU
-- Found assembler: /usr/bin/cc
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- GGML_SYSTEM_ARCH: x86
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native
-- ggml version: 0.0.6403
-- ggml commit:  3b15924d
-- Found CURL: /usr/lib/x86_64-linux-gnu/libcurl.so (found version "7.81.0")
-- Configuring done (1.3s)
-- Generating done (0.2s)
-- Build files have been written to: /content/llama.cpp/build
[  1%] Building C object ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o
[  1%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml.cpp.o
[  2%] Building C object ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o
[  2%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o
[  2%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml-opt.cpp.o
[  3%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o
[  3%] Building C object ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o
[  4%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/gguf.cpp.o
[  4%] Linking CXX shared library ../../bin/libggml-base.so
[  4%] Built target ggml-base
[  4%] Building C object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o
[  5%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.cpp.o
[  5%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/repack.cpp.o
[  6%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/hbm.cpp.o
[  6%] Building C object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/quants.c.o
[  6%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/traits.cpp.o
[  7%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/amx.cpp.o
[  7%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o
[  8%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/binary-ops.cpp.o
[  8%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/unary-ops.cpp.o
[  8%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/vec.cpp.o
[  9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ops.cpp.o
[  9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/llamafile/sgemm.cpp.o
[ 10%] Building C object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/arch/x86/quants.c.o
[ 10%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/arch/x86/repack.cpp.o
[ 10%] Linking CXX shared library ../../bin/libggml-cpu.so
[ 10%] Built target ggml-cpu
[ 10%] Building CXX object ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o
[ 10%] Linking CXX shared library ../../bin/libggml.so
[ 10%] Built target ggml
[ 11%] Building CXX object src/CMakeFiles/llama.dir/llama.cpp.o
[ 11%] Building CXX object src/CMakeFiles/llama.dir/llama-adapter.cpp.o
[ 12%] Building CXX object src/CMakeFiles/llama.dir/llama-arch.cpp.o
[ 12%] Building CXX object src/CMakeFiles/llama.dir/llama-batch.cpp.o
[ 12%] Building CXX object src/CMakeFiles/llama.dir/llama-chat.cpp.o
[ 13%] Building CXX object src/CMakeFiles/llama.dir/llama-context.cpp.o
[ 13%] Building CXX object src/CMakeFiles/llama.dir/llama-cparams.cpp.o
[ 14%] Building CXX object src/CMakeFiles/llama.dir/llama-grammar.cpp.o
[ 14%] Building CXX object src/CMakeFiles/llama.dir/llama-graph.cpp.o
[ 14%] Building CXX object src/CMakeFiles/llama.dir/llama-hparams.cpp.o
[ 15%] Building CXX object src/CMakeFiles/llama.dir/llama-impl.cpp.o
[ 15%] Building CXX object src/CMakeFiles/llama.dir/llama-io.cpp.o
[ 16%] Building CXX object src/CMakeFiles/llama.dir/llama-kv-cache.cpp.o
[ 16%] Building CXX object src/CMakeFiles/llama.dir/llama-kv-cache-iswa.cpp.o
[ 16%] Building CXX object src/CMakeFiles/llama.dir/llama-memory.cpp.o
[ 17%] Building CXX object src/CMakeFiles/llama.dir/llama-memory-hybrid.cpp.o
[ 17%] Building CXX object src/CMakeFiles/llama.dir/llama-memory-recurrent.cpp.o
[ 18%] Building CXX object src/CMakeFiles/llama.dir/llama-mmap.cpp.o
[ 18%] Building CXX object src/CMakeFiles/llama.dir/llama-model-loader.cpp.o
[ 18%] Building CXX object src/CMakeFiles/llama.dir/llama-model-saver.cpp.o
[ 19%] Building CXX object src/CMakeFiles/llama.dir/llama-model.cpp.o
[ 19%] Building CXX object src/CMakeFiles/llama.dir/llama-quant.cpp.o
[ 20%] Building CXX object src/CMakeFiles/llama.dir/llama-sampling.cpp.o
[ 20%] Building CXX object src/CMakeFiles/llama.dir/llama-vocab.cpp.o
[ 20%] Building CXX object src/CMakeFiles/llama.dir/unicode-data.cpp.o
[ 21%] Building CXX object src/CMakeFiles/llama.dir/unicode.cpp.o
[ 21%] Linking CXX shared library ../bin/libllama.so
[ 21%] Built target llama
[ 21%] Building CXX object common/CMakeFiles/build_info.dir/build-info.cpp.o
[ 21%] Built target build_info
[ 21%] Building CXX object common/CMakeFiles/common.dir/arg.cpp.o
[ 22%] Building CXX object common/CMakeFiles/common.dir/chat-parser.cpp.o
[ 22%] Building CXX object common/CMakeFiles/common.dir/chat.cpp.o
[ 23%] Building CXX object common/CMakeFiles/common.dir/common.cpp.o
[ 23%] Building CXX object common/CMakeFiles/common.dir/console.cpp.o
[ 23%] Building CXX object common/CMakeFiles/common.dir/json-partial.cpp.o
[ 24%] Building CXX object common/CMakeFiles/common.dir/json-schema-to-grammar.cpp.o
[ 24%] Building CXX object common/CMakeFiles/common.dir/llguidance.cpp.o
[ 25%] Building CXX object common/CMakeFiles/common.dir/log.cpp.o
[ 25%] Building CXX object common/CMakeFiles/common.dir/ngram-cache.cpp.o
[ 25%] Building CXX object common/CMakeFiles/common.dir/regex-partial.cpp.o
[ 26%] Building CXX object common/CMakeFiles/common.dir/sampling.cpp.o
[ 26%] Building CXX object common/CMakeFiles/common.dir/speculative.cpp.o
[ 27%] Linking CXX static library libcommon.a
[ 27%] Built target common
[ 27%] Building CXX object tests/CMakeFiles/test-tokenizer-0.dir/test-tokenizer-0.cpp.o
[ 27%] Linking CXX executable ../bin/test-tokenizer-0
[ 27%] Built target test-tokenizer-0
[ 28%] Building CXX object tests/CMakeFiles/test-sampling.dir/test-sampling.cpp.o
[ 28%] Building CXX object tests/CMakeFiles/test-sampling.dir/get-model.cpp.o
[ 28%] Linking CXX executable ../bin/test-sampling
[ 28%] Built target test-sampling
[ 28%] Building CXX object tests/CMakeFiles/test-grammar-parser.dir/test-grammar-parser.cpp.o
[ 29%] Building CXX object tests/CMakeFiles/test-grammar-parser.dir/get-model.cpp.o
[ 29%] Linking CXX executable ../bin/test-grammar-parser
[ 29%] Built target test-grammar-parser
[ 29%] Building CXX object tests/CMakeFiles/test-grammar-integration.dir/test-grammar-integration.cpp.o
[ 30%] Building CXX object tests/CMakeFiles/test-grammar-integration.dir/get-model.cpp.o
[ 30%] Linking CXX executable ../bin/test-grammar-integration
[ 30%] Built target test-grammar-integration
[ 30%] Building CXX object tests/CMakeFiles/test-llama-grammar.dir/test-llama-grammar.cpp.o
[ 30%] Building CXX object tests/CMakeFiles/test-llama-grammar.dir/get-model.cpp.o
[ 31%] Linking CXX executable ../bin/test-llama-grammar
[ 31%] Built target test-llama-grammar
[ 32%] Building CXX object tests/CMakeFiles/test-chat.dir/test-chat.cpp.o
[ 32%] Building CXX object tests/CMakeFiles/test-chat.dir/get-model.cpp.o
[ 32%] Linking CXX executable ../bin/test-chat
[ 32%] Built target test-chat
[ 33%] Building CXX object tests/CMakeFiles/test-json-schema-to-grammar.dir/test-json-schema-to-grammar.cpp.o
[ 33%] Building CXX object tests/CMakeFiles/test-json-schema-to-grammar.dir/get-model.cpp.o
[ 34%] Linking CXX executable ../bin/test-json-schema-to-grammar
[ 34%] Built target test-json-schema-to-grammar
[ 34%] Building CXX object tests/CMakeFiles/test-quantize-stats.dir/test-quantize-stats.cpp.o
[ 35%] Linking CXX executable ../bin/test-quantize-stats
[ 35%] Built target test-quantize-stats
[ 35%] Building CXX object tests/CMakeFiles/test-gbnf-validator.dir/test-gbnf-validator.cpp.o
[ 36%] Linking CXX executable ../bin/test-gbnf-validator
[ 36%] Built target test-gbnf-validator
[ 37%] Building CXX object tests/CMakeFiles/test-tokenizer-1-bpe.dir/test-tokenizer-1-bpe.cpp.o
[ 37%] Linking CXX executable ../bin/test-tokenizer-1-bpe
[ 37%] Built target test-tokenizer-1-bpe
[ 38%] Building CXX object tests/CMakeFiles/test-tokenizer-1-spm.dir/test-tokenizer-1-spm.cpp.o
[ 38%] Linking CXX executable ../bin/test-tokenizer-1-spm
[ 38%] Built target test-tokenizer-1-spm
[ 39%] Building CXX object tests/CMakeFiles/test-chat-parser.dir/test-chat-parser.cpp.o
[ 39%] Building CXX object tests/CMakeFiles/test-chat-parser.dir/get-model.cpp.o
[ 40%] Linking CXX executable ../bin/test-chat-parser
[ 40%] Built target test-chat-parser
[ 40%] Building CXX object tests/CMakeFiles/test-chat-template.dir/test-chat-template.cpp.o
[ 40%] Building CXX object tests/CMakeFiles/test-chat-template.dir/get-model.cpp.o
[ 41%] Linking CXX executable ../bin/test-chat-template
[ 41%] Built target test-chat-template
[ 42%] Building CXX object tests/CMakeFiles/test-json-partial.dir/test-json-partial.cpp.o
[ 42%] Building CXX object tests/CMakeFiles/test-json-partial.dir/get-model.cpp.o
[ 42%] Linking CXX executable ../bin/test-json-partial
[ 42%] Built target test-json-partial
[ 42%] Building CXX object tests/CMakeFiles/test-log.dir/test-log.cpp.o
[ 43%] Building CXX object tests/CMakeFiles/test-log.dir/get-model.cpp.o
[ 43%] Linking CXX executable ../bin/test-log
[ 43%] Built target test-log
[ 43%] Building CXX object tests/CMakeFiles/test-regex-partial.dir/test-regex-partial.cpp.o
[ 44%] Building CXX object tests/CMakeFiles/test-regex-partial.dir/get-model.cpp.o
[ 44%] Linking CXX executable ../bin/test-regex-partial
[ 44%] Built target test-regex-partial
[ 45%] Building CXX object tests/CMakeFiles/test-thread-safety.dir/test-thread-safety.cpp.o
[ 45%] Building CXX object tests/CMakeFiles/test-thread-safety.dir/get-model.cpp.o
[ 46%] Linking CXX executable ../bin/test-thread-safety
[ 46%] Built target test-thread-safety
[ 46%] Building CXX object tests/CMakeFiles/test-arg-parser.dir/test-arg-parser.cpp.o
[ 46%] Building CXX object tests/CMakeFiles/test-arg-parser.dir/get-model.cpp.o
[ 47%] Linking CXX executable ../bin/test-arg-parser
[ 47%] Built target test-arg-parser
[ 48%] Building CXX object tests/CMakeFiles/test-opt.dir/test-opt.cpp.o
[ 48%] Building CXX object tests/CMakeFiles/test-opt.dir/get-model.cpp.o
[ 49%] Linking CXX executable ../bin/test-opt
[ 49%] Built target test-opt
[ 49%] Building CXX object tests/CMakeFiles/test-gguf.dir/test-gguf.cpp.o
[ 49%] Building CXX object tests/CMakeFiles/test-gguf.dir/get-model.cpp.o
[ 50%] Linking CXX executable ../bin/test-gguf
[ 50%] Built target test-gguf
[ 50%] Building CXX object tests/CMakeFiles/test-backend-ops.dir/test-backend-ops.cpp.o
[ 51%] Building CXX object tests/CMakeFiles/test-backend-ops.dir/get-model.cpp.o
[ 51%] Linking CXX executable ../bin/test-backend-ops
[ 51%] Built target test-backend-ops
[ 51%] Building CXX object tests/CMakeFiles/test-model-load-cancel.dir/test-model-load-cancel.cpp.o
[ 52%] Building CXX object tests/CMakeFiles/test-model-load-cancel.dir/get-model.cpp.o
[ 52%] Linking CXX executable ../bin/test-model-load-cancel
[ 52%] Built target test-model-load-cancel
[ 52%] Building CXX object tests/CMakeFiles/test-autorelease.dir/test-autorelease.cpp.o
[ 53%] Building CXX object tests/CMakeFiles/test-autorelease.dir/get-model.cpp.o
[ 53%] Linking CXX executable ../bin/test-autorelease
[ 53%] Built target test-autorelease
[ 54%] Building CXX object tests/CMakeFiles/test-barrier.dir/test-barrier.cpp.o
[ 54%] Building CXX object tests/CMakeFiles/test-barrier.dir/get-model.cpp.o
[ 54%] Linking CXX executable ../bin/test-barrier
[ 54%] Built target test-barrier
[ 54%] Building CXX object tests/CMakeFiles/test-quantize-fns.dir/test-quantize-fns.cpp.o
[ 54%] Building CXX object tests/CMakeFiles/test-quantize-fns.dir/get-model.cpp.o
[ 55%] Linking CXX executable ../bin/test-quantize-fns
[ 55%] Built target test-quantize-fns
[ 55%] Building CXX object tests/CMakeFiles/test-quantize-perf.dir/test-quantize-perf.cpp.o
[ 56%] Building CXX object tests/CMakeFiles/test-quantize-perf.dir/get-model.cpp.o
[ 56%] Linking CXX executable ../bin/test-quantize-perf
[ 56%] Built target test-quantize-perf
[ 56%] Building CXX object tests/CMakeFiles/test-rope.dir/test-rope.cpp.o
[ 57%] Building CXX object tests/CMakeFiles/test-rope.dir/get-model.cpp.o
[ 57%] Linking CXX executable ../bin/test-rope
[ 57%] Built target test-rope
[ 57%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/mtmd.cpp.o
[ 58%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/mtmd-audio.cpp.o
[ 58%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/clip.cpp.o
[ 58%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/mtmd-helper.cpp.o
[ 59%] Linking CXX shared library ../../bin/libmtmd.so
[ 59%] Built target mtmd
[ 60%] Building C object tests/CMakeFiles/test-mtmd-c-api.dir/test-mtmd-c-api.c.o
[ 60%] Building CXX object tests/CMakeFiles/test-mtmd-c-api.dir/get-model.cpp.o
[ 60%] Linking CXX executable ../bin/test-mtmd-c-api
[ 60%] Built target test-mtmd-c-api
[ 61%] Building C object tests/CMakeFiles/test-c.dir/test-c.c.o
[ 61%] Linking C executable ../bin/test-c
[ 61%] Built target test-c
[ 62%] Building CXX object examples/batched/CMakeFiles/llama-batched.dir/batched.cpp.o
[ 62%] Linking CXX executable ../../bin/llama-batched
[ 62%] Built target llama-batched
[ 62%] Building CXX object examples/embedding/CMakeFiles/llama-embedding.dir/embedding.cpp.o
[ 63%] Linking CXX executable ../../bin/llama-embedding
[ 63%] Built target llama-embedding
[ 63%] Building CXX object examples/eval-callback/CMakeFiles/llama-eval-callback.dir/eval-callback.cpp.o
[ 63%] Linking CXX executable ../../bin/llama-eval-callback
[ 63%] Built target llama-eval-callback
[ 64%] Building C object examples/gguf-hash/CMakeFiles/sha256.dir/deps/sha256/sha256.c.o
[ 64%] Built target sha256
[ 65%] Building C object examples/gguf-hash/CMakeFiles/xxhash.dir/deps/xxhash/xxhash.c.o
[ 65%] Built target xxhash
[ 65%] Building C object examples/gguf-hash/CMakeFiles/sha1.dir/deps/sha1/sha1.c.o
[ 65%] Built target sha1
[ 66%] Building CXX object examples/gguf-hash/CMakeFiles/llama-gguf-hash.dir/gguf-hash.cpp.o
[ 66%] Linking CXX executable ../../bin/llama-gguf-hash
[ 66%] Built target llama-gguf-hash
[ 66%] Building CXX object examples/gguf/CMakeFiles/llama-gguf.dir/gguf.cpp.o
[ 66%] Linking CXX executable ../../bin/llama-gguf
[ 66%] Built target llama-gguf
[ 66%] Building CXX object examples/gritlm/CMakeFiles/llama-gritlm.dir/gritlm.cpp.o
[ 67%] Linking CXX executable ../../bin/llama-gritlm
[ 67%] Built target llama-gritlm
[ 68%] Building CXX object examples/lookahead/CMakeFiles/llama-lookahead.dir/lookahead.cpp.o
[ 68%] Linking CXX executable ../../bin/llama-lookahead
[ 68%] Built target llama-lookahead
[ 68%] Building CXX object examples/lookup/CMakeFiles/llama-lookup.dir/lookup.cpp.o
[ 69%] Linking CXX executable ../../bin/llama-lookup
[ 69%] Built target llama-lookup
[ 69%] Building CXX object examples/lookup/CMakeFiles/llama-lookup-create.dir/lookup-create.cpp.o
[ 70%] Linking CXX executable ../../bin/llama-lookup-create
[ 70%] Built target llama-lookup-create
[ 70%] Building CXX object examples/lookup/CMakeFiles/llama-lookup-merge.dir/lookup-merge.cpp.o
[ 70%] Linking CXX executable ../../bin/llama-lookup-merge
[ 70%] Built target llama-lookup-merge
[ 71%] Building CXX object examples/lookup/CMakeFiles/llama-lookup-stats.dir/lookup-stats.cpp.o
[ 71%] Linking CXX executable ../../bin/llama-lookup-stats
[ 71%] Built target llama-lookup-stats
[ 71%] Building CXX object examples/parallel/CMakeFiles/llama-parallel.dir/parallel.cpp.o
[ 72%] Linking CXX executable ../../bin/llama-parallel
[ 72%] Built target llama-parallel
[ 72%] Building CXX object examples/passkey/CMakeFiles/llama-passkey.dir/passkey.cpp.o
[ 73%] Linking CXX executable ../../bin/llama-passkey
[ 73%] Built target llama-passkey
[ 73%] Building CXX object examples/retrieval/CMakeFiles/llama-retrieval.dir/retrieval.cpp.o
[ 74%] Linking CXX executable ../../bin/llama-retrieval
[ 74%] Built target llama-retrieval
[ 74%] Building CXX object examples/save-load-state/CMakeFiles/llama-save-load-state.dir/save-load-state.cpp.o
[ 75%] Linking CXX executable ../../bin/llama-save-load-state
[ 75%] Built target llama-save-load-state
[ 76%] Building CXX object examples/simple/CMakeFiles/llama-simple.dir/simple.cpp.o
[ 76%] Linking CXX executable ../../bin/llama-simple
[ 76%] Built target llama-simple
[ 76%] Building CXX object examples/simple-chat/CMakeFiles/llama-simple-chat.dir/simple-chat.cpp.o
[ 77%] Linking CXX executable ../../bin/llama-simple-chat
[ 77%] Built target llama-simple-chat
[ 77%] Building CXX object examples/speculative/CMakeFiles/llama-speculative.dir/speculative.cpp.o
[ 78%] Linking CXX executable ../../bin/llama-speculative
[ 78%] Built target llama-speculative
[ 78%] Building CXX object examples/speculative-simple/CMakeFiles/llama-speculative-simple.dir/speculative-simple.cpp.o
[ 78%] Linking CXX executable ../../bin/llama-speculative-simple
[ 78%] Built target llama-speculative-simple
[ 78%] Building CXX object examples/gen-docs/CMakeFiles/llama-gen-docs.dir/gen-docs.cpp.o
[ 79%] Linking CXX executable ../../bin/llama-gen-docs
[ 79%] Built target llama-gen-docs
[ 80%] Building CXX object examples/training/CMakeFiles/llama-finetune.dir/finetune.cpp.o
[ 80%] Linking CXX executable ../../bin/llama-finetune
[ 80%] Built target llama-finetune
[ 80%] Building CXX object examples/diffusion/CMakeFiles/llama-diffusion-cli.dir/diffusion-cli.cpp.o
[ 81%] Linking CXX executable ../../bin/llama-diffusion-cli
[ 81%] Built target llama-diffusion-cli
[ 82%] Building CXX object examples/model-conversion/CMakeFiles/llama-logits.dir/logits.cpp.o
[ 82%] Linking CXX executable ../../bin/llama-logits
[ 82%] Built target llama-logits
[ 83%] Building CXX object examples/convert-llama2c-to-ggml/CMakeFiles/llama-convert-llama2c-to-ggml.dir/convert-llama2c-to-ggml.cpp.o
[ 83%] Linking CXX executable ../../bin/llama-convert-llama2c-to-ggml
[ 83%] Built target llama-convert-llama2c-to-ggml
[ 83%] Building CXX object pocs/vdot/CMakeFiles/llama-vdot.dir/vdot.cpp.o
[ 84%] Linking CXX executable ../../bin/llama-vdot
[ 84%] Built target llama-vdot
[ 85%] Building CXX object pocs/vdot/CMakeFiles/llama-q8dot.dir/q8dot.cpp.o
[ 85%] Linking CXX executable ../../bin/llama-q8dot
[ 85%] Built target llama-q8dot
[ 85%] Building CXX object tools/batched-bench/CMakeFiles/llama-batched-bench.dir/batched-bench.cpp.o
[ 86%] Linking CXX executable ../../bin/llama-batched-bench
[ 86%] Built target llama-batched-bench
[ 87%] Building CXX object tools/gguf-split/CMakeFiles/llama-gguf-split.dir/gguf-split.cpp.o
[ 87%] Linking CXX executable ../../bin/llama-gguf-split
[ 87%] Built target llama-gguf-split
[ 87%] Building CXX object tools/imatrix/CMakeFiles/llama-imatrix.dir/imatrix.cpp.o
[ 88%] Linking CXX executable ../../bin/llama-imatrix
[ 88%] Built target llama-imatrix
[ 88%] Building CXX object tools/llama-bench/CMakeFiles/llama-bench.dir/llama-bench.cpp.o
[ 89%] Linking CXX executable ../../bin/llama-bench
[ 89%] Built target llama-bench
[ 89%] Building CXX object tools/main/CMakeFiles/llama-cli.dir/main.cpp.o
[ 89%] Linking CXX executable ../../bin/llama-cli
[ 89%] Built target llama-cli
[ 89%] Building CXX object tools/perplexity/CMakeFiles/llama-perplexity.dir/perplexity.cpp.o
[ 89%] Linking CXX executable ../../bin/llama-perplexity
[ 89%] Built target llama-perplexity
[ 90%] Building CXX object tools/quantize/CMakeFiles/llama-quantize.dir/quantize.cpp.o
[ 90%] Linking CXX executable ../../bin/llama-quantize
[ 90%] Built target llama-quantize
[ 90%] Generating loading.html.hpp
[ 90%] Generating index.html.gz.hpp
[ 91%] Building CXX object tools/server/CMakeFiles/llama-server.dir/server.cpp.o
[ 91%] Linking CXX executable ../../bin/llama-server
[ 91%] Built target llama-server
[ 91%] Building CXX object tools/run/CMakeFiles/llama-run.dir/run.cpp.o
[ 91%] Building CXX object tools/run/CMakeFiles/llama-run.dir/linenoise.cpp/linenoise.cpp.o
[ 92%] Linking CXX executable ../../bin/llama-run
[ 92%] Built target llama-run
[ 93%] Building CXX object tools/tokenize/CMakeFiles/llama-tokenize.dir/tokenize.cpp.o
[ 93%] Linking CXX executable ../../bin/llama-tokenize
[ 93%] Built target llama-tokenize
[ 94%] Building CXX object tools/tts/CMakeFiles/llama-tts.dir/tts.cpp.o
[ 94%] Linking CXX executable ../../bin/llama-tts
[ 94%] Built target llama-tts
[ 94%] Building CXX object tools/mtmd/CMakeFiles/llama-llava-cli.dir/deprecation-warning.cpp.o
[ 94%] Linking CXX executable ../../bin/llama-llava-cli
[ 94%] Built target llama-llava-cli
[ 94%] Building CXX object tools/mtmd/CMakeFiles/llama-gemma3-cli.dir/deprecation-warning.cpp.o
[ 95%] Linking CXX executable ../../bin/llama-gemma3-cli
[ 95%] Built target llama-gemma3-cli
[ 96%] Building CXX object tools/mtmd/CMakeFiles/llama-minicpmv-cli.dir/deprecation-warning.cpp.o
[ 96%] Linking CXX executable ../../bin/llama-minicpmv-cli
[ 96%] Built target llama-minicpmv-cli
[ 96%] Building CXX object tools/mtmd/CMakeFiles/llama-qwen2vl-cli.dir/deprecation-warning.cpp.o
[ 97%] Linking CXX executable ../../bin/llama-qwen2vl-cli
[ 97%] Built target llama-qwen2vl-cli
[ 97%] Building CXX object tools/mtmd/CMakeFiles/llama-mtmd-cli.dir/mtmd-cli.cpp.o
[ 98%] Linking CXX executable ../../bin/llama-mtmd-cli
[ 98%] Built target llama-mtmd-cli
[ 99%] Building CXX object tools/cvector-generator/CMakeFiles/llama-cvector-generator.dir/cvector-generator.cpp.o
[ 99%] Linking CXX executable ../../bin/llama-cvector-generator
[ 99%] Built target llama-cvector-generator
[100%] Building CXX object tools/export-lora/CMakeFiles/llama-export-lora.dir/export-lora.cpp.o
[100%] Linking CXX executable ../../bin/llama-export-lora
[100%] Built target llama-export-lora
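
Once the conversion cell below has produced the quantized GGUF, it can be sanity-checked with the llama-cli binary that was just built; a minimal sketch (the exact .gguf filename depends on the conversion output, so the path here is a placeholder):

!./llama.cpp/build/bin/llama-cli -m /content/drive/MyDrive/lora_model/q4_k_m/<your-quantized-file>.gguf -p "timer file descriptor HOWTO"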
[18]:
!wget https://raw.githubusercontent.com/ggml-org/llama.cpp/refs/heads/master/convert_hf_to_gguf.py -O llama.cpp/convert_hf_to_gguf.py
if True: model.save_pretrained_gguf("/content/drive/MyDrive/lora_model/q4_k_m", tokenizer, quantization_method = "q4_k_m")
--2025-09-07 16:06:57--  https://raw.githubusercontent.com/ggml-org/llama.cpp/refs/heads/master/convert_hf_to_gguf.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 419460 (410K) [text/plain]
Saving to: ‘llama.cpp/convert_hf_to_gguf.py’

llama.cpp/convert_h 100%[===================>] 409.63K  --.-KB/s    in 0.003s

2025-09-07 16:06:57 (121 MB/s) - ‘llama.cpp/convert_hf_to_gguf.py’ saved [419460/419460]

Unsloth: You have 1 CPUs. Using `safe_serialization` is 10x slower.
We shall switch to Pytorch saving, which might take 3 minutes and not 30 minutes.
To force `safe_serialization`, set it to `None` instead.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 2.4G
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 3.59 out of 12.67 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...
100%|██████████| 28/28 [00:00<00:00, 28.42it/s]
Unsloth: Saving tokenizer... Done.
Unsloth: Saving /content/drive/MyDrive/lora_model/q4_k_m/pytorch_model-00001-of-00002.bin...
Unsloth: Saving /content/drive/MyDrive/lora_model/q4_k_m/pytorch_model-00002-of-00002.bin...
Done.
Unsloth: Converting llama model. Can use fast conversion = False.
==((====))==  Unsloth: Conversion from QLoRA to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF 16bits might take 3 minutes.
\        /    [2] Converting GGUF 16bits to ['q4_k_m'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: [1] Converting model at /content/drive/MyDrive/lora_model/q4_k_m into f16 GGUF format.
The output location will be /content/drive/MyDrive/lora_model/q4_k_m/unsloth.F16.gguf
This might take 3 minutes...
INFO:hf-to-gguf:Loading model: q4_k_m
INFO:hf-to-gguf:Model architecture: LlamaForCausalLM
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:rope_freqs.weight,           torch.float32 --> F32, shape = {64}
INFO:hf-to-gguf:gguf: loading model weight map from 'pytorch_model.bin.index.json'
INFO:hf-to-gguf:gguf: loading model part 'pytorch_model-00001-of-00002.bin'
INFO:hf-to-gguf:token_embd.weight,           torch.float16 --> F16, shape = {3072, 128256}
INFO:hf-to-gguf:blk.0.attn_q.weight,         torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.0.attn_k.weight,         torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.0.attn_v.weight,         torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.0.attn_output.weight,    torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,       torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.0.ffn_up.weight,         torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.0.ffn_down.weight,       torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.0.attn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,       torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.1.attn_q.weight,         torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.1.attn_k.weight,         torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.1.attn_v.weight,         torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.1.attn_output.weight,    torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.1.ffn_gate.weight,       torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.1.ffn_up.weight,         torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.1.ffn_down.weight,       torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.1.attn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.1.ffn_norm.weight,       torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.2.attn_q.weight,         torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.2.attn_k.weight,         torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.2.attn_v.weight,         torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.2.attn_output.weight,    torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.2.ffn_gate.weight,       torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.2.ffn_up.weight,         torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.2.ffn_down.weight,       torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.2.attn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.2.ffn_norm.weight,       torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.3.attn_q.weight,         torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.3.attn_k.weight,         torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.3.attn_v.weight,         torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.3.attn_output.weight,    torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.3.ffn_gate.weight,       torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.3.ffn_up.weight,         torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.3.ffn_down.weight,       torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.3.attn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.3.ffn_norm.weight,       torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.4.attn_q.weight,         torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.4.attn_k.weight,         torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.4.attn_v.weight,         torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.4.attn_output.weight,    torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.4.ffn_gate.weight,       torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.4.ffn_up.weight,         torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.4.ffn_down.weight,       torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.4.attn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.4.ffn_norm.weight,       torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.5.attn_q.weight,         torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.5.attn_k.weight,         torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.5.attn_v.weight,         torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.5.attn_output.weight,    torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.5.ffn_gate.weight,       torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.5.ffn_up.weight,         torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.5.ffn_down.weight,       torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.5.attn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.5.ffn_norm.weight,       torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.6.attn_q.weight,         torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.6.attn_k.weight,         torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.6.attn_v.weight,         torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.6.attn_output.weight,    torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.6.ffn_gate.weight,       torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.6.ffn_up.weight,         torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.6.ffn_down.weight,       torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.6.attn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.6.ffn_norm.weight,       torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.7.attn_q.weight,         torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.7.attn_k.weight,         torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.7.attn_v.weight,         torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.7.attn_output.weight,    torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.7.ffn_gate.weight,       torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.7.ffn_up.weight,         torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.7.ffn_down.weight,       torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.7.attn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.7.ffn_norm.weight,       torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.8.attn_q.weight,         torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.8.attn_k.weight,         torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.8.attn_v.weight,         torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.8.attn_output.weight,    torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.8.ffn_gate.weight,       torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.8.ffn_up.weight,         torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.8.ffn_down.weight,       torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.8.attn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.8.ffn_norm.weight,       torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.9.attn_q.weight,         torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.9.attn_k.weight,         torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.9.attn_v.weight,         torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.9.attn_output.weight,    torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.9.ffn_gate.weight,       torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.9.ffn_up.weight,         torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.9.ffn_down.weight,       torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.9.attn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,       torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.10.attn_q.weight,        torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.10.attn_k.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.10.attn_v.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.10.attn_output.weight,   torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.10.ffn_gate.weight,      torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.10.ffn_up.weight,        torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.10.ffn_down.weight,      torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.10.attn_norm.weight,     torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.10.ffn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.11.attn_q.weight,        torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.11.attn_k.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.11.attn_v.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.11.attn_output.weight,   torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.11.ffn_gate.weight,      torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.11.ffn_up.weight,        torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.11.ffn_down.weight,      torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.11.attn_norm.weight,     torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.11.ffn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.12.attn_q.weight,        torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.12.attn_k.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.12.attn_v.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.12.attn_output.weight,   torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.12.ffn_gate.weight,      torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.12.ffn_up.weight,        torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.12.ffn_down.weight,      torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.12.attn_norm.weight,     torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.12.ffn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.13.attn_q.weight,        torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.13.attn_k.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.13.attn_v.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.13.attn_output.weight,   torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.13.ffn_gate.weight,      torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.13.ffn_up.weight,        torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.13.ffn_down.weight,      torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.13.attn_norm.weight,     torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.13.ffn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.14.attn_q.weight,        torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.14.attn_k.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.14.attn_v.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.14.attn_output.weight,   torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.14.ffn_gate.weight,      torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.14.ffn_up.weight,        torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.14.ffn_down.weight,      torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.14.attn_norm.weight,     torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.14.ffn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.15.attn_q.weight,        torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.15.attn_k.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.15.attn_v.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.15.attn_output.weight,   torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.15.ffn_gate.weight,      torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.15.ffn_up.weight,        torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.15.ffn_down.weight,      torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.15.attn_norm.weight,     torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.15.ffn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.16.attn_q.weight,        torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.16.attn_k.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.16.attn_v.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.16.attn_output.weight,   torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.16.ffn_gate.weight,      torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.16.ffn_up.weight,        torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.16.ffn_down.weight,      torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.16.attn_norm.weight,     torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.16.ffn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.17.attn_q.weight,        torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.17.attn_k.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.17.attn_v.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.17.attn_output.weight,   torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.17.ffn_gate.weight,      torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.17.ffn_up.weight,        torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.17.ffn_down.weight,      torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.17.attn_norm.weight,     torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.17.ffn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.18.attn_q.weight,        torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.18.attn_k.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.18.attn_v.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.18.attn_output.weight,   torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.18.ffn_gate.weight,      torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.18.ffn_up.weight,        torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.18.ffn_down.weight,      torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.18.attn_norm.weight,     torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.18.ffn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.19.attn_q.weight,        torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.19.attn_k.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.19.attn_v.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.19.attn_output.weight,   torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.19.ffn_gate.weight,      torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.19.ffn_up.weight,        torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.19.ffn_down.weight,      torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.19.attn_norm.weight,     torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.19.ffn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.20.attn_q.weight,        torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.20.attn_k.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.20.attn_v.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.20.attn_output.weight,   torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.20.ffn_gate.weight,      torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.20.ffn_up.weight,        torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:gguf: loading model part 'pytorch_model-00002-of-00002.bin'
INFO:hf-to-gguf:blk.20.ffn_down.weight,      torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.20.attn_norm.weight,     torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.20.ffn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.21.attn_q.weight,        torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.21.attn_k.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.21.attn_v.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.21.attn_output.weight,   torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.21.ffn_gate.weight,      torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.21.ffn_up.weight,        torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.21.ffn_down.weight,      torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.21.attn_norm.weight,     torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.21.ffn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.22.attn_q.weight,        torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.22.attn_k.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.22.attn_v.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.22.attn_output.weight,   torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.22.ffn_gate.weight,      torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.22.ffn_up.weight,        torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.22.ffn_down.weight,      torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.22.attn_norm.weight,     torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.22.ffn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.23.attn_q.weight,        torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.23.attn_k.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.23.attn_v.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.23.attn_output.weight,   torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.23.ffn_gate.weight,      torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.23.ffn_up.weight,        torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.23.ffn_down.weight,      torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.23.attn_norm.weight,     torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.23.ffn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.24.attn_q.weight,        torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.24.attn_k.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.24.attn_v.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.24.attn_output.weight,   torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.24.ffn_gate.weight,      torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.24.ffn_up.weight,        torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.24.ffn_down.weight,      torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.24.attn_norm.weight,     torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.24.ffn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.25.attn_q.weight,        torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.25.attn_k.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.25.attn_v.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.25.attn_output.weight,   torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.25.ffn_gate.weight,      torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.25.ffn_up.weight,        torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.25.ffn_down.weight,      torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.25.attn_norm.weight,     torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.25.ffn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.26.attn_q.weight,        torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.26.attn_k.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.26.attn_v.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.26.attn_output.weight,   torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.26.ffn_gate.weight,      torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.26.ffn_up.weight,        torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.26.ffn_down.weight,      torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.26.attn_norm.weight,     torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.26.ffn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.27.attn_q.weight,        torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.27.attn_k.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.27.attn_v.weight,        torch.float16 --> F16, shape = {3072, 1024}
INFO:hf-to-gguf:blk.27.attn_output.weight,   torch.float16 --> F16, shape = {3072, 3072}
INFO:hf-to-gguf:blk.27.ffn_gate.weight,      torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.27.ffn_up.weight,        torch.float16 --> F16, shape = {3072, 8192}
INFO:hf-to-gguf:blk.27.ffn_down.weight,      torch.float16 --> F16, shape = {8192, 3072}
INFO:hf-to-gguf:blk.27.attn_norm.weight,     torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.27.ffn_norm.weight,      torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:output_norm.weight,          torch.float16 --> F32, shape = {3072}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 131072
INFO:hf-to-gguf:gguf: embedding length = 3072
INFO:hf-to-gguf:gguf: feed forward length = 8192
INFO:hf-to-gguf:gguf: head count = 24
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 500000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
WARNING:gguf.vocab:Unknown separator token '<|begin_of_text|>' in TemplateProcessing<pair>
INFO:gguf.vocab:Adding 280147 merge(s).
INFO:gguf.vocab:Setting special token type bos to 128000
INFO:gguf.vocab:Setting special token type eos to 128001
INFO:gguf.vocab:Setting special token type pad to 128004
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_sep_token to False
INFO:gguf.vocab:Setting chat_template to {{- bos_token }}
{%- if custom_tools is defined %}
    {%- set tools = custom_tools %}
{%- endif %}
{%- if not tools_in_user_message is defined %}
    {%- set tools_in_user_message = true %}
{%- endif %}
{%- if not date_string is defined %}
    {%- set date_string = "26 July 2024" %}
{%- endif %}
{%- if not tools is defined %}
    {%- set tools = none %}
{%- endif %}

{#- This block extracts the system message, so we can slot it into the right place. #}
{%- if messages[0]['role'] == 'system' %}
    {%- set system_message = messages[0]['content'] %}
    {%- set messages = messages[1:] %}
{%- else %}
    {%- set system_message = "" %}
{%- endif %}

{#- System message + builtin tools #}
{{- "<|start_header_id|>system<|end_header_id|>

" }}
{%- if builtin_tools is defined or tools is not none %}
    {{- "Environment: ipython
" }}
{%- endif %}
{%- if builtin_tools is defined %}
    {{- "Tools: " + builtin_tools | reject('equalto', 'code_interpreter') | join(", ") + "

"}}
{%- endif %}
{{- "Cutting Knowledge Date: December 2023
" }}
{{- "Today Date: " + date_string + "

" }}
{%- if tools is not none and not tools_in_user_message %}
    {{- "You have access to the following functions. To call a function, please respond with JSON for a function call." }}
    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
    {{- "Do not use variables.

" }}
    {%- for t in tools %}
        {{- t | tojson(indent=4) }}
        {{- "

" }}
    {%- endfor %}
{%- endif %}
{{- system_message }}
{{- "<|eot_id|>" }}

{#- Custom tools are passed in a user message with some extra guidance #}
{%- if tools_in_user_message and not tools is none %}
    {#- Extract the first user message so we can plug it in here #}
    {%- if messages | length != 0 %}
        {%- set first_user_message = messages[0]['content'] %}
        {%- set messages = messages[1:] %}
    {%- else %}
        {{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
{%- endif %}
    {{- '<|start_header_id|>user<|end_header_id|>

' -}}
    {{- "Given the following functions, please respond with a JSON for a function call " }}
    {{- "with its proper arguments that best answers the given prompt.

" }}
    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
    {{- "Do not use variables.

" }}
    {%- for t in tools %}
        {{- t | tojson(indent=4) }}
        {{- "

" }}
    {%- endfor %}
    {{- first_user_message + "<|eot_id|>"}}
{%- endif %}

{%- for message in messages %}
    {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
        {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>

'+ message['content'] + '<|eot_id|>' }}
    {%- elif 'tool_calls' in message %}
        {%- if not message.tool_calls|length == 1 %}
            {{- raise_exception("This model only supports single tool-calls at once!") }}
        {%- endif %}
        {%- set tool_call = message.tool_calls[0].function %}
        {%- if builtin_tools is defined and tool_call.name in builtin_tools %}
            {{- '<|start_header_id|>assistant<|end_header_id|>

' -}}
            {{- "<|python_tag|>" + tool_call.name + ".call(" }}
            {%- for arg_name, arg_val in tool_call.arguments | items %}
                {{- arg_name + '="' + arg_val + '"' }}
                {%- if not loop.last %}
                    {{- ", " }}
                {%- endif %}
                {%- endfor %}
            {{- ")" }}
        {%- else  %}
            {{- '<|start_header_id|>assistant<|end_header_id|>

' -}}
            {{- '{"name": "' + tool_call.name + '", ' }}
            {{- '"parameters": ' }}
            {{- tool_call.arguments | tojson }}
            {{- "}" }}
        {%- endif %}
        {%- if builtin_tools is defined %}
            {#- This means we're in ipython mode #}
            {{- "<|eom_id|>" }}
        {%- else %}
            {{- "<|eot_id|>" }}
        {%- endif %}
    {%- elif message.role == "tool" or message.role == "ipython" %}
        {{- "<|start_header_id|>ipython<|end_header_id|>

" }}
        {%- if message.content is mapping or message.content is iterable %}
            {{- message.content | tojson }}
        {%- else %}
            {{- message.content }}
        {%- endif %}
        {{- "<|eot_id|>" }}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|start_header_id|>assistant<|end_header_id|>

' }}
{%- endif %}

INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/content/drive/MyDrive/lora_model/q4_k_m/unsloth.F16.gguf: n_tensors = 255, total_size = 6.4G
Writing: 100%|██████████| 6.43G/6.43G [02:40<00:00, 40.1Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to /content/drive/MyDrive/lora_model/q4_k_m/unsloth.F16.gguf
Unsloth: Conversion completed! Output location: /content/drive/MyDrive/lora_model/q4_k_m/unsloth.F16.gguf
Unsloth: [2] Converting GGUF 16bit into q4_k_m. This might take 20 minutes...
main: build = 6403 (3b15924d)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu
main: quantizing '/content/drive/MyDrive/lora_model/q4_k_m/unsloth.F16.gguf' to '/content/drive/MyDrive/lora_model/q4_k_m/unsloth.Q4_K_M.gguf' as Q4_K_M using 4 threads
llama_model_loader: loaded meta data with 32 key-value pairs and 255 tensors from /content/drive/MyDrive/lora_model/q4_k_m/unsloth.F16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Q4_K_M
llama_model_loader: - kv   3:                       general.quantized_by str              = Unsloth
llama_model_loader: - kv   4:                         general.size_label str              = 3.2B
llama_model_loader: - kv   5:                           general.repo_url str              = https://huggingface.co/unsloth
llama_model_loader: - kv   6:                               general.tags arr[str,2]       = ["unsloth", "llama.cpp"]
llama_model_loader: - kv   7:                          llama.block_count u32              = 28
llama_model_loader: - kv   8:                       llama.context_length u32              = 131072
llama_model_loader: - kv   9:                     llama.embedding_length u32              = 3072
llama_model_loader: - kv  10:                  llama.feed_forward_length u32              = 8192
llama_model_loader: - kv  11:                 llama.attention.head_count u32              = 24
llama_model_loader: - kv  12:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  13:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  14:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  15:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  16:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  17:                          general.file_type u32              = 1
llama_model_loader: - kv  18:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  19:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  20:               general.quantization_version u32              = 2
llama_model_loader: - kv  21:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  22:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  23:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  24:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  25:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  26:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  27:                tokenizer.ggml.eos_token_id u32              = 128001
llama_model_loader: - kv  28:            tokenizer.ggml.padding_token_id u32              = 128004
llama_model_loader: - kv  29:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  30:               tokenizer.ggml.add_sep_token bool             = false
llama_model_loader: - kv  31:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - type  f32:   58 tensors
llama_model_loader: - type  f16:  197 tensors
[   1/ 255]                   output_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[   2/ 255]                    rope_freqs.weight - [   64,     1,     1,     1], type =    f32, size =    0.000 MB
[   3/ 255]                    token_embd.weight - [ 3072, 128256,     1,     1], type =    f16, converting to q6_K .. size =   751.50 MiB ->   308.23 MiB
[   4/ 255]                  blk.0.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[   5/ 255]               blk.0.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[   6/ 255]             blk.0.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[   7/ 255]                  blk.0.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[   8/ 255]                  blk.0.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q6_K .. size =     6.00 MiB ->     2.46 MiB
[   9/ 255]                blk.0.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q6_K .. size =    48.00 MiB ->    19.69 MiB
[  10/ 255]                blk.0.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  11/ 255]                blk.0.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  12/ 255]                  blk.0.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  13/ 255]                  blk.1.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[  14/ 255]               blk.1.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  15/ 255]             blk.1.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[  16/ 255]                  blk.1.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[  17/ 255]                  blk.1.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q6_K .. size =     6.00 MiB ->     2.46 MiB
[  18/ 255]                blk.1.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q6_K .. size =    48.00 MiB ->    19.69 MiB
[  19/ 255]                blk.1.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  20/ 255]                blk.1.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  21/ 255]                  blk.1.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  22/ 255]                  blk.2.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[  23/ 255]               blk.2.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  24/ 255]             blk.2.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[  25/ 255]                  blk.2.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[  26/ 255]                  blk.2.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q6_K .. size =     6.00 MiB ->     2.46 MiB
[  27/ 255]                blk.2.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q6_K .. size =    48.00 MiB ->    19.69 MiB
[  28/ 255]                blk.2.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  29/ 255]                blk.2.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  30/ 255]                  blk.2.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  31/ 255]                  blk.3.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[  32/ 255]               blk.3.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  33/ 255]             blk.3.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[  34/ 255]                  blk.3.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[  35/ 255]                  blk.3.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[  36/ 255]                blk.3.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  37/ 255]                blk.3.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  38/ 255]                blk.3.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  39/ 255]                  blk.3.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  40/ 255]                  blk.4.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[  41/ 255]               blk.4.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  42/ 255]             blk.4.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[  43/ 255]                  blk.4.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[  44/ 255]                  blk.4.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[  45/ 255]                blk.4.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  46/ 255]                blk.4.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  47/ 255]                blk.4.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  48/ 255]                  blk.4.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  49/ 255]                  blk.5.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[  50/ 255]               blk.5.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  51/ 255]             blk.5.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[  52/ 255]                  blk.5.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[  53/ 255]                  blk.5.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q6_K .. size =     6.00 MiB ->     2.46 MiB
[  54/ 255]                blk.5.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q6_K .. size =    48.00 MiB ->    19.69 MiB
[  55/ 255]                blk.5.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  56/ 255]                blk.5.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  57/ 255]                  blk.5.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  58/ 255]                  blk.6.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[  59/ 255]               blk.6.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  60/ 255]             blk.6.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[  61/ 255]                  blk.6.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[  62/ 255]                  blk.6.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[  63/ 255]                blk.6.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  64/ 255]                blk.6.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  65/ 255]                blk.6.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  66/ 255]                  blk.6.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  67/ 255]                  blk.7.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[  68/ 255]               blk.7.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  69/ 255]             blk.7.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[  70/ 255]                  blk.7.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[  71/ 255]                  blk.7.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[  72/ 255]                blk.7.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  73/ 255]                blk.7.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  74/ 255]                blk.7.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  75/ 255]                  blk.7.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  76/ 255]                  blk.8.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[  77/ 255]               blk.8.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  78/ 255]             blk.8.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[  79/ 255]                  blk.8.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[  80/ 255]                  blk.8.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q6_K .. size =     6.00 MiB ->     2.46 MiB
[  81/ 255]                blk.8.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q6_K .. size =    48.00 MiB ->    19.69 MiB
[  82/ 255]                blk.8.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  83/ 255]                blk.8.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  84/ 255]                  blk.8.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  85/ 255]                  blk.9.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[  86/ 255]               blk.9.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  87/ 255]             blk.9.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[  88/ 255]                  blk.9.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[  89/ 255]                  blk.9.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[  90/ 255]                blk.9.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  91/ 255]                blk.9.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  92/ 255]                blk.9.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  93/ 255]                  blk.9.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[  94/ 255]                 blk.10.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[  95/ 255]              blk.10.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[  96/ 255]            blk.10.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[  97/ 255]                 blk.10.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[  98/ 255]                 blk.10.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[  99/ 255]               blk.10.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 100/ 255]               blk.10.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 101/ 255]               blk.10.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 102/ 255]                 blk.10.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 103/ 255]                 blk.11.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 104/ 255]              blk.11.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 105/ 255]            blk.11.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 106/ 255]                 blk.11.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 107/ 255]                 blk.11.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q6_K .. size =     6.00 MiB ->     2.46 MiB
[ 108/ 255]               blk.11.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q6_K .. size =    48.00 MiB ->    19.69 MiB
[ 109/ 255]               blk.11.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 110/ 255]               blk.11.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 111/ 255]                 blk.11.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 112/ 255]                 blk.12.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 113/ 255]              blk.12.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 114/ 255]            blk.12.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 115/ 255]                 blk.12.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 116/ 255]                 blk.12.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 117/ 255]               blk.12.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 118/ 255]               blk.12.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 119/ 255]               blk.12.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 120/ 255]                 blk.12.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 121/ 255]                 blk.13.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 122/ 255]              blk.13.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 123/ 255]            blk.13.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 124/ 255]                 blk.13.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 125/ 255]                 blk.13.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 126/ 255]               blk.13.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 127/ 255]               blk.13.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 128/ 255]               blk.13.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 129/ 255]                 blk.13.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 130/ 255]                 blk.14.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 131/ 255]              blk.14.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 132/ 255]            blk.14.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 133/ 255]                 blk.14.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 134/ 255]                 blk.14.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q6_K .. size =     6.00 MiB ->     2.46 MiB
[ 135/ 255]               blk.14.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q6_K .. size =    48.00 MiB ->    19.69 MiB
[ 136/ 255]               blk.14.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 137/ 255]               blk.14.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 138/ 255]                 blk.14.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 139/ 255]                 blk.15.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 140/ 255]              blk.15.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 141/ 255]            blk.15.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 142/ 255]                 blk.15.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 143/ 255]                 blk.15.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 144/ 255]               blk.15.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 145/ 255]               blk.15.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 146/ 255]               blk.15.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 147/ 255]                 blk.15.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 148/ 255]                 blk.16.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 149/ 255]              blk.16.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 150/ 255]            blk.16.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 151/ 255]                 blk.16.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 152/ 255]                 blk.16.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 153/ 255]               blk.16.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 154/ 255]               blk.16.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 155/ 255]               blk.16.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 156/ 255]                 blk.16.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 157/ 255]                 blk.17.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 158/ 255]              blk.17.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 159/ 255]            blk.17.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 160/ 255]                 blk.17.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 161/ 255]                 blk.17.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q6_K .. size =     6.00 MiB ->     2.46 MiB
[ 162/ 255]               blk.17.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q6_K .. size =    48.00 MiB ->    19.69 MiB
[ 163/ 255]               blk.17.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 164/ 255]               blk.17.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 165/ 255]                 blk.17.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 166/ 255]                 blk.18.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 167/ 255]              blk.18.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 168/ 255]            blk.18.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 169/ 255]                 blk.18.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 170/ 255]                 blk.18.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 171/ 255]               blk.18.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 172/ 255]               blk.18.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 173/ 255]               blk.18.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 174/ 255]                 blk.18.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 175/ 255]                 blk.19.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 176/ 255]              blk.19.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 177/ 255]            blk.19.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 178/ 255]                 blk.19.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 179/ 255]                 blk.19.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 180/ 255]               blk.19.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 181/ 255]               blk.19.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 182/ 255]               blk.19.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 183/ 255]                 blk.19.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 184/ 255]                 blk.20.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 185/ 255]              blk.20.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 186/ 255]            blk.20.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 187/ 255]                 blk.20.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 188/ 255]                 blk.20.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q6_K .. size =     6.00 MiB ->     2.46 MiB
[ 189/ 255]               blk.20.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q6_K .. size =    48.00 MiB ->    19.69 MiB
[ 190/ 255]               blk.20.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 191/ 255]               blk.20.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 192/ 255]                 blk.20.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 193/ 255]                 blk.21.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 194/ 255]              blk.21.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 195/ 255]            blk.21.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 196/ 255]                 blk.21.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 197/ 255]                 blk.21.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 198/ 255]               blk.21.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 199/ 255]               blk.21.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 200/ 255]               blk.21.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 201/ 255]                 blk.21.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 202/ 255]                 blk.22.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 203/ 255]              blk.22.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 204/ 255]            blk.22.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 205/ 255]                 blk.22.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 206/ 255]                 blk.22.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 207/ 255]               blk.22.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 208/ 255]               blk.22.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 209/ 255]               blk.22.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 210/ 255]                 blk.22.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 211/ 255]                 blk.23.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 212/ 255]              blk.23.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 213/ 255]            blk.23.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 214/ 255]                 blk.23.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 215/ 255]                 blk.23.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q6_K .. size =     6.00 MiB ->     2.46 MiB
[ 216/ 255]               blk.23.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q6_K .. size =    48.00 MiB ->    19.69 MiB
[ 217/ 255]               blk.23.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 218/ 255]               blk.23.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 219/ 255]                 blk.23.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 220/ 255]                 blk.24.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 221/ 255]              blk.24.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 222/ 255]            blk.24.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 223/ 255]                 blk.24.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 224/ 255]                 blk.24.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q6_K .. size =     6.00 MiB ->     2.46 MiB
[ 225/ 255]               blk.24.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q6_K .. size =    48.00 MiB ->    19.69 MiB
[ 226/ 255]               blk.24.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 227/ 255]               blk.24.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 228/ 255]                 blk.24.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 229/ 255]                 blk.25.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 230/ 255]              blk.25.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 231/ 255]            blk.25.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 232/ 255]                 blk.25.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 233/ 255]                 blk.25.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q6_K .. size =     6.00 MiB ->     2.46 MiB
[ 234/ 255]               blk.25.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q6_K .. size =    48.00 MiB ->    19.69 MiB
[ 235/ 255]               blk.25.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 236/ 255]               blk.25.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 237/ 255]                 blk.25.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 238/ 255]                 blk.26.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 239/ 255]              blk.26.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 240/ 255]            blk.26.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 241/ 255]                 blk.26.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 242/ 255]                 blk.26.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q6_K .. size =     6.00 MiB ->     2.46 MiB
[ 243/ 255]               blk.26.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q6_K .. size =    48.00 MiB ->    19.69 MiB
[ 244/ 255]               blk.26.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 245/ 255]               blk.26.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 246/ 255]                 blk.26.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 247/ 255]                 blk.27.attn_k.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q4_K .. size =     6.00 MiB ->     1.69 MiB
[ 248/ 255]              blk.27.attn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 249/ 255]            blk.27.attn_output.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 250/ 255]                 blk.27.attn_q.weight - [ 3072,  3072,     1,     1], type =    f16, converting to q4_K .. size =    18.00 MiB ->     5.06 MiB
[ 251/ 255]                 blk.27.attn_v.weight - [ 3072,  1024,     1,     1], type =    f16, converting to q6_K .. size =     6.00 MiB ->     2.46 MiB
[ 252/ 255]               blk.27.ffn_down.weight - [ 8192,  3072,     1,     1], type =    f16, converting to q6_K .. size =    48.00 MiB ->    19.69 MiB
[ 253/ 255]               blk.27.ffn_gate.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
[ 254/ 255]               blk.27.ffn_norm.weight - [ 3072,     1,     1,     1], type =    f32, size =    0.012 MB
[ 255/ 255]                 blk.27.ffn_up.weight - [ 3072,  8192,     1,     1], type =    f16, converting to q4_K .. size =    48.00 MiB ->    13.50 MiB
llama_model_quantize_impl: model size  =  6128.17 MB
llama_model_quantize_impl: quant size  =  1918.35 MB

main: quantize time = 376145.88 ms
main:    total time = 376145.88 ms
Unsloth: Conversion completed! Output location: /content/drive/MyDrive/lora_model/q4_k_m/unsloth.Q4_K_M.gguf
Unsloth: Saved Ollama Modelfile to /content/drive/MyDrive/lora_model/q4_k_m/Modelfile
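
With both the Q4_K_M GGUF and the Modelfile saved to Drive, a possible follow-up (not part of this notebook's run) is to serve the model with Ollama on a machine where Ollama is installed. The commands below are only a minimal sketch: the model name `llama32-3b-finetuned` is an arbitrary illustrative choice, and you may need to edit the `FROM` line inside the saved Modelfile so it points at your local copy of `unsloth.Q4_K_M.gguf` instead of the Colab `/content/drive/...` path shown above.

# Copy the q4_k_m folder from Google Drive to the local machine first,
# then register the GGUF + Modelfile as a named Ollama model and chat with it.
ollama create llama32-3b-finetuned -f ./q4_k_m/Modelfile
ollama run llama32-3b-finetuned "Hello, please introduce yourself."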