Let’s learn how to prepare our Hugging Face models for deployment on mobile devices.
Preparation
First, install the following packages so the tutorial runs smoothly.
pip install onnx onnxruntime onnxruntime-tools
You will also need the PyTorch and Transformers packages; install the builds that suit your environment.
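For example, with pip (swap in the build that matches your OS and accelerator if needed):
pip install torch transformers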
With the packages installed, let’s get into the main part.
Mobile Deployment for Hugging Face Models
Mobile devices are different from desktop machines, and we can’t treat them the same: limited memory, constrained compute, and a variety of operating systems all impose their own requirements, so we need to adapt our model to suit them.
That’s why preparing Hugging Face models for mobile deployment mostly involves minimizing the model’s size and converting it to a suitable format.
Let’s start by selecting the model. We won’t fine-tune it; we will simply load a lightweight pre-trained model.
from transformers import DistilBertModel

# Load the lightweight pre-trained model and put it in inference mode
model = DistilBertModel.from_pretrained('distilbert-base-uncased')
model.eval()
The DistilBERT model is lightweight and well suited to mobile deployment. However, we still need to convert it into a format that mobile runtimes can consume.
We will use the ONNX (Open Neural Network Exchange) format in this case.
import torch

# Dummy input matching the expected shape: batch of 1, sequence of 512 token IDs
dummy_input = torch.ones(1, 512, dtype=torch.long)

torch.onnx.export(
    model,
    dummy_input,
    "distilbert.onnx",
    input_names=["input_ids"],
    output_names=["output"],
    opset_version=11,
)
In the code above, we pass a dummy input so the exporter can trace the model’s structure while converting it to the ONNX format.
Next, we quantize the model to compress its size even further.
from onnxruntime.quantization import quantize_dynamic, QuantType
model_fp32 = "distilbert.onnx"
model_quant = "distilbert_quantized.onnx"
# Perform dynamic quantization
quantize_dynamic(model_fp32, model_quant, weight_type=QuantType.QInt8)
If you compare the files, the quantized model is significantly smaller than the original:
Original model size (FP32): 253.24 MB
Quantized model size (INT8): 63.62 MB
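You can reproduce this comparison with a quick file-size check, for example:
import os

# Compare the on-disk sizes of the original and quantized models
for path in ["distilbert.onnx", "distilbert_quantized.onnx"]:
    size_mb = os.path.getsize(path) / (1024 * 1024)
    print(f"{path}: {size_mb:.2f} MB")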
Once the model is ready, we can test that it still works. Remember that we didn’t fine-tune the model, so the output here is the base model’s raw hidden states rather than task-specific predictions.
import onnxruntime as ort
import numpy as np

# Run the quantized model on a dummy input to confirm it loads and executes
ort_session = ort.InferenceSession("distilbert_quantized.onnx")
dummy_input = np.ones((1, 512), dtype=np.int64)

outputs = ort_session.run(None, {"input_ids": dummy_input})
print("Model output:", outputs)
Model output: [array([[[ 1.8881904e-01, -3.3938486e-02, 2.1839237e-01, ...,
-1.6090244e-01, 9.5649131e-02, -3.0762717e-01],
[-1.3188489e-02, 1.4205594e-03, 3.3921045e-01, ...,
-1.6600204e-01, 5.7920091e-02, -2.0339653e-01],
[-1.9435942e-02, -2.5236234e-04, 3.3452547e-01, ...,
-1.6795774e-01, 4.4274464e-02, -1.8873917e-01],
...,
[ 2.1659568e-01, -2.0543179e-02, 2.1092147e-01, ...,
-1.3063732e-01, 5.9916750e-02, -3.5460258e-01],
[ 2.1566749e-01, -1.9638695e-02, 2.2383465e-01, ...,
-1.4067526e-01, 5.2998818e-02, -3.7176940e-01],
[ 2.0821217e-01, -4.6792708e-02, 2.1903740e-01, ...,
-1.2426962e-01, 4.2172089e-02, -4.0435579e-01]]], dtype=float32)]
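In practice, you would feed the model real tokenized text rather than a tensor of ones. Here is a minimal sketch, assuming the matching distilbert-base-uncased tokenizer and an illustrative input sentence:
from transformers import AutoTokenizer
import numpy as np
import onnxruntime as ort

# Load the tokenizer that matches the exported model
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
ort_session = ort.InferenceSession("distilbert_quantized.onnx")

# Pad/truncate to the fixed 512-token length used during export
encoded = tokenizer(
    "ONNX makes Hugging Face models portable to mobile.",
    padding="max_length",
    truncation=True,
    max_length=512,
    return_tensors="np",
)
input_ids = encoded["input_ids"].astype(np.int64)  # ONNX Runtime expects int64

outputs = ort_session.run(None, {"input_ids": input_ids})
print("Hidden states shape:", outputs[0].shape)  # expected: (1, 512, 768)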
The model is now ready for mobile deployment on platforms such as Android or iOS.
For Android, you can use the ONNX Runtime Java API with something similar to the code below.
import ai.onnxruntime.*;
import android.os.Bundle;
import androidx.appcompat.app.AppCompatActivity;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Collections;

public class MainActivity extends AppCompatActivity {
    private OrtEnvironment env;
    private OrtSession session;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        try {
            env = OrtEnvironment.getEnvironment();
            // Load the quantized model bundled in the app's assets folder
            session = env.createSession(loadModelFile("distilbert_quantized.onnx"),
                    new OrtSession.SessionOptions());

            // input_ids must be int64 (long) to match the exported model
            long[][] inputVal = new long[1][512];
            Arrays.fill(inputVal[0], 1L); // dummy token IDs, as in the Python test

            // Wrap the array in an OnnxTensor before running the session
            try (OnnxTensor inputTensor = OnnxTensor.createTensor(env, inputVal);
                 OrtSession.Result result =
                         session.run(Collections.singletonMap("input_ids", inputTensor))) {
                // Output shape is (1, 512, 768): one hidden state per token
                System.out.println("Model output: " + result.get(0));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Read the ONNX model from assets into a byte array for ONNX Runtime
    private byte[] loadModelFile(String modelPath) throws IOException {
        try (InputStream inputStream = getAssets().open(modelPath);
             ByteArrayOutputStream output = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int bytesRead;
            while ((bytesRead = inputStream.read(buffer)) != -1) {
                output.write(buffer, 0, bytesRead);
            }
            return output.toByteArray();
        }
    }
}
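Note that this sketch assumes the ONNX Runtime Android dependency (for example, com.microsoft.onnxruntime:onnxruntime-android in your Gradle build) and that distilbert_quantized.onnx is bundled in the app’s assets folder.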
Master model size reduction and format conversion, and you will be able to deploy your models on mobile devices.
Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.