P&ID Symbol Detection with YOLOv8 and PyTorch — Complete Tutorial
- Codersarts AI


Every P&ID is a dense map of symbols — valves, pumps, instruments, heat exchangers, control loops — where the position, shape, and connections between symbols carry meaning that no OCR engine can read.
This is the part of document intelligence that most tutorials skip entirely. OCR extracts text. But on a P&ID, a gate valve isn't labelled "gate valve" in plain text — it's a specific geometric symbol shape at a specific location connected to specific pipelines. Understanding that requires computer vision, not character recognition.
In this guide we train a custom YOLOv8 object detection model on P&ID symbols, fine-tuning from COCO-pretrained weights, and cover everything: dataset preparation, annotation strategy, training configuration for high-resolution engineering drawings, tiled inference, post-processing to associate symbols with instrument tags, and evaluation with precision/recall metrics.
This is the exact model architecture we use in production at docprocessing360.com — deployed for oil & gas, EPC, and manufacturing clients.
Why YOLOv8 for P&ID Symbols
Several object detection architectures exist. Here's why YOLOv8 wins for P&ID symbol detection specifically:
Criterion | YOLOv8 | Faster R-CNN | LayoutLM | Template Matching |
High-res image support | ✅ Native | ✅ Yes | ❌ No | ✅ Yes |
Small object detection | ✅ Strong | ✅ Strong | ❌ No | ⚠️ Fragile |
Custom class training | ✅ Simple | ⚠️ Complex | ⚠️ Moderate | ❌ Per-symbol |
Training speed | ✅ Fast | ⚠️ Slow | ⚠️ Slow | N/A |
Production deployment | ✅ ONNX/TorchScript | ⚠️ Heavier | ⚠️ Heavier | ⚠️ Brittle |
Handles symbol rotation | ✅ With aug | ⚠️ Limited | ❌ No | ❌ No |
Overlapping symbols | ✅ NMS handles | ✅ Yes | ❌ No | ❌ Fails |
YOLOv8 delivers high accuracy on P&ID symbol recognition and is well suited to automating symbol identification in Piping and Instrumentation Diagrams. It also trains quickly, exports to ONNX, TorchScript, and TensorRT for deployment, and the Ultralytics Python API keeps the entire pipeline clean to maintain.
The Core Challenge: Why P&IDs Break Standard Models
Before writing any code, understand the specific challenges that make P&ID symbol detection harder than standard object detection:
1. Extreme Symbol Density
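P&IDs pack dozens to hundreds of symbols onto a single sheet. Symbols overlap, share boundary regions, and sit directly on the pipeline lines that connect them rather than being separated by whitespace. COCO-trained models are used to objects that stand apart from their surroundings; on a P&ID, symbols are drawn in the same linework as the pipes running into them, so there is little clean background between objects.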
2. No Large Public Dataset
Unlike natural image datasets where millions of labeled photos exist, there is no large public dataset of labeled engineering drawings. You must build or augment your own annotated dataset. This is the single biggest bottleneck.
3. Symbol Variation Across Standards
P&ID symbols vary by standard (ISA 5.1, ISO 14617), by company-specific symbol libraries, and by decade (1970s drawings look different from 2020s CAD exports). A model trained on one company's symbols may fail on another's without retraining.
4. High-Resolution Images
A single P&ID sheet may be 7000 × 4500 pixels or larger. Standard YOLOv8 training uses 640px images. Processing P&IDs at native resolution requires a tiled inference strategy.
5. Small Objects
Instrument tags like FIC-101A next to a 40×40 pixel valve symbol must both be detected reliably. Small object detection requires specific model configuration.
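To put numbers on challenges 4 and 5, the sketch below compares how large a 40-pixel symbol ends up when a whole 7000-pixel-wide sheet is resized to YOLOv8's default 640-pixel input versus when a 1280-pixel tile is fed at full size. It is plain arithmetic using the example sizes above; effective_symbol_px is a hypothetical helper, and it assumes the longest image side is what gets resized to the input size.

```python
def effective_symbol_px(symbol_px: float, source_px: int, input_px: int) -> float:
    """Symbol size after an image whose longest side is source_px is resized to input_px."""
    return symbol_px * input_px / source_px


# Whole 7000 px-wide sheet squeezed into the default 640 px input
whole_sheet = effective_symbol_px(40, source_px=7000, input_px=640)
# A 1280 px tile fed at imgsz=1280 keeps the symbol at its native size
tiled = effective_symbol_px(40, source_px=1280, input_px=1280)

print(f"whole sheet at 640 px: {whole_sheet:.1f} px")  # 3.7 px
print(f"1280 tile at 1280 px: {tiled:.1f} px")         # 40.0 px
```

A symbol under 4 pixels wide carries almost no shape information, which is why the rest of this guide works on tiles.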
Environment Setup
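pip install ultralytics opencv-python numpy pillow \
    matplotlib labelImg pyyaml torch torchvision onnxruntime
# For CUDA-accelerated ONNX inference (Step 9), install onnxruntime-gpu instead of onnxruntime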
Verify GPU:
import torch
print(torch.cuda.is_available()) # True
print(torch.cuda.get_device_name(0)) # NVIDIA RTX 3090 / A100 etc.
YOLOv8 needs a CUDA GPU for practical training speeds. As a rough guide, one epoch over 500 images can take ~45 minutes on CPU versus ~2 minutes on GPU; actual times depend on hardware, image size, and model size.
Step 1 — Dataset Preparation
Option A: Use the Digitize-PID Synthetic Dataset (Fastest Start)
A synthetic dataset of 500 annotated P&ID sheets with 32 symbol classes is publicly available from the Digitize-PID research paper. It includes sample images in JPEG format with label annotations and bounding boxes for each piece of text and symbol in the image.
This is the fastest way to get a working model: download, convert to YOLO format, and train. Expect accuracy on real P&IDs from this synthetic baseline of roughly 65–75%, enough to validate the approach but not enough for production.
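Whatever format the source annotations use, the conversion step ends with the same arithmetic: pixel corner coordinates become a normalised centre/size row. A minimal sketch, assuming boxes are available as (class_id, x1, y1, x2, y2) in pixels; to_yolo_line is a hypothetical helper, and reading the Digitize-PID annotation files themselves is not shown.

```python
def to_yolo_line(class_id: int, x1: float, y1: float, x2: float, y2: float,
                 img_w: int, img_h: int) -> str:
    """Convert one pixel-space corner box to a YOLO label row."""
    # Clip to the image so boxes touching the border stay within [0, 1]
    x1, x2 = max(0.0, min(x1, img_w)), max(0.0, min(x2, img_w))
    y1, y2 = max(0.0, min(y1, img_h)), max(0.0, min(y2, img_h))
    cx = (x1 + x2) / 2 / img_w      # normalised centre x
    cy = (y1 + y2) / 2 / img_h      # normalised centre y
    w = (x2 - x1) / img_w           # normalised width
    h = (y2 - y1) / img_h           # normalised height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# A 54 x 49 px control valve (class 4) on a 1280 x 1280 tile
print(to_yolo_line(4, 642, 412, 696, 461, 1280, 1280))
```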
Option B: Build Your Own Dataset (Production Quality)
For production accuracy (90%+), you need annotated samples from your actual P&ID documents.
Recommended annotation tool: LabelImg (free, outputs YOLO format directly)
Minimum samples per class:
50 images per symbol class for acceptable accuracy
100–200 images per class for production accuracy
More data helps, but annotation quality matters more than raw quantity
Annotation workflow:
Raw P&ID sheet (high-res PDF/TIFF)
↓
Convert to PNG at 300 DPI
↓
Tile into 1280×1280 patches (with 20% overlap)
↓
Annotate each patch in LabelImg (YOLO format)
↓
Collect .txt annotation files
↓
Train/val split (80/20)
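The last step deserves care. With 20% overlap, neighbouring tiles share content, so a random tile-level split lets near-copies of training tiles leak into the validation set and inflates validation scores. Splitting by source sheet avoids that. A minimal sketch, assuming tile filenames follow the pid_001_tile_0_0.png pattern shown in Step 3 (sheet ID before "_tile_"); split_by_sheet is a hypothetical helper, and the 20% applies to sheets rather than tiles.

```python
import random
from collections import defaultdict


def split_by_sheet(tile_names: list[str], val_fraction: float = 0.2,
                   seed: int = 0) -> tuple[list[str], list[str]]:
    """Assign whole sheets (not individual tiles) to train or val."""
    by_sheet = defaultdict(list)
    for name in tile_names:
        sheet_id = name.split("_tile_")[0]      # e.g. "pid_001"
        by_sheet[sheet_id].append(name)

    sheets = sorted(by_sheet)
    random.Random(seed).shuffle(sheets)         # reproducible shuffle
    n_val = max(1, round(len(sheets) * val_fraction))
    val_sheets = set(sheets[:n_val])

    train = [n for s in sheets if s not in val_sheets for n in by_sheet[s]]
    val = [n for s in sheets if s in val_sheets for n in by_sheet[s]]
    return train, val


# 10 sheets with 4 tiles each
names = [f"pid_{s:03d}_tile_{r}_{c}.png"
         for s in range(1, 11) for r in range(2) for c in range(2)]
train, val = split_by_sheet(names)
print(len(train), len(val))  # 32 8
```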
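Why tile? A 300 DPI P&ID is far too large to feed to the network at native resolution, and shrinking the whole sheet to a standard input size would reduce small symbols to a few pixels. Tiling into 1280×1280 patches lets you cover the full document at native resolution while keeping each training sample GPU-friendly.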
import cv2
import numpy as np
from pathlib import Path
def tile_image(img_path: str, tile_size: int = 1280,
               overlap: float = 0.2) -> list[tuple]:
    """
    Tile a large P&ID image into overlapping patches for annotation.
    Returns list of (patch_img, x_offset, y_offset) tuples.
    """
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {img_path}")
    h, w = img.shape[:2]
    step = int(tile_size * (1 - overlap))   # 1024 px for the defaults
    tiles = []
    for y in range(0, h, step):
        for x in range(0, w, step):
            x2 = min(x + tile_size, w)
            y2 = min(y + tile_size, h)
            patch = img[y:y2, x:x2]
            # Pad edge patches to a full tile_size square (black fill)
            if patch.shape[0] < tile_size or patch.shape[1] < tile_size:
                padded = np.zeros((tile_size, tile_size, 3), dtype=np.uint8)
                padded[:patch.shape[0], :patch.shape[1]] = patch
                patch = padded
            tiles.append((patch, x, y))
    return tiles
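It helps to check the tile grid before annotating. With the defaults above the step is 1024 px, so neighbouring tiles share a 256 px band, and any symbol no larger than that band is guaranteed to appear whole in at least one tile. A small sketch that reproduces the same range(0, size, step) stepping as tile_image() to count tiles for the 7000 × 4500 sheet size mentioned earlier; tile_grid is a hypothetical helper.

```python
def tile_grid(width: int, height: int, tile_size: int = 1280,
              overlap: float = 0.2) -> tuple[list[int], list[int], int]:
    """Return tile x origins, y origins, and the overlap band in pixels."""
    step = int(tile_size * (1 - overlap))   # same stepping as tile_image()
    xs = list(range(0, width, step))
    ys = list(range(0, height, step))
    overlap_px = tile_size - step           # band shared by neighbouring tiles
    return xs, ys, overlap_px


xs, ys, overlap_px = tile_grid(7000, 4500)
print(len(xs), len(ys), len(xs) * len(ys), overlap_px)  # 7 5 35 256
```

If your largest symbols (vessels, heat exchangers) are wider than the overlap band, raise overlap so they are not always cut by a tile edge.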
Step 2 — Symbol Classes (ISA Standard)
Define your symbol taxonomy before annotating. For ISA 5.1 compliant P&IDs, common classes include:
# pid_symbols.yaml — dataset configuration
path: ./datasets/pid
train: images/train
val: images/val
test: images/test
nc: 32 # Number of symbol classes
names:
  0: gate_valve
  1: ball_valve
  2: butterfly_valve
  3: check_valve
  4: control_valve
  5: globe_valve
  6: needle_valve
  7: plug_valve
  8: safety_relief_valve
  9: pump_centrifugal
  10: pump_reciprocating
  11: compressor
  12: heat_exchanger_shell_tube
  13: heat_exchanger_plate
  14: vessel_vertical
  15: vessel_horizontal
  16: tank_atmospheric
  17: filter_strainer
  18: indicator_generic
  19: transmitter_generic
  20: controller_generic
  21: recorder_generic
  22: flow_element
  23: level_gauge
  24: pressure_gauge
  25: temperature_element
  26: actuator_pneumatic
  27: actuator_electric
  28: signal_line_pneumatic
  29: signal_line_electric
  30: reducer_concentric
  31: blind_flange
Pro tip: Start with the 10–15 most common symbols in your specific P&ID library rather than all 32 at once. A model with 92% accuracy on 12 classes is usually more useful than one with 70% accuracy on 32 classes.
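If your labels already cover the full 32-class taxonomy, a smaller subset can be carved out by rewriting the label rows rather than re-annotating. A minimal sketch; remap_labels is a hypothetical helper. New IDs are assigned in the order of keep_ids, so the names: list in the new dataset YAML must follow that same order. Rows for dropped classes are removed, which means those symbols become background during training.

```python
def remap_labels(label_text: str, keep_ids: list[int]) -> str:
    """Keep only rows whose class is in keep_ids and renumber them 0..k-1."""
    new_id = {old: new for new, old in enumerate(keep_ids)}
    rows = []
    for line in label_text.splitlines():
        parts = line.split()
        if not parts:
            continue                          # skip blank lines
        old = int(parts[0])
        if old in new_id:
            rows.append(" ".join([str(new_id[old])] + parts[1:]))
    return "\n".join(rows)


labels = "4 0.523 0.341 0.042 0.038\n0 0.712 0.198 0.031 0.029\n31 0.1 0.1 0.02 0.02"
# Keep gate_valve (old ID 0) and control_valve (old ID 4) only
print(remap_labels(labels, keep_ids=[0, 4]))
```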
Step 3 — Dataset Directory Structure
YOLO expects a specific directory layout:
datasets/pid/
├── images/
│ ├── train/
│ │ ├── pid_001_tile_0_0.png
│ │ ├── pid_001_tile_0_1.png
│ │ └── ...
│ ├── val/
│ │ └── ...
│ └── test/
│ └── ...
└── labels/
├── train/
│ ├── pid_001_tile_0_0.txt
│ ├── pid_001_tile_0_1.txt
│ └── ...
├── val/
│ └── ...
└── test/
└── ...
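Each .txt label file contains one row per symbol in that image tile. The # comments in the example below are for the reader only; a real label file contains just the five numbers on each row: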
# Format: class_id center_x center_y width height (all normalised 0–1)
4 0.523 0.341 0.042 0.038 # control_valve
0 0.712 0.198 0.031 0.029 # gate_valve
20 0.381 0.556 0.055 0.051 # controller_generic
Script to verify your dataset structure:
from pathlib import Path
import yaml
def verify_dataset(yaml_path: str):
    with open(yaml_path) as f:
        config = yaml.safe_load(f)
    base = Path(config['path'])
    issues = []
    for split in ['train', 'val']:
        img_dir = base / 'images' / split
        lbl_dir = base / 'labels' / split
        imgs = list(img_dir.glob('*.png')) + list(img_dir.glob('*.jpg'))
        lbls = list(lbl_dir.glob('*.txt'))
        print(f"{split}: {len(imgs)} images, {len(lbls)} labels")
        # Every image should have a label file with the same stem
        for img in imgs:
            lbl = lbl_dir / (img.stem + '.txt')
            if not lbl.exists():
                issues.append(f"Missing label: {img.name}")
    if issues:
        print(f"\n{len(issues)} issues found:")
        for i in issues[:10]:
            print(f"  {i}")
    else:
        print("\nDataset structure valid.")
verify_dataset('pid_symbols.yaml')
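verify_dataset() only checks that every image has a label file. Malformed rows inside those files (wrong field count, class IDs outside the nc range, coordinates outside 0–1) typically cause the Ultralytics loader to skip the affected images with a warning that is easy to miss in a long training log. A minimal sketch of a per-file content check; check_label_file is a hypothetical helper, and the 32-class default matches pid_symbols.yaml.

```python
from pathlib import Path


def check_label_file(path: Path, nc: int = 32) -> list[str]:
    """Return a list of problems found in one YOLO label file."""
    problems = []
    for n, line in enumerate(path.read_text().splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue                                  # blank lines are harmless
        if len(parts) != 5:
            problems.append(f"{path.name}:{n}: expected 5 fields, got {len(parts)}")
            continue
        try:
            cls = int(parts[0])
            coords = [float(p) for p in parts[1:]]
        except ValueError:
            problems.append(f"{path.name}:{n}: non-numeric value")
            continue
        if not 0 <= cls < nc:
            problems.append(f"{path.name}:{n}: class {cls} outside 0..{nc - 1}")
        if any(not 0.0 <= v <= 1.0 for v in coords):
            problems.append(f"{path.name}:{n}: coordinate outside [0, 1]")
    return problems


label_dir = Path("datasets/pid/labels/train")
if label_dir.is_dir():
    for lbl in sorted(label_dir.glob("*.txt")):
        for problem in check_label_file(lbl):
            print(problem)
```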
Step 4 — Training Configuration
YOLOv8 has multiple model sizes. For P&ID symbol detection:
Model | Parameters | Speed | Accuracy | Best for |
yolov8n | 3.2M | Fastest | Lowest | Prototyping only |
yolov8s | 11.2M | Fast | Good | Quick validation |
yolov8m | 25.9M | Moderate | Better | Recommended |
yolov8l | 43.7M | Slow | High | High accuracy needs |
yolov8x | 68.2M | Slowest | Highest | Maximum accuracy |
Use yolov8m as your starting point. It balances training time and accuracy well for P&ID-sized datasets.
from ultralytics import YOLO
# Load COCO-pretrained weights (downloads ~50 MB on first use)
model = YOLO('yolov8m.pt')
# Fine-tune on the P&ID symbol dataset
results = model.train(
    data='pid_symbols.yaml',
    # Image size: critical for P&ID tiles
    imgsz=1280,          # Must match your tile size
    # Training duration
    epochs=150,
    patience=30,         # Early stopping after 30 epochs without improvement
    # Batch size: reduce if the GPU runs out of memory
    batch=8,             # RTX 3090: 8-16 | A100: 16-32
    # Optimisation
    optimizer='AdamW',
    lr0=0.001,           # Initial learning rate
    lrf=0.01,            # Final LR = lr0 * lrf
    warmup_epochs=5,
    # Augmentation: critical for P&ID robustness.
    # Training augmentation is always on and controlled by the values below;
    # augment=True is a prediction-time (TTA) setting, so it is not set here.
    degrees=15,          # Rotation (P&ID symbols can be rotated)
    scale=0.5,           # Scale variation
    fliplr=0.5,          # Horizontal flip (note: this also mirrors text)
    flipud=0.0,          # No vertical flip (text would invert)
    mosaic=0.8,          # Mosaic augmentation probability
    copy_paste=0.3,      # Only takes effect with segmentation labels; no-op for box-only labels
    # Device
    device=0,            # GPU index; use 'cpu' if no GPU
    # Output
    project='pid_detection',
    name='yolov8m_run1',
    save=True,
    plots=True,
    # Multi-scale training (input size varies by ±50% of imgsz)
    multi_scale=True,
)
print(f"Best mAP50: {results.results_dict['metrics/mAP50(B)']:.3f}")
Key Training Parameters for P&IDs
imgsz=1280: Do not use 640. P&ID symbols are small relative to the full document. If you train on 1280×1280 tiles at imgsz=640, every tile is downscaled by half, so a 40×40 pixel symbol becomes 20×20, small enough that detection reliability drops noticeably.
degrees=15: P&ID symbols are sometimes drawn at slight angles, especially in scanned legacy documents. Rotation augmentation makes the model robust to this.
flipud=0.0: Vertical flips are disabled so instrument tags and symbol labels never appear upside down. Note that fliplr=0.5 still produces mirrored text; if mirrored tags confuse the model on your drawings, set fliplr=0.0 as well.
multi_scale=True: Trains on images randomly resized within ±50% of imgsz, which can improve robustness to symbol size. The largest sizes (up to roughly 1920 px at imgsz=1280) use considerably more GPU memory, so reduce batch if you hit out-of-memory errors.
Step 5 — Monitor Training
Training outputs are saved to pid_detection/yolov8m_run1/. Key files to watch:
pid_detection/yolov8m_run1/
├── weights/
│   ├── best.pt          ← Use this for inference
│   └── last.pt          ← Last epoch checkpoint
├── results.csv          ← Metrics per epoch
├── results.png          ← Loss + mAP curves
├── confusion_matrix.png
└── PR_curve.png
Healthy training looks like:
Box loss and classification loss decrease steadily for ~50 epochs
mAP50 climbs above 0.80 by epoch 100
No divergence or plateau before epoch 50
If mAP plateaus below 0.70 at epoch 50:
Add more training samples (most common fix)
Increase epochs to 200
Check annotation quality — mislabelled samples are more damaging than fewer samples
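The plateau rule above can be checked from results.csv instead of by eye. A minimal sketch, assuming the Ultralytics column name metrics/mAP50(B) and one row per epoch; some versions pad header names with spaces, so keys are stripped first. map50_plateaued is a hypothetical helper, and the 50-epoch / 0.70 defaults are the rule of thumb from this section.

```python
import csv


def map50_plateaued(csv_path: str, by_epoch: int = 50, floor: float = 0.70) -> bool:
    """True if at least by_epoch epochs have run and the best mAP50 among them is below floor."""
    with open(csv_path, newline="") as f:
        # Strip padding that some Ultralytics versions add around column names
        rows = [{k.strip(): v for k, v in row.items()} for row in csv.DictReader(f)]
    if len(rows) < by_epoch:
        return False                                  # too early to judge
    best = max(float(r["metrics/mAP50(B)"]) for r in rows[:by_epoch])
    return best < floor


# Usage (run directory from Step 5):
# if map50_plateaued("pid_detection/yolov8m_run1/results.csv"):
#     print("mAP50 below 0.70 after 50 epochs: add data and check labels")
```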
Step 6 — Tiled Inference on Full P&ID Sheets
The biggest production challenge: running inference on a full P&ID sheet that is 7000+ pixels wide.
import cv2
import numpy as np
from ultralytics import YOLO
from pathlib import Path
model = YOLO('pid_detection/yolov8m_run1/weights/best.pt')
def detect_pid_symbols(
    image_path: str,
    tile_size: int = 1280,
    overlap: float = 0.2,
    conf_threshold: float = 0.35,
    iou_threshold: float = 0.45
) -> list[dict]:
    """
    Run tiled inference on a full P&ID sheet.
    Handles overlapping tiles via global NMS.
    """
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    h, w = img.shape[:2]
    step = int(tile_size * (1 - overlap))
    all_detections = []
    for y in range(0, h, step):
        for x in range(0, w, step):
            x2 = min(x + tile_size, w)
            y2 = min(y + tile_size, h)
            tile = img[y:y2, x:x2]
            # Pad edge tiles to a full square, as in tile_image()
            if tile.shape[0] < tile_size or tile.shape[1] < tile_size:
                padded = np.zeros((tile_size, tile_size, 3), dtype=np.uint8)
                padded[:tile.shape[0], :tile.shape[1]] = tile
                tile = padded
            # Run inference on this tile at the training resolution
            results = model.predict(
                tile,
                imgsz=tile_size,
                conf=conf_threshold,
                iou=iou_threshold,
                verbose=False
            )
            # Convert tile-local coordinates to global image coordinates
            for result in results:
                for box in result.boxes:
                    bx1, by1, bx2, by2 = box.xyxy[0].tolist()
                    gx1 = x + bx1
                    gy1 = y + by1
                    # Clip the far edges so boxes never extend into the padding
                    gx2 = min(x + bx2, w)
                    gy2 = min(y + by2, h)
                    # Skip detections that start inside the padding area
                    if gx1 >= w or gy1 >= h:
                        continue
                    all_detections.append({
                        'class_id': int(box.cls[0]),
                        'class_name': model.names[int(box.cls[0])],
                        'confidence': float(box.conf[0]),
                        'bbox_global': [gx1, gy1, gx2, gy2],
                        'center': [(gx1 + gx2) / 2, (gy1 + gy2) / 2]
                    })
    # Apply global NMS to remove duplicate detections from overlapping tiles
    all_detections = apply_global_nms(all_detections, iou_threshold=0.4)
    return all_detections
def apply_global_nms(detections: list[dict],
                     iou_threshold: float = 0.4) -> list[dict]:
    """
    Remove duplicate detections from overlapping tiles using per-class NMS.
    """
    if not detections:
        return []
    boxes = np.array([d['bbox_global'] for d in detections])
    scores = np.array([d['confidence'] for d in detections])
    class_ids = np.array([d['class_id'] for d in detections])
    keep = []
    for cls_id in np.unique(class_ids):
        cls_mask = class_ids == cls_id
        cls_boxes = boxes[cls_mask]
        cls_scores = scores[cls_mask]
        cls_indices = np.where(cls_mask)[0]
        # NMS per class
        nms_keep = nms(cls_boxes, cls_scores, iou_threshold)
        keep.extend([cls_indices[i] for i in nms_keep])
    return [detections[i] for i in sorted(keep)]
def nms(boxes: np.ndarray, scores: np.ndarray,
        threshold: float) -> list[int]:
    """Standard greedy Non-Maximum Suppression; returns indices of kept boxes."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # Highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter_w = np.maximum(0, xx2 - xx1)
        inter_h = np.maximum(0, yy2 - yy1)
        inter = inter_w * inter_h
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Drop remaining boxes that overlap the kept box too much
        order = order[1:][iou <= threshold]
    return keep
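IoU-based NMS can miss one duplicate that is specific to tiling: when a tile edge cuts through a symbol, the truncated box from that tile overlaps the full box from the neighbouring tile by less than the IoU threshold, simply because the fragment is small. Comparing the intersection with the smaller box's area (intersection over smaller, IoS) catches these. In the example below, an 18-pixel sliver of a 60-pixel valve has IoU 0.3 with the full box, so the 0.4-threshold NMS above keeps it, while its IoS is 1.0. A minimal sketch of this alternative suppression step, not part of the pipeline above; suppress_fragments is a hypothetical helper that takes detections in the same dict format as detect_pid_symbols(). The trade-off: two distinct same-class symbols where one sits almost entirely inside the other would also be merged.

```python
def suppress_fragments(detections: list[dict], ios_threshold: float = 0.8) -> list[dict]:
    """Greedy per-class suppression using intersection over the smaller box."""
    def area(b):
        return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

    kept: list[dict] = []
    # Visit detections from most to least confident
    for det in sorted(detections, key=lambda d: d["confidence"], reverse=True):
        b = det["bbox_global"]
        duplicate = False
        for k in kept:
            if k["class_id"] != det["class_id"]:
                continue                      # only suppress within a class
            kb = k["bbox_global"]
            iw = max(0.0, min(b[2], kb[2]) - max(b[0], kb[0]))
            ih = max(0.0, min(b[3], kb[3]) - max(b[1], kb[1]))
            smaller = min(area(b), area(kb))
            if smaller > 0 and iw * ih / smaller >= ios_threshold:
                duplicate = True
                break
        if not duplicate:
            kept.append(det)
    return kept


dets = [
    {"class_id": 4, "confidence": 0.93, "bbox_global": [1000, 500, 1060, 560]},  # full valve
    {"class_id": 4, "confidence": 0.61, "bbox_global": [1000, 500, 1018, 560]},  # edge fragment
    {"class_id": 4, "confidence": 0.88, "bbox_global": [1400, 500, 1460, 560]},  # another valve
]
print(len(suppress_fragments(dets)))  # 2
```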
Step 7 — Associate Symbols with Instrument Tags
Detecting a valve is only half the job. The valve needs to be linked to its instrument tag — the text label nearby that identifies it as FCV-201 or XV-103.
This is done by spatial proximity: for each detected symbol, find the nearest OCR text block and associate them.
def associate_tags_to_symbols(
    symbols: list[dict],
    ocr_words: list[dict],
    max_distance_px: int = 80
) -> list[dict]:
    """
    Associate each detected symbol with its nearest instrument tag
    from the OCR output.
    symbols: list of detections from detect_pid_symbols()
    ocr_words: list of {text, center, confidence, bbox} from OCR pipeline
    max_distance_px: max pixel distance to search for a tag
    """
    import re
    # Instrument tag pattern (ISA 5.1 style, hyphenated on one line)
    tag_pattern = re.compile(
        r'\b[A-Z]{1,4}-\d{3,5}[A-Z]?\b'  # e.g. FIC-201, XV-1032A
    )
    enriched = []
    for symbol in symbols:
        sx, sy = symbol['center']
        nearest_tag = None
        nearest_tag_conf = 0.0
        min_dist = float('inf')
        for word in ocr_words:
            # Only consider instrument tag-formatted text
            if not tag_pattern.match(word['text']):
                continue
            wx, wy = word['center']
            # Euclidean distance between symbol centre and word centre
            dist = ((wx - sx) ** 2 + (wy - sy) ** 2) ** 0.5
            if dist < min_dist and dist <= max_distance_px:
                min_dist = dist
                nearest_tag = word['text']
                nearest_tag_conf = word['confidence']
        enriched.append({
            **symbol,
            'instrument_tag': nearest_tag,
            'tag_confidence': nearest_tag_conf,
            'tag_distance_px': round(min_dist, 1) if nearest_tag else None
        })
    return enriched
Output example:
{
"class_name": "control_valve",
"confidence": 0.94,
"bbox_global": [1240, 880, 1310, 950],
"center": [1275, 915],
"instrument_tag": "FCV-201",
"tag_confidence": 0.91,
"tag_distance_px": 38.2
}
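Before relying on the association step, it is worth checking what the tag pattern in associate_tags_to_symbols() accepts. The function applies it with match(), which anchors only at the start of the string, so a line number such as LINE-4001-CS also passes; fullmatch() is a stricter alternative. Neither form catches tags drawn on two lines inside an instrument bubble (FIC above 201). The sample strings below are illustrative.

```python
import re

# Same pattern as in associate_tags_to_symbols()
tag_pattern = re.compile(r'\b[A-Z]{1,4}-\d{3,5}[A-Z]?\b')

samples = ["FIC-201", "XV-1032A", "P-101A", "LINE-4001-CS", "FIC 201", "FIC-20"]
for text in samples:
    # match() accepts a valid prefix; fullmatch() requires the whole string to be a tag
    print(text, bool(tag_pattern.match(text)), bool(tag_pattern.fullmatch(text)))
```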
Step 8 — Evaluation: Precision, Recall & mAP
Evaluate your trained model systematically. Never deploy based on visual inspection alone.
from ultralytics import YOLO
model = YOLO('pid_detection/yolov8m_run1/weights/best.pt')
# Evaluate on test set
metrics = model.val(
    data='pid_symbols.yaml',
    split='test',
    conf=0.35,     # Deployment threshold; omit to use the low default (0.001) for conventional mAP
    iou=0.50,      # NMS IoU threshold (mAP50 always matches predictions at IoU 0.5)
    imgsz=1280,
    verbose=True
)
print(f"mAP50:     {metrics.box.map50:.3f}")
print(f"mAP50-95:  {metrics.box.map:.3f}")
print(f"Precision: {metrics.box.mp:.3f}")
print(f"Recall:    {metrics.box.mr:.3f}")
# Per-class breakdown: ap50 is ordered by the classes present in the
# evaluation set (ap_class_index), not by class ID
for idx, cls_id in enumerate(metrics.box.ap_class_index):
    print(f"  {model.names[int(cls_id)]:30s} AP50: {metrics.box.ap50[idx]:.3f}")
Production Benchmarks to Target
Metric | Acceptable | Good | Production-ready |
mAP50 | >0.70 | >0.82 | >0.90 |
Precision | >0.75 | >0.85 | >0.92 |
Recall | >0.70 | >0.82 | >0.88 |
If recall is low but precision is high, lower the confidence threshold. If precision is low, raise it. The right threshold depends on your use case — high recall matters more when missing a symbol is worse than a false positive, which is usually the case in engineering documents.
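The threshold choice can be made explicit: given validation detections already matched against ground truth (each with a confidence and a true/false-positive flag), sweep the candidate thresholds and keep the highest one that still meets a recall target. A minimal sketch; the matching step is assumed to have been done elsewhere, and pick_threshold, precision_recall_at, and the toy numbers are hypothetical.

```python
def precision_recall_at(dets: list[tuple[float, bool]], n_gt: int,
                        thr: float) -> tuple[float, float]:
    """Precision and recall counting only detections with confidence >= thr."""
    kept = [is_tp for conf, is_tp in dets if conf >= thr]
    tp = sum(kept)
    precision = tp / len(kept) if kept else 1.0
    recall = tp / n_gt if n_gt else 0.0
    return precision, recall


def pick_threshold(dets: list[tuple[float, bool]], n_gt: int, min_recall: float = 0.80):
    """Highest confidence threshold whose recall is still >= min_recall, or None."""
    for thr in sorted({conf for conf, _ in dets}, reverse=True):
        p, r = precision_recall_at(dets, n_gt, thr)
        if r >= min_recall:
            return thr, p, r
    return None                                    # target recall is unreachable


# (confidence, is_true_positive) for one validation run with 10 ground-truth symbols
dets = [(0.95, True), (0.91, True), (0.88, True), (0.80, True), (0.74, False),
        (0.66, True), (0.58, True), (0.51, True), (0.42, True), (0.37, False),
        (0.31, True)]
print(pick_threshold(dets, n_gt=10))
```

Raising min_recall towards the production-ready 0.88 target pushes the threshold down and trades away precision, which is the trade-off described above.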
Step 9 — Export for Production Deployment
Export the trained model to ONNX for cloud-agnostic deployment:
from ultralytics import YOLO
model = YOLO('pid_detection/yolov8m_run1/weights/best.pt')
# Export to ONNX (fastest cross-platform inference)
model.export(
format='onnx',
imgsz=1280,
opset=17,
simplify=True,
dynamic=False
)
# Or TorchScript for PyTorch serving
model.export(format='torchscript', imgsz=1280)
# Or TensorRT for NVIDIA GPU deployment (fastest on GPU)
model.export(format='engine', imgsz=1280, half=True) # FP16
Load ONNX model for inference without Ultralytics dependency:
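import onnxruntime as ort
import numpy as np
import cv2
# Ultralytics writes the ONNX file next to best.pt
session = ort.InferenceSession(
    'pid_detection/yolov8m_run1/weights/best.onnx',
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider']
)
def preprocess_for_onnx(img: np.ndarray, size: int = 1280) -> np.ndarray:
    # Plain resize: a no-op for 1280×1280 tiles, but it distorts the aspect
    # ratio of non-square images (Ultralytics itself letterboxes instead)
    img = cv2.resize(img, (size, size))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)   # OpenCV loads BGR; model expects RGB
    img = img.astype(np.float32) / 255.0         # Scale pixels to [0, 1]
    img = np.transpose(img, (2, 0, 1))           # HWC -> CHW
    return np.expand_dims(img, axis=0)           # Add batch dim -> (1, 3, H, W)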
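preprocess_for_onnx() covers only the input side; the raw output still has to be decoded. For a YOLOv8 detection model it is a single array of shape (1, 4 + nc, N), with N = 33,600 candidates at 1280 × 1280. Each candidate carries a centre-x, centre-y, width, height box in input-pixel units followed by one score per class; YOLOv8 has no separate objectness score. The raw array can be obtained with session.run(None, {session.get_inputs()[0].name: blob})[0]. A minimal NumPy sketch of the decoding step; decode_yolov8_output is a hypothetical helper, and the surviving boxes still need NMS (for example the nms() function from Step 6) before use.

```python
import numpy as np


def decode_yolov8_output(output: np.ndarray, conf_threshold: float = 0.35):
    """Turn a (1, 4 + nc, N) YOLOv8 output into xyxy boxes, scores and class IDs."""
    preds = output[0].T                           # (N, 4 + nc): one row per candidate
    boxes_cxcywh, class_scores = preds[:, :4], preds[:, 4:]
    class_ids = class_scores.argmax(axis=1)       # best class per candidate
    scores = class_scores.max(axis=1)             # its score doubles as confidence
    mask = scores >= conf_threshold
    cx, cy, w, h = boxes_cxcywh[mask].T
    boxes_xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    return boxes_xyxy, scores[mask], class_ids[mask]


# Toy output: nc = 2 classes, 3 candidates; only the second passes the threshold
fake = np.zeros((1, 6, 3), dtype=np.float32)
fake[0, :4, 1] = [640, 400, 60, 40]               # cx, cy, w, h in input pixels
fake[0, 5, 1] = 0.9                               # score for class 1
boxes, scores, cls = decode_yolov8_output(fake)
print(boxes, scores, cls)
```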
Complete Pipeline: P&ID to Structured Output
Putting it all together — from raw P&ID image to structured JSON:
def process_pid_complete(
    image_path: str,
    ocr_words: list[dict]
) -> dict:
    """
    Full pipeline: P&ID image → detected symbols → associated tags → JSON
    """
    # 1. Detect symbols
    symbols = detect_pid_symbols(image_path)
    # 2. Associate with instrument tags from OCR
    enriched = associate_tags_to_symbols(symbols, ocr_words)
    # 3. Group by symbol class
    by_class = {}
    for sym in enriched:
        cls = sym['class_name']
        by_class.setdefault(cls, []).append({
            'tag': sym['instrument_tag'],
            'confidence': round(sym['confidence'], 3),
            'bbox': sym['bbox_global']
        })
    # 4. Summary statistics
    total = len(enriched)
    with_tags = sum(1 for s in enriched if s['instrument_tag'])
    avg_conf = sum(s['confidence'] for s in enriched) / total if total else 0
    return {
        'symbol_count': total,
        'tagged_count': with_tags,
        'tagging_rate': round(with_tags / total, 3) if total else 0,
        'avg_confidence': round(avg_conf, 3),
        'symbols_by_class': by_class,
        'all_detections': enriched
    }
Sample output:
{
"symbol_count": 147,
"tagged_count": 138,
"tagging_rate": 0.939,
"avg_confidence": 0.887,
"symbols_by_class": {
"control_valve": [
{ "tag": "FCV-201", "confidence": 0.94, "bbox": [1240, 880, 1310, 950] },
{ "tag": "PCV-301", "confidence": 0.91, "bbox": [2100, 1240, 2170, 1310] }
],
"pump_centrifugal": [
{ "tag": "P-101A", "confidence": 0.96, "bbox": [540, 1820, 650, 1930] }
]
}
}
Common Issues & Fixes
Low recall on small symbols (valves <40px) → Increase imgsz to 1280 or 1600. Add more annotated examples of small instances. Enable multi_scale=True.
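False positives on pipeline lines → Add background tiles that contain only pipeline lines, with empty label files; YOLO treats images without labels as negatives, which teaches the model what pipeline lines look like so it stops confusing them with symbols. Alternatively, add an explicit pipeline_line class so lines are detected and labelled separately.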
Model fails on a different company's P&IDs → Domain shift is expected. Annotate 30–50 samples from the new P&ID set and fine-tune the existing model (transfer learning) rather than retraining from scratch:
model = YOLO('pid_detection/yolov8m_run1/weights/best.pt') # Load existing
model.train(data='new_company_pid.yaml', epochs=50, lr0=0.0001) # Fine-tune
Duplicate detections from overlapping tiles → The apply_global_nms() function in Step 6 handles this. If duplicates persist, lower its iou_threshold to around 0.3 (the value is passed as 0.4 inside detect_pid_symbols()).
GPU out of memory → Reduce batch from 8 to 4 or 2. Or reduce imgsz from 1280 to 960 as a compromise.
What This Pipeline Doesn't Cover
Symbol detection gives you a list of detected symbols with bounding boxes and instrument tags. For a complete P&ID digitisation system you also need:
Line detection — identifying pipeline connections between symbols (graph extraction)
Line type classification — distinguishing process lines, signal lines, utility lines
Connection graph construction — building the P&ID as a graph where nodes are instruments/equipment and edges are pipelines
These are covered in the complete document intelligence pipeline guide and on docprocessing360.com, where the full stack runs live.
Live Demo
The symbol detection model described in this guide runs as part of the complete document intelligence stack at:
Upload a scanned P&ID and see detected symbols highlighted with bounding boxes, class labels, confidence scores, and associated instrument tags — in real time.
Build It With Codersarts
We train, deploy, and maintain custom YOLOv8 symbol detection models for engineering clients — including fine-tuning for company-specific P&ID symbol libraries, integration with OCR pipelines, and active learning systems that improve accuracy over time.
🔗 Live Demo: docprocessing360.com
💼 C2C / Contract engagements available
Tags: P&ID symbol detection, YOLOv8 PyTorch engineering documents, piping instrumentation diagram AI, object detection P&ID, YOLOv8 custom training, P&ID digitization deep learning, instrument tag detection computer vision, engineering drawing object detection, tiled inference large images YOLOv8


