| operations:troubleshooting [2026/06/17 14:27] – created - external edit 127.0.0.1 | operations:troubleshooting [2026/06/17 14:30] (current) – privacyl0st | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ====== Troubleshooting & Health Diagnostics ====== | ||
| + | Due to the highly segmented nature of this architecture, | ||
| + | |||
| + | Use these validated diagnostic procedures to isolate and resolve ecosystem faults. | ||
| + | |||
| + | ===== 1. The VPN & Network Layer (VLAN 10) ===== | ||
| + | |||
| + | **Symptom: | ||
| + | * **Diagnostic Check (DNS Leak & Routing):** SSH into the Acquisition Server (VM-A) and execute a manual curl against a public IP checker using the VPN interface. | ||
| + | < | ||
| + | * **Resolution: | ||
| + | |||
| + | **Symptom: | ||
| + | * **Diagnostic Check (Cross-VLAN Pinhole):** | ||
| + | SSH into the Edge Proxy Node (VM-B) and attempt a raw socket connection to the target port on VLAN 10. | ||
| + | < | ||
| + | * **Resolution: | ||
| + | |||
| + | ===== 2. The Storage Fabric (VLAN 50) ===== | ||
| + | |||
| + | **Symptom: | ||
| + | * **Diagnostic Check (Stale File Handles):** | ||
| + | SSH into the affected compute node (Media Engine or Acquisition Server) and check the NFS mount status. | ||
| + | < | ||
| + | * **Resolution: | ||
| + | < | ||
| + | sudo umount -f -l /mnt/data | ||
| + | sudo mount -a | ||
| + | </ | ||
| + | |||
| + | ===== 3. The Reverse Proxy & Ingress (VLAN 20) ===== | ||
| + | |||
| + | **Symptom: | ||
| + | * **Diagnostic Check (Backend Availability): | ||
| + | < | ||
| + | * **Resolution: | ||
| + | |||
| + | **Symptom: | ||
| + | * **Resolution: | ||
| + | |||
| + | ===== 4. Hardware Transcoding (The Brawn) ===== | ||
| + | |||
| + | **Symptom: | ||
| + | * **Diagnostic Check (NVIDIA Drivers):** | ||
| + | SSH into Physical Host 2 and verify the kernel recognizes the GPU. | ||
| + | < | ||
| + | * **Resolution: | ||
| + | |||
| + | **Next Step:** Review how to safely power cycle this infrastructure in [[operations: | ||
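
The stale file handle check in section 2 can be scripted so that it never hangs the shell when the NFS server stops answering. The sketch below is illustrative only: the function name `check_mount`, the 5-second default, and the three status labels are assumptions rather than part of the documented procedure, and it relies on `timeout` from GNU coreutils being installed on the compute node.

```shell
#!/bin/sh
# Hypothetical helper (not from the original page): probe a mount point
# without risking an indefinite hang on an unresponsive NFS server.
# Prints one of: "ok", "unresponsive", "error".
check_mount() {
    mount_point="$1"
    wait_secs="${2:-5}"   # assumed default; tune for your storage network

    # GNU `timeout` kills the probe if it runs longer than wait_secs.
    timeout "$wait_secs" ls -d "$mount_point" >/dev/null 2>&1
    status=$?

    if [ "$status" -eq 0 ]; then
        echo "ok"             # the path answered normally
    elif [ "$status" -eq 124 ]; then
        echo "unresponsive"   # GNU timeout exits 124 when the probe was killed
    else
        echo "error"          # e.g. a stale file handle or a missing path
    fi
}
```

Running `check_mount /mnt/data` on the affected node gives a quick signal before deciding whether the `umount` and `mount -a` sequence from section 2 is needed. An `unresponsive` or `error` result only narrows the fault; it does not by itself prove the handle is stale.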
