MDS Pods Not Running On Multus Network In Rook/Ceph
MDS pods that refuse to run on the Multus network can be a perplexing issue, especially when the rest of the Ceph cluster appears to be working correctly. You've added Multus as a Day 2 operation to an existing, healthy Ceph cluster, the OSDs are happily residing on the Multus network, and ceph status reports a healthy cluster, yet the MDS pods stubbornly stick to their original network. Commands like ceph fs dump confirm this: the MDS daemons are still advertising addresses on the old network rather than the Multus one. It's even more baffling when new PVCs for both RBD and CephFS are provisioned without a hitch and application pods start up and run smoothly. Other pods may restart after the network switch, but the MDS pods remain untouched, and even restarting them manually doesn't resolve the situation. This article dissects this behavior, explores potential causes, and walks through troubleshooting steps to get your MDS pods running on the Multus network.
Understanding the Multus and CephFS Interaction
To troubleshoot why your MDS pods are not running on the Multus network, it helps to understand how Multus CNI and the CephFS Metadata Server (MDS) interact in a Kubernetes environment managed by Rook. Multus acts as a meta-plugin for CNI, allowing pods to have multiple network interfaces; secondary interfaces are attached based on the k8s.v1.cni.cncf.io/networks annotation on a pod, which references one or more NetworkAttachmentDefinitions (NADs). This is particularly useful when you need dedicated networks for specific traffic, such as storage, for performance, security, or isolation. CephFS, for its part, relies on MDS daemons to manage the filesystem's metadata, and those daemons must communicate with the monitors, the OSDs, and client applications.

When you add Multus to an existing Rook Ceph cluster, the intention is typically to route this critical metadata traffic through a separate, potentially higher-performance or more isolated, network interface managed by Multus. The expected behavior is that Rook, upon detecting the Multus configuration in the CephCluster CR, updates the pod templates of its daemon deployments, including the MDS, so that Kubernetes attaches the new network to them. The fact that OSDs are on the new network and new PVCs provision correctly shows that the core Ceph cluster and the Rook operator are working; the issue appears to be specific to how the MDS pods are configured or how they pick up the network change. Several factors could be at play, including the order of operations, the network section of the CephCluster CRD, or how Kubernetes reconciles the change for the MDS deployments. Understanding how the Rook operator applies network configuration to the different Ceph daemons is key. Also keep in mind that a Ceph daemon binds its addresses at startup, so an MDS that was already running when the network changed will keep advertising its old address until it is recreated with the new configuration.
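For reference, a minimal sketch of the network portion of a CephCluster CR with Multus enabled is shown below. The NAD names (public-net, cluster-net) and the rook-ceph namespace are illustrative assumptions, not defaults, and the rest of the cluster spec is omitted:

    apiVersion: ceph.rook.io/v1
    kind: CephCluster
    metadata:
      name: rook-ceph
      namespace: rook-ceph
    spec:
      # ...mon, storage, and other settings omitted for brevity...
      network:
        provider: multus
        selectors:
          # "public" carries MON/MGR/MDS/client traffic; "cluster" carries only
          # OSD replication traffic. The values name NetworkAttachmentDefinitions,
          # optionally namespace-prefixed; public-net and cluster-net are examples.
          public: rook-ceph/public-net
          cluster: rook-ceph/cluster-net

The MDS daemons use the network referenced by the public selector, so that is the attachment they should receive. If this block is added to an already-running cluster, the operator has to roll the existing daemon deployments before the change takes effect, which is exactly where the MDS pods appear to be getting stuck in this scenario.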
Diagnosing the MDS Network Misalignment
When you encounter the problem where MDS pods are not running on the Multus network, a systematic diagnostic approach is essential. You've already confirmed that the OSDs are correctly placed on the Multus network and that ceph status reports a healthy cluster, which indicates the underlying Ceph infrastructure is largely operational. The first step is to re-examine the Rook cluster.yaml configuration. Pay close attention to the spec.network.provider and spec.network.selectors fields, ensuring that the Multus network is correctly identified and applied. If Multus was added as a Day 2 operation, verify that the CephCluster custom resource was actually updated and that the Rook operator has reconciled the change.

Next, check the logs of the Rook operator pod. The operator is responsible for translating the CephCluster CR into actual Ceph configurations and Kubernetes resources, so look for errors or warnings related to network configuration, MDS deployments, or Multus integration. Also inspect the logs of the MDS pods that are not on the Multus network; although restarting them didn't help, the logs may contain clues about why they aren't binding to the expected interface or whether they hit networking errors during initialization. Use kubectl -n <namespace> describe pod <mds-pod-name> to see the pod's network configuration and any associated events, and kubectl -n <namespace> exec <mds-pod-name> -- ip addr to inspect the interfaces actually available inside the pod. Compare this output with what you expect based on your Multus configuration.

Another critical area is the Ceph configuration itself. The monitors may still hold network settings that don't reflect the new topology for all daemons. From the toolbox, check ceph config dump (or ceph config get mon public_network) for the public_network and cluster_network settings, examine ceph mon dump for the monitor addresses, and look at ceph fs dump for the addresses each MDS daemon advertises (the old ceph mds dump command has been folded into ceph fs dump). Ensure that the cluster's internal network configuration, as perceived by the monitors and managers, aligns with the desired Multus setup for the MDS. Remember that even if new PVCs and applications work, the MDS is a critical component of CephFS, and its network placement directly affects performance and reliability: if the MDS pods are not on the intended network, metadata operations can be bottlenecked or suffer higher latency. Watch for any discrepancy between the network configuration in the CephCluster CR and the interfaces Kubernetes actually presents to the MDS pods, especially in complex Multus setups.
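As a starting point, the commands below sketch those checks end to end. They assume a default installation (the rook-ceph namespace, the rook-ceph-operator deployment, and the rook-ceph-tools toolbox deployment); adjust the names for your environment:

    # Operator logs: look for errors while reconciling the network or MDS changes.
    kubectl -n rook-ceph logs deploy/rook-ceph-operator | grep -iE 'multus|mds|network'

    # Pod-level view: events, annotations, and the interfaces actually present.
    kubectl -n rook-ceph describe pod <mds-pod-name>
    kubectl -n rook-ceph exec <mds-pod-name> -- ip addr

    # Ceph's own view: the addresses each MDS advertises and the configured networks.
    kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph fs dump | grep -i addr
    kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph config dump | grep -E 'public_network|cluster_network'
    kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph mon dump

If the addresses reported by ceph fs dump sit on the original network while ceph config dump already shows the Multus ranges, the daemons themselves are the stale piece, which narrows the problem considerably.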
Step-by-Step Troubleshooting Guide
When facing the issue of MDS pods not running on the Multus network, a structured troubleshooting process will help pinpoint the root cause. Begin by re-verifying the initial setup. Ensure that Multus itself is correctly installed and functioning in your Kubernetes cluster: test it with a simple pod that requests a secondary interface to confirm its basic functionality independent of Rook and Ceph.

Next, meticulously review the CephCluster custom resource (cluster.yaml). If Multus was added as a Day 2 operation, confirm that spec.network.provider is set to multus and that spec.network.selectors references your Network Attachment Definition (NAD) by name. Rook's selectors use the keys public and cluster: for example, if your NAD is named multus-conf, you would set selectors.public (and, if you also want a dedicated OSD replication network, selectors.cluster) to multus-conf, optionally namespace-qualified as rook-ceph/multus-conf. These values must exactly match your Multus setup. After applying changes to the CephCluster CR, wait for the Rook operator to reconcile them and monitor the operator logs for any errors during reconciliation.

If the operator logs look clean, move on to the MDS pods. Use kubectl -n <namespace> get pods to identify them and their status, then kubectl -n <namespace> describe pod <mds-pod-name> for a detailed view, paying attention to the Events section. The most important check is the network interfaces inside the pod: run kubectl -n <namespace> exec <mds-pod-name> -- ip addr and compare the output with what you expect from your Multus configuration. You should see an interface corresponding to the Multus network (often named net1 or similar, depending on the order it's attached). If that interface is missing, check whether the MDS deployment's pod template carries the k8s.v1.cni.cncf.io/networks annotation; without it, Kubernetes never asks Multus to attach the secondary network, which points to Rook not having updated the deployment.

If the interface is present but the MDS isn't using it, look at the Ceph side. Run kubectl -n <namespace> exec <rook-toolbox-pod> -- ceph fs dump and examine the MDS entries: the addresses listed for each daemon should be Multus network IPs. If they still show the original cluster network IPs, the daemons haven't been reconfigured or recreated in a way that makes them aware of the new network, and a simple pod restart won't help if the underlying deployment spec never changed. Rook creates one deployment per MDS daemon (named rook-ceph-mds-<fs-name>-a, -b, and so on, each with a single replica) and will typically revert manual scaling, so rather than scaling replicas down and back up, it is more reliable to delete the MDS deployments and let the operator recreate them, restarting the operator pod if it doesn't reconcile promptly (see the command sketch below). If the issue persists, examine the Ceph configuration itself, for example with ceph config dump from the toolbox or by looking at /etc/ceph/ceph.conf inside a mon or OSD pod, to ensure no network-specific settings are overriding the Multus configuration. Finally, if you're still stuck, collect detailed logs from the operator and the MDS pods, along with the output of the relevant kubectl and ceph commands, and file a bug report with as much detail as possible.
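Here is a sketch of those final checks and the forced recreation, assuming the default rook-ceph namespace, a filesystem named myfs, and Rook's standard app=rook-ceph-mds label (all of these names are illustrative):

    # 1. Does the MDS pod template request the Multus network at all? Multus only
    #    attaches secondary interfaces to pods whose template carries the
    #    k8s.v1.cni.cncf.io/networks annotation.
    kubectl -n rook-ceph get deployment -l app=rook-ceph-mds -o yaml \
      | grep 'k8s.v1.cni.cncf.io/networks'

    # 2. If the annotation is missing, have the operator rebuild the MDS deployments
    #    rather than scaling them by hand (manual scaling is reverted by the operator).
    #    Note: this briefly takes the filesystem's MDS daemons offline.
    kubectl -n rook-ceph delete deployment -l app=rook-ceph-mds
    kubectl -n rook-ceph rollout restart deployment rook-ceph-operator

    # 3. Once the new pods are running, verify the interfaces inside one of them
    #    (net1 is the usual name for the first Multus-attached interface).
    kubectl -n rook-ceph exec deploy/rook-ceph-mds-myfs-a -- ip addr show

If the annotation is present and the interface exists but ceph fs dump still reports the old addresses, the problem is more likely on the Ceph configuration side than in the Kubernetes network attachment.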
Potential Causes and Solutions
Several potential causes can lead to your MDS pods not running on the Multus network, even after successfully adding Multus to your Rook Ceph cluster. One common reason is the order of operations: if Multus was added after the MDS daemons were already running, they will not reconfigure themselves to use the new network on their own. The solution is to force a reconfiguration by having the operator recreate the MDS pods, so that the new pods come up with the Multus attachment injected by Kubernetes; note that simply scaling the MDS deployments by hand is typically reverted by the operator, so deleting the deployments or restarting the operator is the more reliable route. Another possibility is an incorrect network selector in the CephCluster CR. Double-check that spec.network.selectors accurately points to your Multus Network Attachment Definition (NAD); a typo or a wrong key will prevent Multus from attaching the secondary network to the MDS pods. Verify the NAD's own YAML definition as well, to ensure it's correctly configured for your environment (a sample NAD is sketched below).

Rook operator reconciliation issues can also occur. If the operator fails to process the network changes in the CephCluster CR correctly, it won't update the MDS deployments, so review the operator logs for messages indicating network configuration failures or problems updating the MDS deployments. A more subtle cause is a Kubernetes CNI conflict or misconfiguration: while Multus is designed to work alongside other CNIs, an underlying issue with the primary CNI or with network policies can interfere with secondary network attachment, so make sure the primary CNI is stable and no policy blocks traffic to or from the new interface. The Ceph configuration itself may also hold outdated network information. Although Rook manages the Ceph configuration, it's worth checking whether any manual configuration steps or older settings are interfering: use the toolbox to run ceph config dump and inspect the monitor configuration for hardcoded IP addresses or network ranges that conflict with the Multus setup.

If the MDS pods have the Multus interface but aren't using it for Ceph traffic, the issue may lie in Ceph's internal service discovery or daemon communication; in that case, make sure the MDS daemons are restarted only after the network is fully available and recognized by the operating system. A stale configuration inside the MDS pods is also possible: even with a recreated pod, cached or partially updated Ceph configuration can prevent the daemon from picking up the new network, though this is less common. Always run a recent, stable version of Rook and Ceph, as network management has improved significantly over time. If none of these solutions work, systematically collect logs and configuration details for further debugging or community support, and remember to check the health of the Multus installation itself, since a faulty Multus setup would prevent any pod from using the secondary network correctly.
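For reference, a NAD of the sort commonly used with Rook and Multus is sketched below, using a macvlan network with whereabouts IPAM. The NAD name, namespace, master interface, and address range are all illustrative assumptions; what matters is that the name (and namespace) match spec.network.selectors exactly and that the master interface exists on every node that can schedule an MDS pod:

    apiVersion: k8s.cni.cncf.io/v1
    kind: NetworkAttachmentDefinition
    metadata:
      name: public-net
      namespace: rook-ceph
    spec:
      config: |
        {
          "cniVersion": "0.3.1",
          "type": "macvlan",
          "master": "eth0",
          "mode": "bridge",
          "ipam": {
            "type": "whereabouts",
            "range": "192.168.20.0/24"
          }
        }

A pod that references this NAD through the k8s.v1.cni.cncf.io/networks annotation should receive a net1 interface with an address from the configured range, which gives you a quick way to validate the NAD independently of Rook.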
Conclusion: Getting Your MDS on the Right Network
Resolving the issue of MDS pods not running on the Multus network ultimately hinges on careful verification of your Rook CephCluster configuration, correct Multus Network Attachment Definitions (NADs), and confirmation that Kubernetes and the Rook operator have actually reconciled the network change into your MDS deployments. The fact that other Ceph components such as the OSDs are functioning on the Multus network, and that new storage provisioning works, suggests the core Ceph cluster is healthy and Multus is generally operational; the challenge lies specifically with the MDS daemons. By systematically checking the operator logs, the MDS pod descriptions, and the network interfaces actually present inside the MDS pods, you can usually identify where the configuration is falling short. Forcing a redeployment of the MDS pods, by letting the operator recreate their deployments with the updated pod template, is a powerful way to ensure they come up with the correct network interfaces, and always double-check the network selectors in the CephCluster CR for accuracy. If you continue to experience difficulties, the Rook documentation and community resources are invaluable. For further exploration of advanced networking with Kubernetes and CNI plugins, the official documentation on Kubernetes networking and CNI plugins is very insightful, and the Multus CNI GitHub repository provides deeper understanding and troubleshooting tips specific to Multus itself.