Optimizing user-level messaging performance and power consumption with MONITOR/MWAIT

MPI implementations for RDMA NICs spend a significant amount of time polling memory for the completion of messaging events. We evaluate the viability of using the MONITOR/MWAIT instructions to reduce the power consumed by the progress thread. We examine both kernel-level interfaces to this hardware and virtualization-based access from user level. First, we compare the power and latency costs of three approaches: active polling of memory at different rates in a low-power state, MONITOR/MWAIT calls through a kernel interface, and MONITOR/MWAIT calls through the Dune virtualization interface. We then examine the use of these techniques in an MPI implementation to reduce the power consumed by the progress thread.