Transkernel: An Executor for Commodity Kernels on Peripheral Cores
2018
https://doi.org/10.25394/PGS.8277515.V1
Abstract
Modern mobile devices run numerous ephemeral tasks. These tasks are driven by background activities such as push notifications and sensor readings. To execute them, the whole platform must periodically wake up beforehand and go back to sleep afterwards. During this process, the OS kernel operates on the power states of various IO devices, which has been identified as the bottleneck for energy efficiency. To this end, we want to offload this kernel phase to a more energy-efficient, microcontroller-level core, called a peripheral core. To execute a commodity OS on a peripheral core, existing approaches either require much engineering effort or incur high execution cost. We therefore propose a new OS model called transkernel. By using cross-ISA dynamic binary translation (DBT), transkernel creates a virtualized environment on the peripheral core. It relies on a small set of stable interfaces. It is specialized for frequently executed kernel paths. It exploits ISA s...
FAQs
How does transkernel improve energy efficiency during device suspend/resume phases?
Transkernel reduces system energy consumption by 34% during device suspend/resume phases, significantly outperforming native execution methods.
What are the primary challenges in offloading kernels to peripheral cores?
Key challenges include managing different ISAs, ensuring binary compatibility during kernel evolution, and maintaining effective communication protocols.
How does the ARK implementation handle interrupts during execution?
ARK emulates a brief stage of interrupt handling for ISA-specific tasks before transitioning to the kernel's neutral handling routine.
What criteria define the stable ABI utilized by transkernel for compatibility?
The stable ABI consists of 12 Linux kernel functions and a variable that have remained unchanged for several Linux versions since 2014.
What execution overhead does ARK incur compared to native kernel execution?
ARK incurs an execution overhead of 2.7× on average, which is significantly lower than the 13.9× overhead seen in baseline DBT implementations.
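The overhead figures above concern dynamic binary translation, so a toy illustration of the general technique may help. The sketch below is not ARK's implementation and does not model any real ISA: the three-instruction guest "ISA" (`addi`, `bnez`, `halt`), the register names, and the functions `translate_block` and `run` are all hypothetical, chosen only to show the translate-once, cache, and dispatch loop that DBT engines are commonly built around.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy guest ISA (an assumption for illustration only):
#   ("addi", reg, imm)    -> reg += imm
#   ("bnez", reg, target) -> jump to target if reg != 0, else fall through
#   ("halt",)             -> stop execution
Instr = Tuple
Regs = Dict[str, int]
Block = Callable[[Regs], int]


def translate_block(program: List[Instr], pc: int) -> Block:
    """Translate the guest basic block starting at pc into a host closure.

    The returned closure mutates the register file and returns the next
    guest pc, or -1 when the guest halts.
    """
    ops = []
    while True:
        ins = program[pc]
        opcode = ins[0]
        if opcode == "addi":
            _, rd, imm = ins
            # Bind rd and imm now so each closure keeps its own operands.
            ops.append(lambda r, rd=rd, imm=imm: r.__setitem__(rd, r[rd] + imm))
        elif opcode == "bnez":
            _, rs, target = ins
            fall_through = pc + 1

            def branch_block(r, ops=tuple(ops), rs=rs,
                             target=target, fall_through=fall_through):
                for op in ops:
                    op(r)
                return target if r[rs] != 0 else fall_through

            return branch_block
        elif opcode == "halt":
            def halt_block(r, ops=tuple(ops)):
                for op in ops:
                    op(r)
                return -1

            return halt_block
        else:
            raise ValueError(f"unknown guest opcode: {opcode!r}")
        pc += 1


def run(program: List[Instr], regs: Regs) -> Dict[str, int]:
    """Dispatch loop: translate each block once, then reuse it from the cache."""
    cache: Dict[int, Block] = {}
    stats = {"translations": 0, "dispatches": 0}
    pc = 0
    while pc != -1:
        block = cache.get(pc)
        if block is None:
            block = translate_block(program, pc)
            cache[pc] = block
            stats["translations"] += 1
        stats["dispatches"] += 1
        pc = block(regs)
    return stats


# A small countdown loop: add 2 to acc while decrementing n to zero.
PROGRAM = [
    ("addi", "acc", 2),
    ("addi", "n", -1),
    ("bnez", "n", 0),
    ("halt",),
]
```

Calling `run(PROGRAM, {"acc": 0, "n": 5})` translates the loop body once and then dispatches it five times from the cache, which is the basic reason translation cost can be amortized over frequently executed paths. A real cross-ISA translator emits host machine code rather than closures and must also handle details this sketch ignores, such as interrupts, memory accesses, and calls across the translation boundary.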