Scalable real-time schedulers for many-cores
SCHED_DEADLINE is an open-source implementation  of an EDF-based  resource-reservation scheduler for the Linux kernel, that can be used to enhance predictability in the timing behaviour for real-time and multimedia workloads running on the Linux OS.
Initially supporting merely partitioned scheduling, SCHED_DEADLINE has recently been enriched with global and clustered scheduling support. In clustered scheduling, a subset of the available tasks is globally scheduled across a subset (i.e., a cluster) of the available CPUs. Whilst being attractive for its reduced run-time overheads, partitioned scheduling has the drawback that tasks need to be partitioned across the available CPUs and cores ahead of time. Optimally partitioning tasks sets across processors (e.g., so as to achieve the minimum maximum per-CPU utilisation) is unfortunately an NP-hard problem, something that doesn't hurt too much when small multi-cores are used, or when the task set characteristics is statically fixed and completely known ahead of time. However, for big multi-core and many-core systems, this task can take several hours of computations. Global scheduling relieves the system from such a burden, as the system itself is capable of automatically distributing the real-time workload across the available CPUs at run-time. Generally, this can lead to a more balanced spreading of the real-time workload over the CPUs, it does not require information about the real-time workload ahead of time, and it keeps re-configuring dynamically as new real-time tasks are started in the system, or existing ones are terminated.
Benefits do not come for free
These advantages do not come for free: a global scheduler has increased overheads, because it needs to decide which tasks to run over which CPUs, and especially the scheduling logic deployed over the various cores need to synchronise with each other, for example to guarantee that the earliest deadline tasks are running at any given time. Additionally, whenever migrating tasks across CPUs, the cache effectiveness is decreased, thus tasks' execution times may increase because of the need for fetching again data from the main memory. Still, they might get a chance to complete earlier, because of the potential exploitation of each idle cycle of any CPU in the system, as opposed to a partitioned scheduler, that may leave CPUs idle while tasks are ready to run on other CPUs.
However, at least the first problem of global (EDF) scheduling can be tackled by using proper data structures and synchronisation logics across the competing/collaborating CPUs. For example, the scheduling overheads can be controlled to scale logarithmically with the number of affine CPUs, by recurring to a heap data structure for speeding up the search for the CPU running the task with the least urgent deadline . Furthermore, on big multi-core machines, it is possible to define clusters of CPUs copying the hardware cache sharing architecture. It is worth noting that same problems arose within the stock SCHED_FIFO real-time Linux scheduling policy. Keeping track of tasks' priorities at a global level using a fine grained locking scheme came up to be highly inefficient. Same pathology, different treatment: in this case locks were substituted with atomic variables .
 D. Faggioli, F. Checconi, M. Trimarchi, C. Scordino, "An EDF scheduling class for the Linux kernel" In proceedings of the 11th Real-Time Linux Workshop (RTLWS), Dresden (Germany), October 2009.
 John A. Stankovic, K.Ramamritham, M.Spuri, G.Buttazzo, Deadline Scheduling for Real-Time Systems: EDF and Related Algorithms, Kluwer Academic Pubishers, 1998.
 Juri Lelli, Giuseppe Lipari, Dario Faggioli, Tommaso Cucinotta, An efficient and scalable implementation of global EDF in Linux In proceedings of the 7th Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT), Porto (Portugal), July 2011.