21

I opened up Task Manager and looked under the "System" area and saw:

Threads: 1337

Since I have a dual-core processor with hyper-threading available (meaning four threads), how is it possible to have 1000+ threads when my processor is only supposed to have four?

Peter Mortensen
dpl47

3 Answers

55

The simple answer is that not all threads are executing simultaneously. For a fuller explanation, read on.

The operating system's task scheduler is usually thought of as scheduling applications, which is what lets you perform one task while the computer works on another. In the old days, the litmus test of multitasking was formatting a floppy disk while doing something else. If you wanted to really put the OS to the test, you'd format a floppy disk while downloading a file over a modem connected to the serial port. As hardware became powerful enough to do it in a meaningful way, video playback sometimes featured in such tests as well. If the OS's task scheduler could handle running those tasks smoothly, it could handle anything.

However, the task scheduler doesn't actually schedule applications (processes); it schedules threads. Each application has at least one thread, but can potentially use a large number of threads to split its work into related or independent parts. For example, it is common for an application to have one thread that handles the user interface, and to create another thread when the user initiates a potentially long-running operation (printing, recalculating a spreadsheet, a development environment doing a symbol lookup, and so on). Some programming environments introduce threads invisibly to the programmer; for example, Java and .NET might do garbage collection in a separate thread, which is out of the immediate control of the programmer. Some programs create a number of threads early on and pool them, because creating new threads is a comparatively expensive operation (so you don't necessarily want to create a thread every time you need one). Anything that generates a preview usually does so in a separate thread, so the rest of the UI remains responsive while the preview is being generated. And so on. Taken together, all this means that the number of threads on the system at any time can easily be many times the number of processes.
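
As a rough illustration of the worker-thread and thread-pool pattern described above, here is a minimal Java sketch (the class name, pool size and timings are invented for the example):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WorkerThreadSketch {
    // A small pool created up front, so we don't pay the cost of
    // creating a brand new thread for every long-running request.
    private static final ExecutorService pool = Executors.newFixedThreadPool(4);

    public static void main(String[] args) {
        // Pretend this is the UI thread: it must stay responsive,
        // so the slow work is handed to the pool instead.
        pool.submit(() -> {
            // Simulate a long-running operation such as printing
            // or recalculating a spreadsheet.
            try {
                Thread.sleep(2000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            System.out.println("Long-running operation finished");
        });

        System.out.println("UI thread is still free to handle input");
        pool.shutdown();
    }
}
```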

Each thread can be in one of a few possible states, but the most important distinction is between the running, runnable and waiting states; the terminology differs slightly between systems, but that's the general idea. At any one time, only one thread per virtual CPU core (virtual because of hyper-threading and similar technologies) can be running (that is, executing machine code instructions), but any number of threads can be runnable (meaning the thread is a candidate to get the CPU the next time the scheduler needs to decide which thread should be allowed to run). Waiting (also known as blocked) threads are just that: waiting for something. The most common case is probably that the thread is waiting for user, disk or network I/O (user input in particular is exceptionally slow).
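
Java happens to expose these states directly through its thread API, so a small sketch (names made up for the example) can make the distinction concrete: a thread spinning in a loop reports itself as runnable, while a thread that is waiting on a timer reports a waiting state:

```java
public class ThreadStates {
    public static void main(String[] args) throws InterruptedException {
        // A thread that just burns CPU: from the scheduler's point of
        // view it is always runnable (or actually running).
        Thread busy = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // spin
            }
        });

        // A thread that spends its life waiting: here it waits on a
        // timer, but waiting for disk, network or user input looks
        // much the same to the scheduler.
        Thread waiting = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // woken up early; just exit
            }
        });

        busy.start();
        waiting.start();
        Thread.sleep(100); // give both threads time to settle

        System.out.println("busy:    " + busy.getState());    // RUNNABLE
        System.out.println("waiting: " + waiting.getState()); // TIMED_WAITING

        busy.interrupt();
        waiting.interrupt();
    }
}
```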

The thread count you see in the task manager is the total number of threads in any of these states. For example, the Windows 7 system I am typing this on currently has around 70 processes started but almost 900 threads. With all the background processes handling various tasks, each probably subdivided into a multitude of threads, this is not an outrageous number.
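
Task Manager counts threads across every process on the machine, but you can do the same kind of counting for a single process. As a small Java illustration, even a trivial program already contains several JVM-internal threads (garbage collector, JIT compiler and so on; the exact number depends on the JVM):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCount {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Even this "single-threaded" program reports several threads:
        // the main thread plus the JVM's own helper threads.
        System.out.println("Live threads in this process: "
                + threads.getThreadCount());
    }
}
```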

Going a bit deeper into the technical implementation, at the very core of a pre-emptively multitasking operating system's task scheduler there is usually some kind of hardware interrupt hook. This means that the kernel can halt the CPU when it has no useful work to perform (this is almost certainly one of the reasons, if not the reason, why Linux checks the HLT instruction on boot on IA-32-compatible CPUs, and probably does similar checks on other architectures), safe in the knowledge that at some reasonably determinate future time an interrupt will fire and the task scheduler will be invoked. Since the interrupt fires regardless of what other work the CPU is performing (that's the whole idea behind interrupts), the scheduler runs regularly and gets a chance to decide which thread should be executed during the following time slice.

Since context switches are relatively expensive, it's usually possible (at least through the source code) to tune how aggressively the scheduler switches between threads. Switching threads more often makes the system more responsive, but the switching overhead means that the overall time to finish a given set of tasks is longer. The fastest system is one that only switches between threads when the running thread can no longer run (because it becomes blocked waiting on something, or it finishes its job), since that minimizes the overhead; the most responsive system switches between threads every time the scheduler is invoked, since that minimizes the average time a particular thread waits before getting CPU time. The ideal setting is usually somewhere in between, and the trade-off between those choices is likely one big reason why Linux offers multiple schedulers to choose from, as well as some tuning parameters through the kernel configuration.
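
You can't watch the kernel's interrupt handler from user code, but the effect of time slicing is easy to observe: start more CPU-bound threads than you have cores and every one of them still makes progress, because the timer-driven scheduler keeps preempting them and handing the cores around. A rough Java sketch (the thread multiplier and the one-second run are arbitrary choices for the example):

```java
import java.util.concurrent.atomic.AtomicLong;

public class TimeSlicing {
    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        int nThreads = cores * 4; // deliberately more threads than cores
        AtomicLong[] counters = new AtomicLong[nThreads];
        Thread[] workers = new Thread[nThreads];

        for (int i = 0; i < nThreads; i++) {
            AtomicLong counter = counters[i] = new AtomicLong();
            workers[i] = new Thread(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    counter.incrementAndGet(); // pure CPU work
                }
            });
            workers[i].start();
        }

        Thread.sleep(1000); // let them compete for the cores for a second

        for (Thread w : workers) {
            w.interrupt();
        }
        for (int i = 0; i < nThreads; i++) {
            // Every counter ends up non-zero: each thread got some CPU
            // time, even though only a handful could run at any instant.
            System.out.println("thread " + i + ": " + counters[i].get());
        }
    }
}
```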

Cooperatively multitasking OSes and environments, on the other hand (Windows 3.x being one example), rely on each application to regularly cede control to the scheduler. There is usually an API function specifically meant to do that, and many API functions do it as part of their internal execution flow, because it helps make the user experience smoother. That design works well as long as all applications are well-behaved and cede control at short intervals during any long-running operation (long-running meaning more than a small fraction of a second), but an application that does not can clog up the entire system. This is one big reason why Windows 3.x did so poorly on the multitasking test I mentioned above, while OS/2 strolled along merrily while performing the same tasks on the same hardware: an application could tell the floppy disk drive to write a certain sector, and the time it took to do so before the call returned could actually be measurable (tens to hundreds of milliseconds or more); under a cooperative scheduler, nothing else could run until that call returned. A preemptively multitasking system instead has its scheduler break in at its next scheduled invocation, notice that the thread that is currently "running" is actually blocked by the write call, and simply switch to another thread that is runnable. (In practice it's a little more involved, but that's the general idea.)
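
The difference is easy to mimic in ordinary code. Here is a toy cooperative "scheduler" in Java, purely illustrative and not how any real OS is written: each task does one small slice of work per call and then returns control, and a single misbehaving task that never returned from its slice would freeze every other task, just as under Windows 3.x:

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class CooperativeSketch {
    /** A task that does one small slice of work per call and reports
     *  whether it still has work left to do. */
    interface Task {
        boolean step();
    }

    public static void main(String[] args) {
        Queue<Task> tasks = new ArrayDeque<>();

        // Two well-behaved tasks that yield control after each slice.
        tasks.add(counterTask("format-floppy", 5));
        tasks.add(counterTask("download-file", 3));

        // The "scheduler": round-robin over the tasks. It only regains
        // control because every task voluntarily returns from step();
        // a task that looped forever inside step() would starve the rest.
        while (!tasks.isEmpty()) {
            Task task = tasks.poll();
            if (task.step()) {
                tasks.add(task); // still has work, back in line it goes
            }
        }
    }

    private static Task counterTask(String name, int slices) {
        return new Task() {
            int remaining = slices;
            public boolean step() {
                System.out.println(name + ": slice " + (slices - remaining + 1));
                return --remaining > 0;
            }
        };
    }
}
```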

In both preemptively multitasking and cooperative environments, there is also the possibility of different threads having different priorities. For example, it is probably more important to execute in a timely fashion the thread that is receiving data over a communications link than the one that updates the system time display, so the receiving thread has high priority and the time display updater thread has low priority. Thread priorities play a role in the scheduler's decision of which thread to allow to execute (for example, very simplified, high priority threads should always execute before low priority threads, so even if the low priority thread has work left to do, if the high priority thread becomes runnable it takes precedence), but such specific scheduling decisions do not affect the underlying mechanism design.
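
Most thread APIs expose priorities directly. In Java, for instance, a thread can be given a priority hint; exactly how much weight the underlying OS scheduler gives it is platform-dependent, so treat this as a sketch of the idea rather than a guarantee (the thread names and empty bodies are placeholders):

```java
public class PrioritySketch {
    public static void main(String[] args) {
        Thread receiver = new Thread(() -> {
            // pretend this drains a communications link
        }, "receiver");
        Thread clock = new Thread(() -> {
            // pretend this repaints the system time display
        }, "clock");

        // Hints to the scheduler: keep the receiver responsive, let the
        // clock wait its turn. The exact effect is platform-dependent.
        receiver.setPriority(Thread.MAX_PRIORITY);
        clock.setPriority(Thread.MIN_PRIORITY);

        receiver.start();
        clock.start();
    }
}
```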

user
  • I love "user input in particular is exceptionally slow" – John Dvorak Jul 05 '13 at 08:56
  • The "threads" a CPU "has" refers to the number it can execute simultaneously at any given moment. It switches between active threads, giving each a turn, or share of the CPU for a slice of time, when possible. Processes/threads "blocked" or waiting on I/O (such as disk, keyboard/mouse input, etc.) or something else (such as synchronization primitives like mutexes, etc.) skip their turn. – LawrenceC Jul 05 '13 at 14:48
  • @ultrasawblade The OP is referring to the number of threads displayed by the Windows Task Manager, and asking how that number can be >300× the number of "threads" supported by the CPU. I believe my answer properly addresses that (and with +24/-0 votes so far, the community appears to agree). – user Jul 05 '13 at 15:59
  • @ultrasawblade Also, your comment seems contradictory to me: at any given moment, the maximum number of executable threads is identical to the number of execution units in the CPU (taking hyperthreading into account); the actual number of executing threads may be lower, but never higher. Anything else is an illusion made possible by the operating system's scheduler. – user Jul 05 '13 at 16:02
  • @MichaelKjörling I appreciate your in-depth answer! To recap, if I understand this correctly, 1000s of threads may be occupied at a given time, but not executing any particular machine code at that given time; rather, a maximum of four are active at once, whilst the remaining 1000s are in a non-active state such as "runnable" or "waiting"? – dpl47 Jul 05 '13 at 23:10
  • @dpl47 Exactly. The CPU can only execute a small number of threads at the same time - historically, for PCs, it's been one, but with multi-core CPUs, hyperthreading and other techniques, it's grown into a small handful. This means that at any given moment, only so many threads are actually executing, and all the others are in some kind of wait state (if nothing else, they are waiting for their turn). – user Jul 06 '13 at 08:29
  • @MichaelKjörling Thanks! Now, would I be correct in saying that more than one thing can execute in one thread at a time? (For example a call to multiple methods/functions) – dpl47 Jul 06 '13 at 09:31
  • @dpl47 Not *in one thread at the same time*. However, one thread may make a call which ends up blocking the thread, causing the scheduler to run a different thread, with the *apparent effect* that the other thread is executing on the same core while the method call is executing. Multi-threaded programming is easy in principle, but hard to get right in practice because there basically are no inherent guarantees other than that each thread will get a chance to execute at some point. A lot of programming language innovation these days goes into making multi-threaded programming easier and safer. – user Jul 06 '13 at 09:40
  • Thanks a lot, that's very well explained; I never heard about the compromises of a task scheduler. Now that I understand, are there reasons why CPUs don't have many small cores and a few big ones, instead of identical cores? Wouldn't that be interesting for programmers, knowing that they can use thread priorities? I don't understand why CPUs stop at 6 similar cores instead of diversifying, which might enable better load balancing and also more parallel processing. – jokoon Jul 06 '13 at 11:36
  • @jokoon It's just as easy (maybe even easier) to use the prioritization logic already built into the task scheduler as it would be to specifically run particular threads on particular cores. Remember that the scheduler has to be generic enough to run on a wide range of hardware, well tested and extremely well optimized, more so than almost any other code, because it runs so often and has such a central role. And suppose it was done; suppose you're running on an 8-core CPU with 2 high-performance cores, and there are four equally high priority threads ready to run. How do you schedule those? – user Jul 06 '13 at 12:29
  • @jokoon Also, each core needs roughly the same logic. You can't really do much work with just an [ALU](http://en.wikipedia.org/wiki/Arithmetic_logic_unit), you need tons of peripheral circuitry even for the ALU to be useful (and that's ignoring other parts of the CPU circuitry such as the [FPU](http://en.wikipedia.org/wiki/Floating-point_unit)). So there could probably be some savings, but it would come at greatly increased complexity and would present its own set of challenges for the scheduler. Much easier, then, to just make each core fully featured and let the OS sort it out. – user Jul 06 '13 at 12:36
  • I don't really understand what the real CPU improvements in the last decade were, besides getting more cores (6 is not many if I think about it), since frequency is pretty much capped. Can't CPUs have many more modest 32-bit cores with enough features to do anything, instead of bigger ones? Why not decrease the frequency to 2 or 1 GHz and expand the die size to fit more cores? At this rate, with current fabrication technology, wouldn't those kinds of CPUs be better? – jokoon Jul 06 '13 at 13:04
  • Also, graphics cards seem to be quite a bit more powerful than CPUs, but are much harder to program. Isn't there some sort of compromise between parallel capability and having a good enough instruction set? That really boggles my mind; I certainly don't know everything, but I'm still curious about computer architecture design. – jokoon Jul 06 '13 at 13:08
  • @jokoon: not really -- there are many tasks which aren't parallelizable. For instance, consider something like MD5. Each step in the algorithm depends on data from previous steps, so it is not parallelizable. Also don't forget there's a lot more to CPUs than clock speeds. For instance, Haswell can execute 8 instructions per clock, Ivy Bridge and Sandy Bridge could do 6, while the ARM A15 can only do 3, and A7/A9 can only do 2. – Billy ONeal Jul 06 '13 at 14:32
  • @jokoon: the compromise is to have a separate CPU and GPU. About the closest thing to a true compromise was the Cell, which had pretty much the worst of both worlds. – Billy ONeal Jul 06 '13 at 14:35
  • @Michael: Of course, it would be possible to write a scheduler that schedules processes rather than threads, such that App A has the same total possible amount of CPU time to use as App B. (Although one might view such a design as process priority handling atop a thread scheduler.) – Billy ONeal Jul 06 '13 at 14:39
  • Comments are nice to ask for clarification, but they're not suited for extended discussion, and at some point become hard to read through. Can you please take this to [chat] instead? Thank you. – slhck Jul 06 '13 at 14:45
  • @slhck I created a room, but couldn't find any way to migrate the discussion in these comments, which would be nice. Is that something you can do manually as a moderator? http://chat.stackexchange.com/rooms/9547/couple-cores-couple-hundred-threads – user Jul 06 '13 at 20:12
  • Unfortunately no. There's an automatic migration process that cannot be triggered manually, but we can't move comments to a chat room. We'll let the comments here stay for the time being, but I'd encourage others to follow up on the discussion in the chat room you created. – slhck Jul 06 '13 at 20:20
  • The Cell did not have that many cores. – jokoon Jul 06 '13 at 20:36
18

Think of a four-lane highway with 1037 vehicles.

Your OS needs a lot of running processes to provide all of its services. Even the most simple graphical programs will require multithreaded programming. When you think of how many programs you have open, you can see why the computing resources need to be shared.

What your task manager is showing is the current system load. What your CPU's specification shows is how many threads the hardware front end can accept for parallel execution at any one instant. Without going too deeply into the difference between hyper-threading and multiple cores, a system whose front end accepts more logical threads will generally perform better.
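
As a quick check, you can ask the runtime how many logical processors (hardware threads) the OS exposes; on the asker's dual-core, hyper-threaded machine this would report 4. A Java one-liner, for illustration:

```java
public class LogicalCpus {
    public static void main(String[] args) {
        // Number of logical processors the OS exposes: physical cores
        // multiplied by the hyper-threading factor, where enabled.
        System.out.println(Runtime.getRuntime().availableProcessors());
    }
}
```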

174140
    *"Even the most simple graphical programs will require multithreaded programming."* Wrong. It is perfectly possible to write single-threaded GUI applications; up until Windows 95, for all intents and purposes everyone did it that way. It makes certain tasks more complicated (for example, background printing is trivial with multiple threads but decidedly non-trivial in a single threaded application, particularly if you are also memory constrained as was the case then), but there is a huge difference between "X is made easier by Y" and "X requires Y". – user Jul 05 '13 at 08:59
    @MichaelKjörling: "up until Windows 95, for all intents and purposes everyone did it that way"* - really? Even on *nix systems running Motif in the 80's? – LarsH Jul 05 '13 at 14:04
  • @LarsH Good point, and one I thought of too late to edit into the comment. But that does not negate the point: namely, that it is perfectly possible to write single-threaded GUI applications. You don't need multithreading for that, although it does make some tasks (considerably) easier on the programmer. – user Jul 05 '13 at 15:51
  • @MichaelKjörling: I agree, it doesn't negate your point, which is a valid one. (I don't think uprego's incorrect statement negates his point either.) – LarsH Jul 05 '13 at 18:14
  • As an example to what @MichaelKjörling said, Visual Basic (before .NET) as a programming language had *no* multi-threading support. *Everything* was run on a single thread. If you wanted to process user input in the middle of a long-running operation, you'd call `DoEvents`, which would process the message queue - but that was done on the same thread and would block that long-running operation until all messages were processed. (Of course, you could call Win32 API functions and/or create additional processes, but at that point you may as well use one of the lower-level languages.) – Bob Jul 06 '13 at 12:10
  • @Bob `DoEvents` also by design called out to the operating system, meaning it gave the scheduler a chance to kick in and decide that the CPU time could be put to better use elsewhere. Like I said in my answer, that works well as long as all applications are well-behaved, but not all of them were. – user Jul 06 '13 at 12:39
5

We should step back and ask ourselves: How can a computer with a single CPU have two threads?

Threads are software entities, not hardware. To have another thread, you just need memory for the objects that make up the thread, such as a descriptor structure and a stack.
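
You can see this in ordinary thread APIs: creating a thread is essentially a matter of allocating those bookkeeping structures and a stack. In Java, for example, you can even suggest the stack size yourself; this is only a sketch, and whether the hint is honoured is up to the JVM and the OS:

```java
public class StackSizeSketch {
    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> System.out.println("hello from a new thread");

        // The fourth argument is a requested stack size in bytes; the
        // JVM and OS may round it up or ignore it entirely.
        Thread t = new Thread(null, work, "small-stack-thread", 64 * 1024);
        t.start();
        t.join();
    }
}
```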

The operating system switches among threads at various times, like inside certain interrupts (such as a timer interrupt) or when threads make calls into the operating system.

Out of all of the threads which exist in the system, only a subset are usually in a state which is commonly called "runnable". Runnable threads are eager to run: they are either executing, or sitting in a "run queue", waiting to be dispatched by the scheduler. Threads which are not runnable are "blocked", waiting to acquire some resource or receive input, or "sleeping" which is like being blocked on input, where the "input" is the passage of time. A "context switch" takes place when the scheduler function in the operating system takes a look at the run queue of a processor, and chooses a different thread to execute.
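
For a concrete (if Java-specific) illustration of the blocked and sleeping cases: in the sketch below, one thread is blocked trying to acquire a lock that the main thread holds, another is sleeping on a timer, and neither is runnable until the lock is released or the timer expires:

```java
public class BlockedAndSleeping {
    private static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        // The main thread grabs the lock and keeps it for a while.
        synchronized (lock) {
            Thread blocked = new Thread(() -> {
                synchronized (lock) {
                    // only reachable once main releases the lock
                }
            });
            Thread sleeping = new Thread(() -> {
                try {
                    Thread.sleep(60_000);
                } catch (InterruptedException e) {
                    // woken early; just exit
                }
            });

            blocked.start();
            sleeping.start();
            Thread.sleep(100); // let both threads reach their wait

            System.out.println(blocked.getState());  // BLOCKED (on the lock)
            System.out.println(sleeping.getState()); // TIMED_WAITING (sleeping)

            sleeping.interrupt();
        } // lock released here; the blocked thread can now run and finish
    }
}
```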

Do not be confused by "hyperthreading", which is Intel's name for a particular hardware feature.

Kaz