Overcoming the Limit of Thread Creation

By Jagruti Trivedi

The following Technical Article may contain actual software programs in source code form. This source code is made available for developers to use as needed, pursuant to the terms and conditions of this license.

Introduction

This Technical Article discusses the limit on thread creation and ways to overcome it. It also provides an in-depth explanation of the internals of thread creation.

Problem Description

When developing an application that creates a large number of threads, thread creation fails after approximately the 3399th call with error 11 ("Resource temporarily unavailable").

In the following example, in each case, the thread performs some processing and then exits by running off the end of the thread function.

Note: Compile the program with Forte[tm] Workshop 6 Update 2 on a machine running Solaris 8.

testmachine% cc -mt -o pt pt.c -lpthread -lposix4
testmachine% ./pt
"Testing for thread resource problem
Stopped with error 11: Resource temporarily unavailable
trying to create thread #3399"
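
The source of pt.c is not reproduced above. As a rough illustration only, a program of the kind described might look like the following sketch. It assumes the POSIX interface (the example links with -lpthread); the names pt_worker, pt_create_until_failure, and the PT_DEMO_MAIN macro are hypothetical and do not come from the original program.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Thread body: does no real work and exits by running off the end. */
void *pt_worker(void *arg)
{
    (void)arg;
    return NULL;
}

/*
 * Create up to `limit` threads with default (joinable) attributes and
 * never join them. Returns the number of threads created; *err receives
 * the pthread_create() error number, or 0 if `limit` was reached.
 */
int pt_create_until_failure(int limit, int *err)
{
    pthread_t tid;
    int created;

    *err = 0;
    for (created = 0; created < limit; created++) {
        *err = pthread_create(&tid, NULL, pt_worker, NULL);
        if (*err != 0)
            break;
    }
    return created;
}

#ifdef PT_DEMO_MAIN   /* hypothetical macro: build with -DPT_DEMO_MAIN */
int main(void)
{
    int err;
    int n;

    printf("Testing for thread resource problem\n");
    n = pt_create_until_failure(100000, &err);
    if (err != 0)
        printf("Stopped with error %d: %s\ntrying to create thread #%d\n",
               err, strerror(err), n + 1);
    else
        printf("Created %d threads without error\n", n);
    return 0;
}
#endif
```

Because the threads are never joined, whatever the library retains for each exited thread accumulates until a call fails. The point at which that happens depends on the platform and threads library, so the count reported by this sketch need not match the figure above.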

Solution

The above situation may appear to occur because of the way the threads are created using the default attributes. Threads created with the default attributes are "joinable" (that is, the application can at some later time extract the return code of a deceased thread). To facilitate joining, the threads library must maintain return code information for the deceased thread. The user application may try to achieve this by simply keeping the user-level thread structure around, but freeing data such as the stack, which may be attached to it until the thread is joined.

An obvious thing to try is therefore to create the threads detached, so that nothing has to be kept for a later join. The example in our problem description is modified to use detached mode by setting the attribute "detachstate" to PTHREAD_CREATE_DETACHED.
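
The modified program is not shown in the article. The change it describes can be sketched with the standard POSIX attribute calls as follows; the function names detached_worker, init_detached_attr, and create_detached_thread are hypothetical helpers introduced for illustration.

```c
#include <pthread.h>

/* Thread body: exits immediately by running off the end. */
static void *detached_worker(void *arg)
{
    (void)arg;
    return NULL;
}

/*
 * Initialize `attr` and set its "detachstate" to PTHREAD_CREATE_DETACHED.
 * Returns 0 on success or an error number.
 */
int init_detached_attr(pthread_attr_t *attr)
{
    int err = pthread_attr_init(attr);
    if (err != 0)
        return err;
    err = pthread_attr_setdetachstate(attr, PTHREAD_CREATE_DETACHED);
    if (err != 0)
        pthread_attr_destroy(attr);
    return err;
}

/* Create one detached thread. Returns 0 on success or an error number. */
int create_detached_thread(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    int err = init_detached_attr(&attr);

    if (err != 0)
        return err;
    err = pthread_create(&tid, &attr, detached_worker, NULL);
    pthread_attr_destroy(&attr);   /* the attribute object is no longer needed */
    return err;
}
```

A detached thread cannot be joined, so the library has no return code to preserve once the thread exits.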

testmachine% cc -mt -o pt_detached pt_detached.c -lpthread -lposix4
testmachine% ./pt_detached
"Testing for thread resource problem
Stopped with error 11: Resource temporarily unavailable
trying to create thread #3399"

Unfortunately, changing the attribute to detached state does not allow you to have more than 3399 threads either.

The way to increase the number of threads beyond 3399 is to link the application with the alternate One-level libthread library, which is provided by the Solaris 8 Operating Environment. The One-level model maps each user-level thread one-to-one to an LWP.

To link with the alternate implementation, use the run-path -R option when linking the application. The link line differs between POSIX threads and Solaris threads.

For multithreaded applications that have been previously linked with the standard threads library, the environment variable LD_LIBRARY_PATH can be set to bind the application at run time to the alternate threads library.

Now, let's run our test application program, setting the environment variable LD_LIBRARY_PATH.

testmachine% Testing for thread resource problem
testmachine% Testing for thread resource problem

Note: This allows you to create more than 100,000 threads, both with the default attributes and with the detached state attribute set.

How This Works

In the Two-level threading model, the user links the application with the standard threads library.

The Two-level model, which is standard for Solaris 8, has very different semantics when a thread exits than the One-level model of the alternate libthread implementation. When a thread exits, it is examined to see whether it is joinable or detached.
If it is joinable, it is kept around until it is joined.

Two-level model

Here, the stacks are mmapped in chunks of slightly more than 8MB, but with MAP_NORESERVE. Therefore, the actual amount of swap consumed depends on the application's code. On a test system, our basic program does 465 mmaps of 8421376 bytes each. This works out to approximately 3.9 GB (3,915,939,840 bytes) of potential swap reservations. The real problem here is that we are running out of address space in the application, since it is a 32-bit application and can have only 4GB of address space. On a system with a smaller amount of swap, you might run out of swap first. The net result is that you can create only approximately 3700 threads inside a simple 32-bit application using default attributes.

Why doesn't the detached model work? It suffers from the same problems
but in a different way. When a detached thread exits, it is placed on a reaper queue. A reaper thread has the job of taking entries off the queue and doing the final resource cleanup. The problem is that the reaping never seems to occur. When we ran the aforementioned basic program with a pause added, a blizzard of munmaps occurred after the failed thread creation.

One-level model

The One-level model takes a much simpler approach. When a detached thread exits, its resources are immediately given back to the system. There is no reaper thread, and the stacks are immediately available for reuse.

The "default" case works because the number and size of the anonymous memory mappings are drastically reduced. Each mmap consumes 1040384 bytes and there
are fewer mmaps. This is driven mostly by the fact that threads in the One-level model are able to run faster than in the Two-level model. Since our threads just exit, the amount of work done is trivial. Immediately having your own LWP to run on simplifies things greatly compared to the Two-level model. We had no more than 130 or so threads active at any one time in our One-level model test. The net result is that the same program uses approximately 130MB of virtual address space to create and run through 4000 threads, compared to almost 4GB of virtual address space in the standard Two-level threads model.
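
The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the mapping counts and sizes given in the text. It assumes, for illustration, that each of the roughly 130 concurrently active One-level threads holds exactly one 1040384-byte mapping; the function names are hypothetical.

```c
#include <stdio.h>

/* Total bytes of address space covered by `count` mappings of `size` bytes. */
unsigned long long mapped_bytes(unsigned long long count,
                                unsigned long long size)
{
    return count * size;
}

/* Percentage (rounded down) of a 32-bit address space (2^32 bytes) used. */
unsigned int percent_of_4gb(unsigned long long bytes)
{
    const unsigned long long four_gb = 1ULL << 32;
    return (unsigned int)(bytes * 100ULL / four_gb);
}

/* Print the Two-level and One-level figures side by side. */
void print_stack_usage(void)
{
    /* Two-level model: 465 mmaps of 8421376 bytes (from the text). */
    unsigned long long two_level = mapped_bytes(465ULL, 8421376ULL);
    /* One-level model: assumed one 1040384-byte mmap per active thread. */
    unsigned long long one_level = mapped_bytes(130ULL, 1040384ULL);

    printf("Two-level: %llu bytes (%u%% of 4GB)\n",
           two_level, percent_of_4gb(two_level));
    printf("One-level: %llu bytes (%u%% of 4GB)\n",
           one_level, percent_of_4gb(one_level));
}
```

Under these assumptions the Two-level run maps 3,915,939,840 bytes, about 91% of a 32-bit address space, while the One-level run maps 135,249,920 bytes, about 3%.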
Additionally, the One-level model recycles stacks upon exit, while the Two-level model seems to recycle them only when the thread is reclaimed by being joined.

Given the structure of the test case in the preceding description, it seems likely that you could drive the number of threads astronomically high with the One-level model (we have done 100,000) without failure. This might lead you to a totally unreasonable conclusion about the number of threads you can have active at any one time. The realistic conclusion is that the number of threads created with default attributes and stack sizes that you can have active at any one time inside a 32-bit application is around 3500 (depending on how much virtual address space the rest of the application uses), irrespective of the threading model. This is driven solely by address space considerations.

In order to have more threads, you need to create the threads with smaller stacks. For the greatest number of threads, you need a system with lots of swap, and you need to compile your code as a 64-bit application.

References

Solaris 8 Software Developer Collection - Multithreaded Programming Guide.