Handling and Avoiding Stack Overflows
in the Solaris Operating Environment
by Andrew Fox
(May, 2003)
The following article may contain actual software programs in source code form.
This source code is made available for developers to use as needed, pursuant to the
terms and conditions of this license.
Purpose
This document discusses ways of avoiding, trapping, and potentially
recovering from userland stack overflow conditions in the Solaris Operating
Environment (OE).
Trapping User Land Stack Overflows
Since the release of the Solaris 2.6 OE, it has been possible to create stacks that
feature 'red zones'. Red zones are optional, but they are created by default
when a thread is created with a call such as thr_create().
The red zone is a protected page immediately beyond the stack's limit. It is
not mapped, so any write to it forces an error. Red zones are not limited to
stacks; they can also be used to detect writes past the end of many types of
buffers. Routines like pagefault() additionally check that they do not go
beyond the red zone.
When a function is called, the initial save to the stack generates a SIGSEGV
if that save lands in memory within the red zone. This SIGSEGV signal can then
be caught and, by using the extra information supplied to the handler (see
sigaction(2)), the handler can determine whether a stack overflow condition
has occurred.
The following code illustrates how these SIGSEGVs can be caught, and how to
determine whether the cause was a stack overflow. Once the signal has been
caught and identified as a stack overflow, the recovery procedure is up to
the individual developer.
Example Code
test.c
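#include <stdio.h>
#include <signal.h>
#include <setjmp.h>

sigjmp_buf env;

/* Defined in sigtrap.c */
extern int trap_signal(int signo);

/*
 * The original listing calls a() without defining it. This unbounded
 * recursion is an assumed stand-in that eventually exhausts the stack.
 */
int
a(int depth)
{
    volatile char pad[1024];    /* make each frame consume stack space */

    pad[0] = (char) depth;
    return (a(depth + 1) + pad[0]);
}

int
main(void)
{
    if (trap_signal(SIGSEGV) != 0)
        return (1);
    switch (sigsetjmp(env, 1)) {
    case 0:
        fprintf(stdout, "\nBeginning stack recursion\n");
        a(1);
        break;
    case 1:
        fprintf(stdout, "\nSEGV but not STACK overflow\n");
        break;
    case 2:
        fprintf(stdout, "\nSEGV stack overflow\n");
        break;
    }
    return (0);
}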
sigtrap.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <setjmp.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>
#include <ucontext.h>

extern sigjmp_buf env;

void *altstack = NULL;

/* Fetch a saved general register as an unsigned long for printing. */
#define GREG(r) ((unsigned long) uc->uc_mcontext.gregs[(r)])

/*
 * SIGSEGV handler. Dumps the signal information and the saved SPARC
 * registers, then returns to the sigsetjmp() point in main().
 * fprintf() is not async-signal-safe; it is used here for illustration.
 */
void
sig_trap(int a, siginfo_t *b, void *c)
{
    ucontext_t *uc = (ucontext_t *) c;

    fprintf(stderr, "\n*** SIGNAL TRAPPED: SIGNAL %d ***\n", a);
    fprintf(stderr, "si_signo = %d\n", b->si_signo);
    fprintf(stderr, "si_code = %d\n", b->si_code);
    fprintf(stderr, "si_errno = %d\n", b->si_errno);
    fprintf(stderr, "si_addr = %p\n", (void *) b->si_addr);
    fprintf(stderr, "si_trapno= %d\n", b->si_trapno);
    fprintf(stderr, "si_pc = %p\n\n", (void *) b->si_pc);
    fprintf(stderr, "stack info:\nsp: %p size: %8lx\n"
        "flags: %d\n\n",
        (void *) uc->uc_stack.ss_sp,
        (unsigned long) uc->uc_stack.ss_size,
        uc->uc_stack.ss_flags);
    fprintf(stderr, "Registers:\npc: %8lx npc: %8lx\n"
        "o0: %8lx o1: %8lx o2: %8lx o3: %8lx\n"
        "o4: %8lx o5: %8lx o6: %8lx o7: %8lx\n\n"
        "g1: %8lx g2: %8lx g3: %8lx g4: %8lx\n"
        "g5: %8lx g6: %8lx g7: %8lx\n\n",
        GREG(REG_PC), GREG(REG_nPC),
        GREG(REG_O0), GREG(REG_O1), GREG(REG_O2), GREG(REG_O3),
        GREG(REG_O4), GREG(REG_O5), GREG(REG_O6), GREG(REG_O7),
        GREG(REG_G1), GREG(REG_G2), GREG(REG_G3), GREG(REG_G4),
        GREG(REG_G5), GREG(REG_G6), GREG(REG_G7));

    /*
     * The red zone is unmapped, so a stack overflow is reported as
     * SEGV_MAPERR. This is a heuristic: any other access to an
     * unmapped address is reported with the same code.
     */
    if (b->si_code == SEGV_MAPERR) {
        siglongjmp(env, 2);
    } else {
        siglongjmp(env, 1);
    }
}

/*
 * Install sig_trap() as the handler for signo, running on an alternate
 * signal stack so that it can execute after the normal stack is exhausted.
 * Returns 0 on success and 1 on failure.
 */
int
trap_signal(int signo)
{
    stack_t sigstk;
    struct sigaction act;

    /* Allocate the alternate stack once and reuse it on later calls. */
    if (altstack == NULL && (altstack = malloc(SIGSTKSZ)) == NULL) {
        fprintf(stderr, "can't alloc alt stack\n");
        return (1);
    }
    sigstk.ss_sp = altstack;
    sigstk.ss_size = SIGSTKSZ;
    sigstk.ss_flags = 0;
    if (sigaltstack(&sigstk, NULL) < 0) {
        perror("sigaltstack");
        return (1);
    }
    memset(&act, 0, sizeof (act));
    act.sa_sigaction = sig_trap;
    act.sa_flags = SA_SIGINFO | SA_ONSTACK;
    if (sigaction(signo, &act, NULL) != 0) {
        perror("sigaction");
        return (1);
    }
    return (0);
}
Recovering From a Stack Overflow
The options for recovering from a stack overflow depend on the Solaris OE
version being used. Prior to the Solaris 9 OE, the developer would need to
keep information about the addresses of the stacks for each of their threads.
The application would need to record this at the time of stack creation.
In the handler, it would then be possible to compare the fault address
with the stored address of the stack base. If the fault address was less
than the stack base address, recovery could be initiated. Recovery could
possibly take the form of mmapping in another page. This approach would work,
but dynamic stacks are best avoided where possible.
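The comparison described above might be sketched as follows. This is an illustration only, not code from the original article: the stack_record type, the find_overflowed_stack function and the slack parameter are all hypothetical. The slack bound is an added assumption so that a wild pointer far below a stack is not mistaken for an overflow; the text itself only requires that the fault address be below the stack base.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread record, filled in when the stack is created. */
typedef struct {
    int       id;    /* application-chosen thread identifier */
    uintptr_t base;  /* lowest usable address of the thread's stack */
} stack_record;

/*
 * Returns the index of the record whose stack appears to have overflowed,
 * or -1 if the fault address does not sit just below any recorded base.
 * Stacks are assumed to grow downward, so an overflow faults below base.
 */
static int
find_overflowed_stack(const stack_record *recs, size_t n,
    const void *fault_addr, size_t slack)
{
    uintptr_t addr = (uintptr_t) fault_addr;
    size_t i;

    for (i = 0; i < n; i++) {
        /* Below the base, but no more than slack bytes below it. */
        if (addr < recs[i].base && recs[i].base - addr <= slack)
            return ((int) i);
    }
    return (-1);
}
```

In a handler, fault_addr would be the si_addr value from the siginfo_t argument, and slack might be set to the size of the red zone plus the largest expected stack frame.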
The code illustrated above uses siglongjmp() and sigsetjmp() to recover from
the stack overflow. As sigsetjmp() preserves the environment of the thread,
another recovery strategy could be to kill off the thread and then restart it.

In the Solaris 9 OE, several new features have been introduced in conjunction
with the Sun ONE Studio 7.0 Compiler Collection release, which make trapping
and handling stack overflows a lot easier. The compiler now provides the
'-xcheck=stkovf' option, which enables stack overflow checks. When
-xcheck=stkovf is specified in a compilation step, the compiler generates code
on routine entry to test whether the routine's new stack frame will extend
beyond the stack's current bounds. If so, the new stack frame is not created;
instead, a SIGSEGV signal is delivered to the current light-weight process
(LWP) with a fault address in the stack's red zone.
In the Solaris 9 OE, signal handlers can distinguish stack overflow from other
address space violations by calling stack_violation(3C). Applications designed
to recover from stack overflow should handle SIGSEGV on an alternate signal
stack, as illustrated in the preceding code example.
All routines compiled with -xcheck=stkovf check their stack frames for
overflow. However, since stack overflow may occur in routines not compiled
with -xcheck=stkovf, there is no guarantee that stack overflow will always be
detected. Undetected stack overflow in multi-threaded programs may result in
data corruption in a neighboring thread's stack.
The one caveat for programs compiled with -xcheck=stkovf is that they must be
dynamically linked.
Avoiding Stack Overflows
When compiling your application, there is a very simple way to help avoid
stack overflows.
Very often, applications are compiled using the -O flag for optimisation.
This equates to the -xO2 level of optimisation. With this flag set, the
compiler does not perform leaf node optimisation.
Without this optimisation, there can be a vastly increased number of
save/restore operations on function entry and exit. This may eventually lead
to a stack overflow if a developer has a lot of recursion in their
application. Using the -xO3 option enables leaf node optimisation, which in
turn results in far fewer save/restore operations on function entry and exit.