2. SUBGRADIENTS

If f is a real-valued function de¬ned on a set C in RL, then the graph of f is

the set of points (x, z) in RL> of the form (x, f (x)), where x + C. If f is a

differentiable function, then given a point x in the interior of C, the graph of

f has a tangent plane at (x , f (x )) whose normal vector is (

f (x ), 91).

If C is convex and f is convex, f need not be differentiable at an interior

point of C, as the function f de¬ned by f (x) : "x" shows. In this section we

shall introduce a generalization of the gradient vector, called the subgradient,

that exists at points (x, f (x)) of the graph of a convex function f, where x is

an interior point of C. We shall show that if f is a convex function de¬ned

on C and x is an interior point of C, then epi( f ) has a supporting hyperplane

at the point (x, f (x)). The supporting hyperplane need not be unique. In

the next section we shall show that f is differentiable at x if and only if f has

a unique subgradient at x. This subgradient turns out to be

f (x). Thus, if f

is differentiable at x, there is a unique supporting hyperplane to epi( f ) at

(x, f (x)). This hyperplane is the tangent plane to the graph at (x, f (x)). If

( , 91) is a normal vector to a supporting hyperplane, then has some of the

properties that the gradient vector would have.

We now de¬ne the notion of subgradient.

De¬nition 2.1. A function f de¬ned on a set C is said to have a subgradient at

a point x in C if there exists a vector in RL such that

f (x) . f (x ) ; 1 , x 9 x 2

for all x + C. The vector is called a subgradient.

Geometrically, is a subgradient of f at x if the graph of f in RL> lies on

or above the graph of the hyperplane y : f (x );1 , x9x 2. Since (x , f (x ))

is in this hyperplane, this hyperplane will be a supporting hyperplane to epi( f )

at (x , f (x )). Thus, the existence of a subgradient is a statement about the

existence of a nonvertical supporting hyperplane to epi( f ). In Figure 3.5 we

illustrate the de¬nition with the function f (x) : "x" at x : 0.

Any line through the origin with slope satisfying 91 - - 1 has the

property that

"x" . x : "0" ; (x 9 0)

for all x. Thus any in the interval [91, 1] is a subgradient of "x" at x : 0. If

x 9 0, then : 1 is the only subgradient of "x" at x ; if x : 0, then : 91

is the only subgradient of "x" at x .

De¬nition 2.2. The set of subgradients of a function f at a point x is called

the subdifferential of f at x and is denoted by *f (x ).

SUBGRADIENTS 103

Figure 3.5.

The subdifferential of "x" at x : 0 is the interval [91, 1]. The subdifferential

of "x" at x 9 0 is the singleton +1,, and the subdifferential at x : 0 is +91,.

Example 2.1. The subdifferential will be the empty set if no subgradient exists.

This may happen, as the following example shows. Let C : [91, 1] and let

f (x) : 9(1 9 x. (Sketch the graph.) The graph of f has a nonvertical

tangent line at any point x in (91, 1). From the graph it appears that for such

x the graph lies above the tangent line, and so *f (x ) : + f (x ),. Although

this can be veri¬ed in this example by relatively straightforward calculations,

we shall not do so. The validity of the statement will follow from general

theorems that we will prove. At the points x : <1, the graph has a vertical

tangent line, but the tangent line does not lie below the graph. Thus it appears

that f does not have a subgradient at these points. We now show this

analytically.

A number is a subgradient of f at x : 91 if and only if

9(1 9 x . (x ; 1) for all x in [91, 1].

For x in (91, 1] this inequality is equivalent to

1 9 x

9 . x in (91, 1].

1;x

104 CONVEX FUNCTIONS

If we let x in (91, 1] tend to 91, the left-hand side of this inequality tends to

9-. Hence there is no + R such that the subgradient inequality holds for all

x in [91, 1].

A similar argument shows that *f (1) : `.

T 2.1. L et f be a convex function de¬ned on a convex set C. T hen at

each point x in int(C) there exists a vector in RL such that

f (x) . f (x ) ; 1 , x 9 x 2 (1)

for all x in C. In other words, *f (x ) " ` at every interior point x of C.

Example 2.1 shows that the requirement x + int(C) is essential.

Proof. We shall show that the conclusion of the theorem is a consequence

of the existence of a supporting hyperplane to epi( f ) at (x , f (x )).

The point (x , f (x )) is a boundary point of epi( f ). Since epi( f ) is convex,

we may apply Theorem II.4.1 with C : epi( f ) and get that there exists a vector

(a, ) " (0, 0) in RL> such that

1(a, ), (x, y)2 - 1(a, ), (x , f (x ))2

for all (x, y) in epi( f ). This relation is equivalent to

1a, x2 ; y - 1a, x 2 ; f (x ) (2)

for all (x, y) in epi( f ), that is, for all x in C and y . f (x).

Since the right-hand side of (2) is a constant, if were strictly positive, we

could obtain a contradiction by letting y become arbitrarily large. Hence - 0.

If : 0, then a " 0 and (2) becomes

1a, x2 - 1a, x 2 all x + C. (3)

Since x + int(C), there exists an 9 0 such that x : x ; a belongs to C.

Setting x : x in (3) gives 1a, a2 - 0, which is a contradiction. Thus, : 0.

In (2) we now divide through by : 0 and set y : f (x) to get

a

f (x) . f (x ) ; ,x 9x

for all x + C. Now set : 9a/ to get (1).

C¬¬ 1. If f is strictly convex, then strict inequality holds in (1).

We leave the proof as an exercise.

SUBGRADIENTS 105

The converse of Theorem 2.1 is false. To see this, let C be the square in R,

+x : 0 - x - 1, 0 - x - 1,. Let f (x) : 0 for x such that 0 - x - 1 and

0 : x - 1. Let f (x) : 9 "x 9 " for x such that 0 - x - 1 and x : 0. The

function f has a unique subgradient : 0 at every point in int(C), yet f is not

convex on C. The function f, however, is convex on int(C), and this phenom-

enon illustrates the general state of affairs.

T 2.2. L et C be a convex set in RL. If at each point x in C f has a

subgradient, then f is convex on C.

Remark 2.1. If we merely assume that at each point in int(C) the subdifferential

is not empty, then we can only conclude that f is convex on int(C).

Proof. Let x and y be two points in C and let z : x ; y, 9 0, 9 0,

; : 1. Since f has a subgradient at z, say ,

f (x) . f (z) ; 1 , x 9 z2 : f (z) ; 1 , x 9 y2,

f (y) . f (z) ; 1 , y 9 z2 : f (z) 9 1 , x 9 y2.

If we now multiply the ¬rst inequality by 9 0, the second inequality by 9 0,

and add the results, we get

f (x) ; f (y) . f (z) : f ( x ; y).

Hence f is convex.

Remark 2.2. The proof of the theorem shows that if at each x in C there exists

a subgradient such that f (x) 9 f (x ) ; 1 , x 9 x 2, for x " x , x in C, then

f is strictly convex on C.

Exercise 2.1. Show that if *f (x ) is not empty, then *f (x ) is closed and convex.

Exercise 2.2. Prove Corollary 1 to Theorem 2.1.

Exercise 2.3. Let f be a convex function de¬ned on a convex set C and let

x + int(C). Show that + *f (x ) if and only if f (x , v) . 1 , v2 for all v in RL.

Exercise 2.4. Let f be convex and concave on RL. Show that f is af¬ne, that

is, f (x) : 1a, x2 ; b.

Exercise 2.5. Let f (x):#x#. Show that *f (0):+ : # #-1,. F (From Theorem

3.1 we will get that, for x " 0, *f (x) : +x/#x#,).

106 CONVEX FUNCTIONS

Exercise 2.6. Let f (x , x ) : "x ; x ". Show that f is convex and calculate

*f ((0, 0)).

3. DIFFERENTIABLE CONVEX FUNCTIONS

We ¬rst show that if f is a convex function and differentiable [in the sense of

relation (5) in Section 11 of Chapter I] at an interior point of its domain, then

there is a unique subgradient at the point and the subgradient is the gradient

vector, and conversely, if f has a unique subgradient at x, then f is differenti-

able at x.

T 3.1. L et f be a convex function de¬ned on a convex set C in RL, and let

x be an interior point of C. T hen f is differentiable at x if and only if f has a

unique subgradient at x. Moreover, the unique element in the subgradient is

f (x).

Proof. Let f be differentiable at x. Since f is convex and x + int(C), by

Theorem 2.1 the subgradient *f (x) is not empty. Let + *f (x). Then for any h

in RL and suf¬ciently small t 9 0

f (x ; th) . f (x) ; 1 , th2.

Since f is differentiable at x, we have that for h " 0

f (x ; th) : f (x) ; 1

f (x), th2 ; (th),

where (th)/t#h# ; 0 as t ; 0. If we subtract the second relation from the ¬rst,

we get

0 . t[1 , h2 9 1

f (x, h2 9 (th)#h#/t#h#].

If we now divide by t 9 0 and then let t ; 0, we get that 1 9

f (x), h2 - 0

for all h " 0 in RL. In particular, if 9

f (x) " 0, then this inequality holds for

h : 9

f (x), whence # 9

f (x)# - 0. Thus, :

f (x).

We now suppose that the subgradient *f (x) consists of a unique element .

We will show that all the ¬rst-order partial derivatives of f exist at x. The

differentiability of f at x will then follow from Theorem 1.4.

To simplify the notation, we assume tha x : 0 and that f (x) : 0. There is

no loss of generality in making this assumption, since this amounts to a

translation of the origin in RL ; R to (x, f (x)). Let us now suppose that at least

one partial derivative of f at 0 does not exist. For de¬niteness let us assume

that it is the partial with respect to x . Let

g(t) : f (0 ; te ),

DIFFERENTIABLE CONVEX FUNCTIONS 107

where e is the ¬rst standard basis vector (1, 0, . . . , 0). Since f is convex and

0 + int(C), the function g is de¬ned and convex on some maximal interval I

containing 0 in its interior. Moreover, *f /*x does not exist at 0 if and only if

g(0) does not exist. Since g is convex, it follows from Corollary 2 to Theorem

1.2 that g (0) and g (0) exist and that g (0) - g (0). Therefore, if g(0) does

\ > \ >

not exist, then g (0) : g (0).

\ >

Let be any number satisfying

g (0) : : g (0). (1)