Motivated by applications in queueing theory, we consider a stochastic
control problem whose state space is the d-dimensional positive orthant. The
controlled process Z evolves as a reflected Brownian motion whose covariance
matrix is exogenously specified, as are its directions of reflection from the
orthant's boundary surfaces. A system manager chooses a drift vector θ(t) at
each time t based on the history of Z, and the cost rate at time t depends on
both Z(t) and θ(t). In our initial problem formulation, the objective is to
minimize expected discounted cost over an infinite planning horizon, after
which we treat the corresponding ergodic control problem. Extending the
earlier work by Han et al. [Han J, Jentzen A, E W (2018) Solving
high-dimensional partial differential equations using deep learning. Proc.
Natl. Acad. Sci. USA 115(34):8505-8510], we develop and illustrate a
simulation-based computational method that relies heavily on deep neural
network technology. For the test problems studied thus far, our method is
accurate to within a fraction of 1% and is computationally feasible in
dimensions up to at least d = 30.
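
For concreteness, the controlled dynamics and discounted-cost objective described above can be sketched in conventional notation; the symbols $\sigma$, $R$, $Y$, $c$, and the discount rate $r$ are not defined in this abstract and are introduced here only as standard placeholders, so the display below should be read as an illustrative sketch rather than the paper's exact formulation:
\[
Z(t) = Z(0) + \int_0^t \theta(s)\,ds + \sigma B(t) + R\,Y(t) \in \mathbb{R}_+^d, \qquad t \ge 0,
\]
where $B$ is a standard Brownian motion, $\sigma\sigma^\top$ is the exogenously specified covariance matrix, the columns of $R$ give the directions of reflection, and $Y$ is a nondecreasing pushing process whose $j$th component increases only when $Z_j = 0$. The discounted-cost formulation then seeks
\[
\min_{\theta} \; \mathbb{E}\!\left[\int_0^\infty e^{-rt}\, c\bigl(Z(t), \theta(t)\bigr)\, dt\right],
\]
while the ergodic formulation replaces the discounted integral by a long-run average of $c(Z(t), \theta(t))$.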
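To make the phrase "simulation-based computational method that relies heavily on deep neural network technology" concrete, the fragment below sketches one generic way such a scheme can be organized: a neural network maps the current state to a drift choice, reflected paths are simulated by an Euler scheme with coordinatewise (normal) reflection, and the network weights are tuned by stochastic gradient descent on a Monte Carlo estimate of discounted cost. The quadratic cost rate, the normal-reflection assumption, the horizon truncation, and all numerical parameters are hypothetical choices for illustration; this is not the authors' algorithm, which instead builds on the Han et al. deep-learning approach to high-dimensional PDEs.

```python
# Illustrative sketch only: a neural-network drift policy for a reflected
# Brownian motion on the positive orthant, trained on simulated paths.
# Cost function, normal reflection, horizon truncation, and all parameters
# are hypothetical; this is not the paper's algorithm.
import math
import torch
import torch.nn as nn

d = 10                               # state dimension (hypothetical)
dt, T, r = 0.01, 5.0, 0.1            # step size, truncated horizon, discount rate
n_steps = int(T / dt)
batch = 256
sigma = torch.eye(d)                 # factor of the (given) covariance matrix

# Policy network: maps the state Z(t) to a bounded drift vector theta(t).
policy = nn.Sequential(nn.Linear(d, 64), nn.Tanh(),
                       nn.Linear(64, 64), nn.Tanh(),
                       nn.Linear(64, d), nn.Tanh())   # drift confined to [-1, 1]^d

def cost_rate(z, theta):
    # Hypothetical convex cost: holding cost in z plus effort cost in theta.
    return (z ** 2).sum(dim=1) + 0.5 * (theta ** 2).sum(dim=1)

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for it in range(200):
    z = torch.ones(batch, d)         # arbitrary starting state
    total = torch.zeros(batch)
    for k in range(n_steps):
        theta = policy(z)
        total = total + math.exp(-r * k * dt) * cost_rate(z, theta) * dt
        noise = torch.randn(batch, d) @ sigma.T
        # Euler step, then coordinatewise reflection back into the orthant.
        z = torch.clamp(z + theta * dt + noise * dt ** 0.5, min=0.0)
    loss = total.mean()              # Monte Carlo estimate of discounted cost
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Gradients here flow pathwise through the simulated trajectories, so each optimizer step nudges the policy toward lower estimated discounted cost under the stated assumptions.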