Taking error handling in Go seriously

The Go gopher was designed by Renée French and has a CC BY 3.0 license.

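I'm Takeda, an engineer at Media Do.

We are building the product for a new service in Go. Error handling is something that almost every Gopher struggles with sooner or later, so we settled on a policy for the time being and I wrote it up in this article.

What prompted this

Development of the new service had progressed to a certain point, and when I called the API to start verification in the staging environment...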

{
  "code": "",
  "message": "Table Hoge.hoge doesn't exist"
}

...this hopeless message came back. There is no way to troubleshoot anything with this.

We had been turning a blind eye until now, but it was time to do error handling properly!

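Defining the problems to solve

To decide on an error handling policy, we first roughly listed the problems we wanted to solve.

When an error occurs, the called function sometimes panics and processing is interrupted in an unintended way
Errors are swallowed in the first place and never handled properly
Looking at the error content does not tell us what action to take to resolve it

There are some derived cases, but broadly speaking these three were our problems.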
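
The first two problems can be illustrated with a minimal sketch. The names safeCall and riskyLookup are hypothetical and do not come from our project. The sketch only shows one possible way to turn a panic in a callee into an ordinary error, and to check that error instead of discarding it.

```go
package main

import (
	"fmt"
)

// riskyLookup is a hypothetical callee that panics on bad input
// (indexing past the end of the slice).
func riskyLookup(items []string, i int) string {
	return items[i]
}

// safeCall runs fn and converts a panic into an ordinary error,
// so the caller decides what to do instead of being interrupted midway.
func safeCall(fn func() string) (result string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("[main.safeCall] recovered from panic '%v'", r)
		}
	}()
	return fn(), nil
}

func main() {
	// The error is checked and reported rather than discarded with `_`.
	_, err := safeCall(func() string { return riskyLookup([]string{"a"}, 3) })
	if err != nil {
		fmt.Println("handled:", err)
	}
}
```

Recovering like this is best kept to a single boundary. Code inside the boundary should still return errors explicitly wherever it can.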

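Error handling policy

To solve these problems, we settled on the following policy.

Gather the error handling logic in one place
- batch processing -> main.go
- api -> each handler
When handling depends on the error content, such as retrying, write that handling where the situation requires it
Create error objects only at the place where the error occurs
- Wrap errors from third-party libraries to create a custom error
When you receive an error, add context and return it to the caller
For the API, make it possible to tell whether the actor who should respond to the error is us or the caller
- Prepare separate error codes for this distinction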
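
As an illustration of retrying depending on the error content, here is a minimal sketch using only the standard library. The temporary interface, timeoutError, errNotFound and withRetry are all hypothetical names introduced for this example. The sketch assumes that retryable errors identify themselves through a Temporary method.

```go
package main

import (
	"errors"
	"fmt"
)

// temporary is a hypothetical marker interface for errors worth retrying.
type temporary interface {
	Temporary() bool
}

// timeoutError is a hypothetical retryable error.
type timeoutError struct{}

func (timeoutError) Error() string   { return "connection timed out" }
func (timeoutError) Temporary() bool { return true }

// errNotFound is a hypothetical error that retrying cannot fix.
var errNotFound = errors.New("record not found")

// withRetry calls op up to maxAttempts times, retrying only while the
// returned error reports itself as temporary.
func withRetry(maxAttempts int, op func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		var t temporary
		if !errors.As(err, &t) || !t.Temporary() {
			break // not retryable, give up immediately
		}
	}
	return fmt.Errorf("[main.withRetry] operation failed: %w", err)
}

func main() {
	calls := 0
	err := withRetry(3, func() error {
		calls++
		if calls < 3 {
			return timeoutError{}
		}
		return nil
	})
	fmt.Println(calls, err)

	calls = 0
	err = withRetry(3, func() error {
		calls++
		return errNotFound
	})
	fmt.Println(calls, err)
}
```

A real implementation would probably wait between attempts. That part is left out to keep the sketch short.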

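Broad classification of errors

Internal errors
- We are responsible for dealing with them.
- The caller only receives an apology message.
General errors
- The caller is responsible for dealing with them.
- e.g., a required parameter is missing.
- However, the error must include information that makes it easy for the caller to respond.

We also decided to write these as JSON-formatted logs so that they are easy to process with tools such as AWS CloudWatch.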
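
To give an idea of what such a JSON log line might look like, here is a minimal sketch using only the standard library. The logEntry struct, its field names and formatLog are assumptions made for this example. In our project the logger produces the JSON, so this is not the code we actually run.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// logEntry is a hypothetical log record; the field names are assumptions.
type logEntry struct {
	Level     string `json:"level"`
	Message   string `json:"msg"`
	ErrorCode string `json:"errorCode"`
}

// formatLog renders one log record as a single JSON line.
func formatLog(level, msg, code string) (string, error) {
	b, err := json.Marshal(logEntry{Level: level, Message: msg, ErrorCode: code})
	if err != nil {
		return "", fmt.Errorf("[main.formatLog] failed to encode log entry: %w", err)
	}
	return string(b), nil
}

func main() {
	line, err := formatLog("warn", "internal error occurred", "3")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(line) // {"level":"warn","msg":"internal error occurred","errorCode":"3"}
}
```

Because the error code is a separate field, a log filter can match on errorCode instead of parsing the message text.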

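Samples for each policy

Here are some examples of how we implemented code according to each policy.

Prerequisites

This project uses "zap" as its logger. (github.com)

We also adopted "echo" as the base framework for the web service. (echo.labstack.com)

Error handling

Error handling in batch jobs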

func main() {
    err := runJob()
    if err != nil {
        // No format arguments are needed here, so errors.Wrap is enough.
        logger.Fatal(errors.Wrap(err, "[main.main] import job failed").Error())
    }
    logger.Info("[main.main] import job succeeded")
}

Error handling, such as writing the error content through the logger, is gathered in main.
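
One thing to keep in mind with this shape is that zap's Fatal exits the process after logging, so deferred functions in main do not run. A possible variation, sketched here with the standard library only, is to move the work into a function that returns an exit code. The run function and the fail flag are hypothetical, and fmt.Fprintln to stderr stands in for the logger.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// runJob is a hypothetical job; fail toggles the outcome for the example.
func runJob(fail bool) error {
	if fail {
		return errors.New("source file is empty")
	}
	return nil
}

// run holds everything that needs deferred cleanup and returns the exit code.
// All error handling for the batch still lives in this one place.
func run(fail bool) int {
	defer fmt.Fprintln(os.Stderr, "[main.run] cleanup done")
	if err := runJob(fail); err != nil {
		fmt.Fprintln(os.Stderr, fmt.Errorf("[main.run] import job failed: %w", err))
		return 1
	}
	fmt.Fprintln(os.Stderr, "[main.run] import job succeeded")
	return 0
}

func main() {
	// os.Exit is called only after run has returned and its defers have run.
	os.Exit(run(false))
}
```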

Error handling in the API

For the API, the main error handling work is delegated to a custom context.
The custom context uses a feature of echo (a Go web framework).

// HogeContext is custom echo context for hoge service
type HogeContext struct {
    echo.Context
}

// ErrorResponse generates custom error response of hoge service
func (c *HogeContext) ErrorResponse(err error) error {
    // is error to be resolved by customer
    ge, ok := errors.Cause(err).(generalError)
    if ok {
        logger.Info(fmt.Sprintf("general error: %s", err.Error()), zap.String("errorCode", ge.Code().String()))
        httpStatus := internal.GetHTTPStatus(ge.Code())
        er := &errorResponse{
            Code:   ge.Code(),
            Errors: ge.Messages(),
        }
        return c.JSON(httpStatus, er)
    }
    // is internal error
    ie, ok := errors.Cause(err).(internalError)
    if ok && ie.Internal() {
        logger.Warn(fmt.Sprintf("internal error occurred: %s", err.Error()), zap.String("errorCode", internal.InternalError.String()))
    } else {
        // output stack trace to get detail for unexpected/unhandled errors. Basically, this should not happen at production.
        logger.Warn(fmt.Sprintf("unexpected error occurred: %+v", err), zap.String("errorCode", internal.UnHandledError.String()))
    }
    return c.JSON(http.StatusInternalServerError, &errorResponse{
        Code:   internal.InternalError,
        Errors: []string{"We are very sorry, internal error occurred. We will start investigation immediately."},
    })
}

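echo, a Go web framework, lets you create a custom context.
HogeContext is the struct that represents that custom context.
HogeContext.ErrorResponse determines the kind of err, builds the API response, and also writes the log.

Roughly speaking, it works like this.

General errors: the code and error messages of the API response are taken from the error itself. The error content is logged at Info level.
Internal errors: the API response uses the code and message reserved for internal errors. The error content is logged at Warn level, and the corresponding error code is also written through zap so that the development team can respond right away.

Because we want to tell general errors and internal errors apart through interfaces, we define the following interfaces.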

// errorResponse holds the error data
// returned to the client.
type errorResponse struct {
    Code   internal.ErrorCode `json:"code"`
    Errors []string           `json:"errors"`
}

// generalError interface should be implemented by errors that are to be handled by customers
type generalError interface {
    // Code returns the internal.ErrorCode that helps customers grasp the gist of the error
    Code() internal.ErrorCode
    // Messages returns error details to be shown to customers
    Messages() []string
}

// internalError interface should be implemented by errors that should be handled by service provider.
// If there will be any necessity for categorization of internalErrors,
// i.e. automatic alert to different teams depending on error details,
// `func Code() internal.ErrorCode` should be added to this interface at that time.
type internalError interface {
    // Implementation should simply be "return true"
    Internal() bool
}
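
The same classification can be sketched with the standard library alone. Note the swap: this example uses errors.As from the standard errors package (Go 1.13 or later) instead of the errors.Cause type assertion shown above. paramError, dbError, addContext and classify are hypothetical names, and Code returns a plain string here to keep the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

type generalError interface {
	Code() string
	Messages() []string
}

type internalError interface {
	Internal() bool
}

// paramError is a hypothetical general error.
type paramError struct{ name string }

func (e *paramError) Error() string      { return "missing parameter '" + e.name + "'" }
func (e *paramError) Code() string       { return "2" }
func (e *paramError) Messages() []string { return []string{e.name + " is required"} }

// dbError is a hypothetical internal error.
type dbError struct{ msg string }

func (e *dbError) Error() string  { return e.msg }
func (e *dbError) Internal() bool { return true }

// addContext wraps err with context information, keeping the chain intact.
func addContext(err error, info string) error {
	return fmt.Errorf("%s: %w", info, err)
}

// classify reports which party should act on err.
func classify(err error) string {
	var ge generalError
	if errors.As(err, &ge) {
		return "general"
	}
	var ie internalError
	if errors.As(err, &ie) && ie.Internal() {
		return "internal"
	}
	return "unexpected"
}

func main() {
	wrapped := addContext(&dbError{msg: "table not found"}, "[usecase.getUser] failed to get user")
	fmt.Println(classify(wrapped))
	fmt.Println(classify(&paramError{name: "userID"}))
	fmt.Println(classify(errors.New("boom")))
}
```

The point of the sketch is that the classification still works after context has been added, because errors.As walks the whole chain of wrapped errors.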

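As for error codes, we keep a separate mapping that decides which HTTP status to use when a general error occurs. This way, the code that creates an error only has to think about returning the appropriate error code and does not need to care about the HTTP status.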

// ErrorCode is error type to be used in hoge service api.
type ErrorCode string

func (ec ErrorCode) String() string {
    return string(ec)
}

// Error codes managed by us.
const (
    AuthenticationParamMissing ErrorCode = "0"
    AuthenticationFailure      ErrorCode = "1"
    InvalidParameter           ErrorCode = "2"
    InternalError              ErrorCode = "3"

    // Error codes for internal error
    UnHandledError ErrorCode = "999"
)

var codeStatusMap = map[ErrorCode]int{
    AuthenticationFailure:      http.StatusForbidden,
    AuthenticationParamMissing: http.StatusBadRequest,
    InvalidParameter:           http.StatusBadRequest,
    InternalError:              http.StatusInternalServerError,
}

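// GetHTTPStatus returns the HTTP status that corresponds to the given ErrorCode.
// Codes without a mapping fall back to 500 instead of the zero value,
// which is not a valid HTTP status.
func GetHTTPStatus(code ErrorCode) int {
    if status, ok := codeStatusMap[code]; ok {
        return status
    }
    return http.StatusInternalServerError
}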
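
To show how the mapping and the response struct come together, here is a minimal sketch that uses net/http and httptest from the standard library in place of echo. writeError and render are hypothetical helpers, and only two of the error codes are reproduced.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type ErrorCode string

const (
	InvalidParameter ErrorCode = "2"
	InternalError    ErrorCode = "3"
)

var codeStatusMap = map[ErrorCode]int{
	InvalidParameter: http.StatusBadRequest,
	InternalError:    http.StatusInternalServerError,
}

type errorResponse struct {
	Code   ErrorCode `json:"code"`
	Errors []string  `json:"errors"`
}

// writeError is a hypothetical helper. It looks up the status for code,
// falls back to 500 for unmapped codes, and writes the JSON body.
func writeError(w http.ResponseWriter, code ErrorCode, msgs []string) error {
	status, ok := codeStatusMap[code]
	if !ok {
		status = http.StatusInternalServerError
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	return json.NewEncoder(w).Encode(errorResponse{Code: code, Errors: msgs})
}

// render runs writeError against an in-memory recorder and returns
// the status code and body that a client would receive.
func render(code ErrorCode, msgs []string) (int, string) {
	rec := httptest.NewRecorder()
	if err := writeError(rec, code, msgs); err != nil {
		return 0, ""
	}
	return rec.Code, rec.Body.String()
}

func main() {
	status, body := render(InvalidParameter, []string{"userID is required"})
	fmt.Println(status, body)
}
```

The code that creates the error only chooses InvalidParameter. The 400 status is decided in one place by the mapping.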

HogeContext is used in a handler as follows.

func (h *userHandler) user(ec echo.Context) error {
    c := ec.(*context.HogeContext)
    user, err := h.getUser(c)
    if err != nil {
        return c.ErrorResponse(err)
    }
    return c.SuccessResponse(user)
}

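By passing the error to the context in each handler, the handler itself no longer needs any error handling logic. Put differently, the code that creates an err has to implement the necessary interface so that the context can handle the error appropriately.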
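
For instance, an error that the caller is expected to fix could implement the general error methods like this. This is a minimal sketch. validationError and validateUserRequest are hypothetical names, and ErrorCode is redeclared locally so that the example is self-contained.

```go
package main

import (
	"fmt"
	"strings"
)

// ErrorCode mirrors the string-based error code type used in this article.
type ErrorCode string

const InvalidParameter ErrorCode = "2"

// validationError is a hypothetical general error that gathers every
// problem found in a request, so the caller can fix them in one pass.
type validationError struct {
	messages []string
}

func (e *validationError) Error() string {
	return "[validation] invalid request, " + strings.Join(e.messages, ", ")
}

// Code and Messages are the two methods the general error interface asks for.
func (e *validationError) Code() ErrorCode    { return InvalidParameter }
func (e *validationError) Messages() []string { return e.messages }

// validateUserRequest is a hypothetical validator. It returns a plain nil
// when the request is valid, never a typed nil pointer.
func validateUserRequest(userID, email string) error {
	var msgs []string
	if userID == "" {
		msgs = append(msgs, "userID is required")
	}
	if !strings.Contains(email, "@") {
		msgs = append(msgs, "email must contain an at sign")
	}
	if len(msgs) == 0 {
		return nil
	}
	return &validationError{messages: msgs}
}

func main() {
	fmt.Println(validateUserRequest("", "not-an-address"))
	fmt.Println(validateUserRequest("u1", "u1@example.com"))
}
```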

Creating errors

In the rdb layer, we create custom errors from the errors returned by gorm, our RDB library.

// FindByID will return user with given userID
func (r *userRepoImpl) FindByID(id string) (*User, error) {
    entity := User{}
    result := r.db.Where("user_id = ?", id).First(&entity)

    if result.RecordNotFound() {
        return nil, nil
    }
    if result.Error != nil {
        return nil, &rdbError{
            message:       fmt.Sprintf("[rdb.FindByID] failed to get user for userID '%s' from db", id),
            originalError: result.Error,
        }
    }
    return &entity, nil
}

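Instead of returning the err from gorm upward as it is, we assign it to a field of rdbError and return that. This way, the error handling logic can be written without depending on gorm's error types, while the information of the original error is still preserved.

rdbError is a custom error struct declared in the rdb package, implemented as follows.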

package rdb

// use rdbError for errors that are created by gorm or other rdb libraries
type rdbError struct {
    // custom message with context information, i.e., "[rdb.AddHoge] failed to add Hoge with hogeID hogehoge"
    message string
    // store original error created by libraries
    originalError error
}

func (e *rdbError) Internal() bool {
    return true
}

func (e *rdbError) Error() string {
    return e.message + ":" + e.originalError.Error()
}

Because we want rdbError to be treated as an internal error, it implements Internal.
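
With Go 1.13 or later, a struct like this could also expose the original error to the standard errors package by adding an Unwrap method. The following is only a sketch of that idea and not part of our implementation. errConnRefused and isConnRefused are hypothetical names standing in for a driver error and a check against it.

```go
package main

import (
	"errors"
	"fmt"
)

// errConnRefused is a hypothetical stand-in for an error from the driver.
var errConnRefused = errors.New("connection refused")

type rdbError struct {
	message       string
	originalError error
}

func (e *rdbError) Internal() bool { return true }
func (e *rdbError) Error() string  { return e.message + ":" + e.originalError.Error() }

// Unwrap exposes the original error to errors.Is and errors.As (Go 1.13+).
func (e *rdbError) Unwrap() error { return e.originalError }

// isConnRefused reports whether err was ultimately caused by errConnRefused.
func isConnRefused(err error) bool {
	return errors.Is(err, errConnRefused)
}

func main() {
	var err error = &rdbError{
		message:       "[rdb.FindByID] failed to get user for userID 'u1' from db",
		originalError: errConnRefused,
	}
	fmt.Println(err)
	fmt.Println(isConnRefused(err))
}
```

There is a trade-off here. Exposing the original error makes it possible for upper layers to depend on library-specific errors again, so checks like isConnRefused are probably best kept inside the rdb package.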

Propagating errors

When an error returned from a called func is passed further up, we add context to it as follows.

func createUser(user *model.User) error {
    ur := rdb.NewStoreRepository(infra.RDB)
    err := ur.addUser(user)
    if err != nil {
        return errors.Wrapf(err, "[usecase.createUser] failed to create user '%s'", user.String())
    }
    return nil
}

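We add supplementary information to the err returned from ur.addUser and return the err upward again.

We follow these conventions when propagating errors.

Return the error using errors.Wrap({originalError}, {contextInfo}) or errors.Wrapf({originalError}, {contextInfo}, {format args...})
When using errors.Wrapf, enclose the places where arguments are inserted in ', e.g., '%s'
Start contextInfo with [{package}.{function name}]
Add to contextInfo whatever information is available inside that func, e.g., the ID that could not be fetched
Do not use : in contextInfo, so that it is not confused with the : that errors.Wrap inserts by design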
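
The effect of these conventions on the final message can be seen in a small sketch. To keep it runnable with the standard library only, it uses fmt.Errorf with %w as a stand-in for errors.Wrapf, writing the ": " separator by hand. findUser, getProfile and errNoRows are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoRows is a hypothetical error from the lowest layer.
var errNoRows = errors.New("no rows in result set")

// findUser adds context that starts with [{package}.{function name}]
// and quotes the argument with '.
func findUser(id string) error {
	return fmt.Errorf("[rdb.findUser] failed to get user '%s': %w", id, errNoRows)
}

// getProfile adds its own context on top and passes the error upward.
func getProfile(id string) error {
	if err := findUser(id); err != nil {
		return fmt.Errorf("[usecase.getProfile] failed to load profile for user '%s': %w", id, err)
	}
	return nil
}

func main() {
	fmt.Println(getProfile("u42"))
	// [usecase.getProfile] failed to load profile for user 'u42': [rdb.findUser] failed to get user 'u42': no rows in result set
}
```

Since no contextInfo contains a colon, every ": " in the message marks a boundary between layers, and the log line reads as a call path from the outermost function down to the original error.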

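About adding context

As for why we have to think about this at all, the following article explains it in detail.

Golangのエラー処理とpkg/errors (Error handling in Golang and pkg/errors)

In short, if you simply keep passing an error upward unchanged, all you receive in the end is the text no such file or directory, and the cause of the error can no longer be traced from the logs. If the message instead reads failed to read "/hoge/fuga.txt": no such file or directory, the error becomes much easier to trace.

Summary

Thanks to these measures, we are now able to work through troubleshooting in the staging environment. Our team has many junior engineers, and ideas such as "I'd like to add this kind of log" have started to come up, so little by little a flow of improvement is emerging. There is still a lot to do, but we will keep building up our Go knowledge and pushing toward the production release!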