Cloud Native Observability Part 2 - Microservices Distributed Tracing

Series Introduction

  1. Part 1: OpenTelemetry Instrumentation
  2. Part 2: Microservices Distributed Tracing (this post)
  3. Part 3: Structured Logging and Correlation ID
  4. Part 4: Metrics and Alerting with Prometheus/Grafana
  5. Part 5: Debugging Production Issues with Observability Data

What is Distributed Tracing?

Distributed tracing records the full path a request takes across multiple services as a tree of timed spans, which makes that path visible end to end.

User Request
       │
       ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ API Gateway │────▶│Order Service│────▶│ Payment Svc │
│   Span A    │     │   Span B    │     │   Span C    │
└─────────────┘     └──────┬──────┘     └─────────────┘
                           │
                           ▼
                    ┌─────────────┐
                    │Inventory Svc│
                    │   Span D    │
                    └─────────────┘

Trace Context Structure

W3C Trace Context Standard

traceparent: 00-{trace-id}-{span-id}-{trace-flags}
tracestate: vendor1=value1,vendor2=value2

Example:

traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
tracestate: congo=t61rcWkgMzE
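  • trace-id: 32 lowercase hex characters (16 bytes) identifying the entire trace
  • span-id: 16 lowercase hex characters (8 bytes) identifying the caller's current span; the W3C spec names this field parent-id
  • trace-flags: 2 hex characters; 01 means the sampled flag is set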
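
The header layout above can be checked with a few lines of code. The following is a minimal sketch that parses a version 00 `traceparent` value using only the Kotlin standard library; the `TraceParent` type and `parseTraceParent` function are hypothetical names for illustration and are not part of the OpenTelemetry API, which handles this parsing for you.

```kotlin
// Parsed fields of a W3C traceparent header (version 00 only).
// Hypothetical type for illustration; not an OpenTelemetry class.
data class TraceParent(
    val version: String,
    val traceId: String,
    val parentId: String,
    val sampled: Boolean
)

val TRACEPARENT_00 =
    Regex("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

// Returns null when the header is malformed or uses all-zero ids,
// which the spec treats as invalid.
fun parseTraceParent(header: String): TraceParent? {
    val match = TRACEPARENT_00.matchEntire(header.trim()) ?: return null
    val (traceId, parentId, flags) = match.destructured
    if (traceId.all { it == '0' } || parentId.all { it == '0' }) return null
    // Bit 0 of trace-flags is the sampled flag.
    val sampled = (flags.toInt(16) and 0x01) != 0
    return TraceParent("00", traceId, parentId, sampled)
}
```

Calling `parseTraceParent` with the example header yields the trace id `0af7651916cd43dd8448eb211c80319c`, the parent id `b7ad6b7169203331`, and `sampled = true`. The sketch rejects any version other than 00, which is stricter than the spec requires for future versions.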

Practical Distributed Tracing Implementation

Multi-Service Architecture

# docker-compose.yml
version: '3.8'
services:
  api-gateway:
    build: ./api-gateway
    ports:
      - "8080:8080"
    environment:
      - OTEL_SERVICE_NAME=api-gateway
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317

  order-service:
    build: ./order-service
    ports:
      - "8081:8081"
    environment:
      - OTEL_SERVICE_NAME=order-service
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317

  payment-service:
    build: ./payment-service
    ports:
      - "8082:8082"
    environment:
      - OTEL_SERVICE_NAME=payment-service
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317

  inventory-service:
    build: ./inventory-service
    ports:
      - "8083:8083"
    environment:
      - OTEL_SERVICE_NAME=inventory-service
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317

  jaeger:
    image: jaegertracing/all-in-one:1.53
    ports:
      - "16686:16686"
      - "4317:4317"
    environment:
      - COLLECTOR_OTLP_ENABLED=true

API Gateway

@RestController
@RequestMapping("/api")
class GatewayController(
    private val orderServiceClient: OrderServiceClient,
    private val tracer: Tracer
) {
    @PostMapping("/orders")
    fun createOrder(@RequestBody request: CreateOrderRequest): ResponseEntity<OrderResponse> {
        val span = tracer.spanBuilder("gateway.createOrder")
            .setSpanKind(SpanKind.SERVER)
            .setAttribute("http.method", "POST")
            .setAttribute("http.route", "/api/orders")
            .startSpan()

        return try {
            span.makeCurrent().use {
                val order = orderServiceClient.createOrder(request)
                span.setAttribute("order.id", order.id)
                ResponseEntity.created(URI.create("/api/orders/${order.id}")).body(order)
            }
        } catch (e: Exception) {
            span.recordException(e)
            span.setStatus(StatusCode.ERROR)
            throw e
        } finally {
            span.end()
        }
    }
}

Order Service Client (Context Propagation)

@Component
class OrderServiceClient(
    private val webClient: WebClient,
    private val openTelemetry: OpenTelemetry
) {
    fun createOrder(request: CreateOrderRequest): OrderResponse {
        return webClient.post()
            .uri("/orders")
            .bodyValue(request)
            .headers { headers ->
                // Inject the current trace context into the outgoing headers
                openTelemetry.propagators.textMapPropagator.inject(
                    Context.current(),
                    headers
                ) { carrier, key, value ->
                    carrier?.set(key, value)
                }
            }
            .retrieve()
            .bodyToMono(OrderResponse::class.java)
            .block()!!
    }
}
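
To make the propagation step less opaque, here is a minimal sketch of what inject and extract do to the headers, written against a plain map instead of `HttpHeaders`. The `SimpleSpanContext` type and both functions are hypothetical stand-ins, not the OpenTelemetry propagator API, and the sketch only handles the `traceparent` header.

```kotlin
// Simplified stand-in for a span context; not an OpenTelemetry type.
data class SimpleSpanContext(val traceId: String, val spanId: String, val sampled: Boolean)

// Writes the context into an outgoing header map, as the inject step would.
fun injectTraceParent(ctx: SimpleSpanContext, headers: MutableMap<String, String>) {
    val flags = if (ctx.sampled) "01" else "00"
    headers["traceparent"] = "00-${ctx.traceId}-${ctx.spanId}-$flags"
}

// Reads the context back on the receiving side.
// Returns null if the header is absent or does not have the expected shape.
fun extractTraceParent(headers: Map<String, String>): SimpleSpanContext? {
    // HTTP header names are case-insensitive, so match without regard to case.
    val value = headers.entries
        .firstOrNull { it.key.equals("traceparent", ignoreCase = true) }
        ?.value ?: return null
    val parts = value.split("-")
    if (parts.size != 4 || parts[1].length != 32 || parts[2].length != 16) return null
    val sampled = ((parts[3].toIntOrNull(16) ?: 0) and 0x01) != 0
    return SimpleSpanContext(parts[1], parts[2], sampled)
}
```

The receiving service keeps the extracted trace id, generates a new span id for its own span, and records the extracted span id as that span's parent. That is how spans from different processes end up in one trace.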

Order Service

@RestController
@RequestMapping("/orders")
class OrderController(
    private val orderService: OrderService,
    private val tracer: Tracer,
    private val openTelemetry: OpenTelemetry
) {
    @PostMapping
    fun createOrder(
        @RequestBody request: CreateOrderRequest,
        @RequestHeader headers: HttpHeaders
    ): ResponseEntity<OrderResponse> {
        // Extract the parent context from the incoming headers
        val parentContext = openTelemetry.propagators.textMapPropagator.extract(
            Context.current(),
            headers
        ) { carrier, key -> carrier?.getFirst(key) }

        val span = tracer.spanBuilder("order.create")
            .setParent(parentContext)
            .setSpanKind(SpanKind.SERVER)
            .startSpan()

        return try {
            span.makeCurrent().use {
                val order = orderService.createOrder(request)
                ResponseEntity.ok(OrderResponse(order))
            }
        } finally {
            span.end()
        }
    }
}

@Service
class OrderService(
    private val orderRepository: OrderRepository,
    private val paymentClient: PaymentClient,
    private val inventoryClient: InventoryClient,
    private val tracer: Tracer
) {
    @Transactional
    fun createOrder(request: CreateOrderRequest): Order {
        // Check and reserve inventory
        val inventorySpan = tracer.spanBuilder("order.checkInventory")
            .setSpanKind(SpanKind.CLIENT)
            .startSpan()

        try {
            inventorySpan.makeCurrent().use {
                inventoryClient.checkAndReserve(request.items)
            }
        } finally {
            inventorySpan.end()
        }

        // Save the order
        val saveSpan = tracer.spanBuilder("order.save")
            .setAttribute("db.system", "postgresql")
            .startSpan()

        val order = try {
            saveSpan.makeCurrent().use {
                orderRepository.save(Order.create(request))
            }
        } finally {
            saveSpan.end()
        }

        // Process the payment
        val paymentSpan = tracer.spanBuilder("order.processPayment")
            .setSpanKind(SpanKind.CLIENT)
            .startSpan()

        try {
            paymentSpan.makeCurrent().use {
                paymentClient.charge(order.customerId, order.totalAmount)
            }
        } finally {
            paymentSpan.end()
        }

        return order
    }
}

Span Hierarchy Structure

Parent-Child Relationship

Trace: abc123
│
├── Span A: gateway.createOrder (Root Span)
│   │
│   └── Span B: order.create (Child of A)
│       │
│       ├── Span C: order.checkInventory (Child of B)
│       │   │
│       │   └── Span E: inventory.reserve (Child of C)
│       │
│       ├── Span D: order.save (Child of B)
│       │
│       └── Span F: order.processPayment (Child of B)
│           │
│           └── Span G: payment.charge (Child of F)

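
A tracing backend does not receive this tree directly. It receives a flat list of spans, each carrying its own id and its parent's id, and rebuilds the hierarchy from those references. The sketch below shows that reconstruction under simplified assumptions; `SpanNode` and `renderTree` are hypothetical names and not part of Jaeger or OpenTelemetry.

```kotlin
// Simplified span record; illustrative only, not an OpenTelemetry SDK type.
data class SpanNode(val id: String, val parentId: String?, val name: String)

// Renders a flat span list as an indented tree, two spaces per nesting level.
fun renderTree(spans: List<SpanNode>): List<String> {
    // Group spans by the id of their parent; root spans land under the null key.
    val childrenOf = spans.groupBy { it.parentId }
    val lines = mutableListOf<String>()

    fun visit(node: SpanNode, depth: Int) {
        lines.add("  ".repeat(depth) + node.name)
        childrenOf[node.id].orEmpty().forEach { visit(it, depth + 1) }
    }

    childrenOf[null].orEmpty().forEach { visit(it, 0) }
    return lines
}
```

Feeding it spans A through G with the parent ids from the tree above gives `gateway.createOrder` at depth 0, `order.create` at depth 1, and so on. A span whose parent never arrives (for example, because one service dropped its data) is not reachable from any root in this sketch, which is one reason broken context propagation shows up as fragmented traces.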
@Service
class BatchOrderProcessor(
    private val tracer: Tracer
) {
    fun processBatch(orders: List<Order>) {
        val batchSpan = tracer.spanBuilder("batch.process")
            .startSpan()

        try {
            batchSpan.makeCurrent().use {
                // parallelStream() worker threads do not inherit the current Context,
                // so a link (rather than a parent) ties each order span to the batch span.
                orders.parallelStream().forEach { order ->
                    val orderSpan = tracer.spanBuilder("batch.processOrder")
                        .addLink(batchSpan.spanContext) // connect via a link
                        .setAttribute("order.id", order.id)
                        .startSpan()

                    try {
                        orderSpan.makeCurrent().use {
                            processOrder(order)
                        }
                    } finally {
                        orderSpan.end()
                    }
                }
            }
        } finally {
            batchSpan.end()
        }
    }
}

Sampling Strategies

Head-based Sampling

The sampling decision is made when the request starts, at the root span, and downstream services follow it:

@Configuration
class SamplingConfig {

    @Bean
    fun sdkTracerProvider(): SdkTracerProvider {
        return SdkTracerProvider.builder()
            .setSampler(
                Sampler.parentBased(
                    Sampler.traceIdRatioBased(0.1) // sample 10% of root traces
                )
            )
            .build()
    }
}
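
The logic behind a parent-based ratio sampler can be sketched in plain Kotlin. This is a simplified illustration, not the SDK's exact algorithm: `ratioDecision` and `parentBasedDecision` are hypothetical functions, and the choice to compare the low 8 bytes of the trace id against a threshold is an assumption made for the example.

```kotlin
// Decides from the trace id alone, so every service using the same ratio
// reaches the same decision for a given trace.
// Simplified sketch; not the OpenTelemetry SDK implementation.
fun ratioDecision(traceId: String, ratio: Double): Boolean {
    require(traceId.length == 32) { "trace id must be 32 hex characters" }
    require(ratio in 0.0..1.0) { "ratio must be between 0 and 1" }
    if (ratio >= 1.0) return true
    // Interpret the low 8 bytes of the trace id as an unsigned 64-bit number.
    val low = traceId.substring(16).toULong(16)
    val threshold = ratio * ULong.MAX_VALUE.toDouble()
    return low.toDouble() < threshold
}

// Follows the caller's decision when there is one (parentSampled is non-null);
// otherwise this span is a root and the ratio decides.
fun parentBasedDecision(parentSampled: Boolean?, traceId: String, ratio: Double): Boolean =
    parentSampled ?: ratioDecision(traceId, ratio)
```

Because child spans follow the parent's sampled flag, a trace is either recorded in every service or in none, which avoids partial traces. The cost is that the decision is made before anything is known about errors or latency.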

Tail-based Sampling (OTel Collector)

The sampling decision is made after the trace completes, so it can depend on errors and latency:

# otel-collector-config.yaml
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100
    expected_new_traces_per_sec: 10
    policies:
      - name: errors-policy
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: slow-traces-policy
        type: latency
        latency:
          threshold_ms: 1000
      - name: probabilistic-policy
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [otlp/jaeger]
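
As a rough mental model of the three policies in this config, the sketch below keeps a finished trace when any one of them matches. It is not the Collector's implementation: `TraceSummary` and `keepTrace` are hypothetical names, the "any policy matches" combination is an assumption about the default behavior, and the random draw is passed in by the caller so the function stays deterministic.

```kotlin
// Summary of a finished trace; illustrative fields, not a Collector data type.
data class TraceSummary(val traceId: String, val hasError: Boolean, val durationMs: Long)

// Keeps a trace when any policy matches.
// randomDraw is a value in [0, 100) supplied by the caller.
fun keepTrace(
    trace: TraceSummary,
    latencyThresholdMs: Long = 1000,
    samplingPercentage: Double = 10.0,
    randomDraw: Double
): Boolean {
    val errorPolicy = trace.hasError                          // errors-policy
    val latencyPolicy = trace.durationMs > latencyThresholdMs // slow-traces-policy
    val probabilisticPolicy = randomDraw < samplingPercentage // probabilistic-policy
    return errorPolicy || latencyPolicy || probabilisticPolicy
}
```

With these defaults, a failed 20 ms trace and a successful 1500 ms trace are both kept regardless of the draw, while a fast successful trace is kept only about 10% of the time. The trade-off is that the Collector must buffer every span of a trace until `decision_wait` elapses.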

Using Jaeger UI

Trace Search

service=order-service operation=order.create minDuration=100ms

Service Dependency Graph

You can visualize service dependencies through the System Architecture tab in Jaeger UI.

Performance Analysis

  • Critical Path analysis
  • Time comparison between Spans
  • Bottleneck identification
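
One simple way to approach bottleneck identification is to compute each span's self time, meaning its duration minus the time spent in its direct children. The sketch below does this for exported span timings; `TimedSpan`, `selfTimes`, and `bottleneck` are hypothetical names, the timings in the usage note are made up, and the calculation assumes sibling spans run sequentially without overlapping.

```kotlin
// Timed span; illustrative only, not a Jaeger or OpenTelemetry data type.
data class TimedSpan(
    val id: String,
    val parentId: String?,
    val name: String,
    val startMs: Long,
    val endMs: Long
)

// Self time = a span's duration minus the time covered by its direct children.
// Assumes children run sequentially and do not overlap each other.
fun selfTimes(spans: List<TimedSpan>): Map<String, Long> {
    val childTime = spans
        .filter { it.parentId != null }
        .groupBy { it.parentId!! }
        .mapValues { (_, children) -> children.sumOf { it.endMs - it.startMs } }
    return spans.associate {
        it.name to (it.endMs - it.startMs) - (childTime[it.id] ?: 0L)
    }
}

// The span with the largest self time is a first candidate for the bottleneck.
fun bottleneck(spans: List<TimedSpan>): String? =
    selfTimes(spans).maxByOrNull { it.value }?.key
```

For an `order.create` span of 500 ms whose children take 50 ms (`order.checkInventory`), 30 ms (`order.save`), and 390 ms (`order.processPayment`), the parent's self time is only 30 ms and `bottleneck` points at `order.processPayment`. With parallel children, a proper critical-path analysis is needed instead, because summed child durations can exceed the parent's duration.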

Span Attributes Best Practices

Semantic Conventions

// HTTP
span.setAttribute(SemanticAttributes.HTTP_METHOD, "POST")
span.setAttribute(SemanticAttributes.HTTP_ROUTE, "/api/orders") // http.url would require the full URL
span.setAttribute(SemanticAttributes.HTTP_STATUS_CODE, 200)

// Database
span.setAttribute(SemanticAttributes.DB_SYSTEM, "postgresql")
span.setAttribute(SemanticAttributes.DB_OPERATION, "SELECT")
span.setAttribute(SemanticAttributes.DB_STATEMENT, "SELECT * FROM orders WHERE id = ?")

// Messaging
span.setAttribute(SemanticAttributes.MESSAGING_SYSTEM, "kafka")
span.setAttribute(SemanticAttributes.MESSAGING_DESTINATION, "order-events")
span.setAttribute(SemanticAttributes.MESSAGING_OPERATION, "publish")

Custom Attributes

// Business context
span.setAttribute("order.id", orderId)
span.setAttribute("customer.tier", "premium")
span.setAttribute("order.item_count", items.size.toLong())
span.setAttribute("order.total_amount", totalAmount.toDouble())

Error Tracking

try {
    processOrder(order)
} catch (e: PaymentException) {
    span.setStatus(StatusCode.ERROR, "Payment processing failed")
    span.recordException(e, Attributes.builder()
        .put("exception.escaped", false)
        .put("payment.error_code", e.errorCode)
        .build()
    )
    throw e
}

Summary

Key points of distributed tracing:

Item            Description
Trace Context   Context propagation between services using the W3C standard
Span Hierarchy  Request flow expressed through parent-child relationships
Sampling        Cost optimization with head-based and tail-based approaches
Attributes      Adherence to Semantic Conventions

In the next post, we will cover structured logging and Correlation ID.