The team inherited a thorny portfolio of legacy projects: an Elixir service on the Phoenix framework handling all real-time WebSocket traffic; a large Java monolith using MyBatis against an aging Oracle database, serving the core REST APIs; and a single-page frontend written in TypeScript and React. The three had completely independent deployment processes, all riddled with manual steps, and it was a disaster. Deploying the Phoenix service meant SSHing into a server, pulling the code, and running mix release by hand; the Java app required ops to package a WAR and upload it to Tomcat; the frontend was built locally with npm run build and the artifacts FTPed to a static file server.
A single release demanded tight coordination across three teams, and a failure at any step could make rollback painful and burn hours. Our pain points were unmistakable: no unified build process, no versioned deployment unit, environment drift that bred constant "works on my machine" problems, and a release process that was labor-intensive and high-risk.
The initial idea was to use containerization to level the differences between stacks. Whether the runtime underneath is the BEAM VM, the JVM, or Nginx, the deliverable should always be a standard, self-contained Docker image. And the best glue for the whole path from code commit to image deployment is, unsurprisingly, Git. Our goal was an automated delivery pipeline with Git as the single source of truth: a developer just runs git push, and everything downstream happens automatically and reliably. That is the core idea of GitOps.
Technology Choices and Architecture Decisions
Repository strategy: Monorepo
Controversial as monorepos are, for our small, tightly collaborating team, keeping all three services in a single Git repository has clear advantages. The biggest one is that atomic cross-service commits become possible: when a feature touches both the frontend and a backend API, one commit carries every change, which makes tracing and rollback straightforward. The CI pipeline can also apply path filters so that only the applications that actually changed are built and tested, avoiding wasted resources.
Containerization: Docker
Docker is the obvious way to standardize heterogeneous applications. We write a Dockerfile for each of the Phoenix, MyBatis/Java, and TypeScript/React applications and build production-ready images.
CI/CD platform: GitHub Actions
The team already uses GitHub, and its built-in Actions are powerful enough to cover our CI needs: compile, test, build images, push images.
Deployment target: Kubernetes
Kubernetes is the natural choice for uniformly managing and orchestrating these containerized applications. It provides service discovery, autoscaling, rolling updates, and the other capabilities a production environment requires.
GitOps tool: ArgoCD
Between CI (build) and CD (deploy) we introduce ArgoCD. GitHub Actions builds the application images and updates the deployment manifests; ArgoCD watches those manifests for changes and automatically syncs the cluster to the desired state they describe. This decouples CI from CD and keeps the Kubernetes cluster's state declaratively managed.
The overall flow is designed as follows:
sequenceDiagram
participant Dev as Developer
participant AppRepo as App Code Monorepo (Git)
participant CI as GitHub Actions
participant Registry as Docker Registry
participant ManifestRepo as K8s Manifests Repo (Git)
participant CD as ArgoCD
participant K8s as Kubernetes Cluster
Dev->>+AppRepo: git push (feature changes)
AppRepo->>CI: Trigger Workflow
CI->>CI: 1. Checkout & Test
CI->>CI: 2. Build Docker Image (for changed app)
CI->>Registry: 3. Push Image (e.g., my-app:git-sha)
CI->>+ManifestRepo: 4. Update image tag in YAML
ManifestRepo->>-CD: Webhook/Poll notifies changes
CD->>ManifestRepo: Fetch latest manifests
CD->>K8s: Compare desired state vs. actual state
CD->>K8s: Apply changes (Sync)
K8s->>Registry: Pull new Docker image
K8s->>K8s: Perform rolling update
Step-by-Step Implementation: From Code to Cluster
1. Monorepo Structure
Our Git repository is laid out as follows; every application lives under the apps directory.
.
├── .github/
│ └── workflows/
│ └── ci-pipeline.yml # the unified CI workflow
├── apps/
│ ├── phoenix-rt-service/ # Phoenix real-time service
│ │ ├── Dockerfile
│ │ └── ... (mix project files)
│ ├── java-data-api/ # MyBatis Java API service
│ │ ├── Dockerfile
│ │ └── ... (maven project files)
│ └── typescript-frontend/ # TypeScript frontend
│ ├── Dockerfile
│ └── ... (package.json, etc.)
└── ...
2. Containerizing the Heterogeneous Applications
Containerization is the foundation of the whole approach. The key is writing an optimized, production-grade Dockerfile for each application.
a) Phoenix service (apps/phoenix-rt-service/Dockerfile)
In a real project, an Elixir application should be packaged as a Release, a self-contained, executable artifact. We use a multi-stage build to keep the final image small.
# Dockerfile for Phoenix/Elixir application using multi-stage builds
# ---- Builder Stage ----
# Use an official Elixir image with a specific OTP version
FROM hexpm/elixir:1.14.5-erlang-25.3.2.4-alpine-3.17.3 AS builder
# Set build arguments
ARG MIX_ENV=prod
ENV MIX_ENV=${MIX_ENV}
# Install build dependencies
RUN apk add --no-cache build-base git python3
WORKDIR /app
# Install Hex and Rebar
RUN mix local.hex --force && \
mix local.rebar --force
# Copy project files
COPY mix.exs mix.lock ./
COPY config config/
# Fetch dependencies
# This layer is cached as long as mix.lock doesn't change
RUN mix deps.get --only prod
RUN mix deps.compile
# Copy the rest of the application source
COPY priv priv/
COPY lib lib/
COPY assets assets/
# Compile the application and build assets
RUN mix assets.deploy
RUN mix compile
# Build the release
# The release is self-contained and includes the Erlang runtime
RUN mix release
# ---- Runner Stage ----
# Use a minimal Alpine image for the final stage
FROM alpine:3.17.3
# Set environment variables
ENV LANG=C.UTF-8
# Install runtime dependencies required by Erlang/Elixir
RUN apk add --no-cache libstdc++ ncurses-libs openssl
# Set the working directory
WORKDIR /app
# Copy the built release from the builder stage
COPY --from=builder /app/_build/prod/rel/phoenix_rt_service ./
# Expose the application port
EXPOSE 4000
# The command to run the application
# `bin/server` is a convenience overlay script generated into the release
# by the Phoenix generator; it sets PHX_SERVER=true and starts the app.
CMD ["bin/server"]
Note: the heart of this Dockerfile is mix release, which packs the whole application, BEAM runtime included, into a single directory. The final image stays small because it contains neither the Elixir compiler nor any build-time dependencies. CMD ["bin/server"] starts the app through the overlay script Phoenix generates; runtime settings such as the database URL and secrets are read from environment variables in runtime.exs. Note that the release does not run database migrations by itself: migrations belong in a dedicated release task (the Phoenix generator also emits a bin/migrate overlay for this), invoked before or at startup.
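Before this image ever reaches the cluster, it is worth smoke-testing locally. Below is a minimal docker-compose.yaml sketch for that, not part of the pipeline itself; the SECRET_KEY_BASE and DATABASE_URL variable names follow Phoenix's runtime.exs conventions, and all credential values are placeholders:
# docker-compose.yaml (sketch): local smoke test for the release image,
# run from the monorepo root. Values below are placeholders.
services:
  phoenix-rt-service:
    build: ./apps/phoenix-rt-service
    ports:
      - "4000:4000"
    environment:
      # A real SECRET_KEY_BASE would come from `mix phx.gen.secret`.
      SECRET_KEY_BASE: "0000000000000000000000000000000000000000000000000000000000000000"
      DATABASE_URL: "ecto://postgres:postgres@db/phoenix_rt_service"
    depends_on:
      - db
  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_PASSWORD: postgres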
b) MyBatis/Java service (apps/java-data-api/Dockerfile)
Again a multi-stage build: the first stage uses an image with Maven and a JDK to compile and package the application, while the second stage runs on a JRE only, shrinking the image considerably.
# Dockerfile for Java/MyBatis application with Maven and multi-stage builds
# ---- Builder Stage ----
# Use an official Maven image which includes JDK
FROM maven:3.8.5-openjdk-11 AS builder
WORKDIR /app
# Copy the pom.xml to leverage Docker layer caching
COPY pom.xml .
# Download dependencies
RUN mvn dependency:go-offline
# Copy the source code
COPY src src/
# Build the application, skipping tests as they should be run in a separate CI step
RUN mvn package -DskipTests
# ---- Runner Stage ----
# Use a slim JRE image for the final stage
FROM openjdk:11-jre-slim
WORKDIR /app
# Copy the JAR file from the builder stage
COPY --from=builder /app/target/java-data-api-1.0.0.jar ./app.jar
# Expose the application port
EXPOSE 8080
# Environment variables for JVM tuning. These are critical in production.
ENV JAVA_OPTS="-XX:+UseG1GC -Xms512m -Xmx512m -Djava.security.egd=file:/dev/./urandom"
# Command to run the application
# `exec` is used so that the Java process receives signals (like SIGTERM) correctly for graceful shutdown.
ENTRYPOINT ["sh", "-c", "exec java $JAVA_OPTS -jar app.jar"]
Note: JAVA_OPTS is critical in production. Here we select the G1 garbage collector and pin the initial and maximum heap sizes. Using exec in the ENTRYPOINT makes the Java process PID 1, so it correctly receives the termination signal Kubernetes sends and can shut down gracefully.
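The /actuator/health/readiness probe path used in our manifests (see section 4) implies the service is a Spring Boot application with Actuator. Under that assumption, graceful shutdown and the Kubernetes probe endpoints take only a few lines of configuration; a sketch of the relevant application.yaml fragment:
# application.yaml (sketch, assuming Spring Boot 2.3+ with Actuator on the classpath)
server:
  shutdown: graceful                  # drain in-flight requests on SIGTERM
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s   # hard cap on the drain period
management:
  endpoint:
    health:
      probes:
        enabled: true                 # exposes /actuator/health/{liveness,readiness}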
c) TypeScript/React frontend (apps/typescript-frontend/Dockerfile)
The frontend build compiles the TypeScript and bundles the static assets, which a lightweight web server such as Nginx then serves.
# Dockerfile for TypeScript/React frontend using multi-stage builds
# ---- Builder Stage ----
# Use an official Node.js image
FROM node:18-alpine AS builder
WORKDIR /app
# Copy package.json and lock file to leverage layer caching
COPY package*.json ./
# Install dependencies exactly as pinned in the lockfile (reproducible builds)
RUN npm ci
# Copy the rest of the application source code
COPY . .
# Build the production-ready static files
# The output is typically in a `build` or `dist` directory.
RUN npm run build
# ---- Runner Stage ----
# Use an official Nginx image
FROM nginx:1.23-alpine
# Remove the default Nginx server configuration
RUN rm /etc/nginx/conf.d/default.conf
# Copy our custom Nginx configuration
COPY nginx.conf /etc/nginx/conf.d/
# Copy the static files from the builder stage
COPY --from=builder /app/build /usr/share/nginx/html
# Expose the Nginx port
EXPOSE 80
# Nginx is managed by the base image's entrypoint, so we just need to start it.
CMD ["nginx", "-g", "daemon off;"]
We also need a companion nginx.conf to handle single-page-app routing:
# nginx.conf
server {
listen 80;
location / {
root /usr/share/nginx/html;
index index.html index.htm;
# This is the key for SPAs: if a file is not found, serve index.html
# This allows client-side routing to work correctly.
try_files $uri $uri/ /index.html;
}
# Add other configurations like gzip compression for performance
gzip on;
gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;
}
3. A Unified CI Pipeline
Now we automate this build process with GitHub Actions. The core is .github/workflows/ci-pipeline.yml.
name: Unified CI Pipeline for Polyglot Services
on:
push:
branches:
- main
tags:
- 'v*' # Trigger on version tags like v1.2.3
jobs:
build-and-push:
name: Build and Push Docker Images
runs-on: ubuntu-latest
strategy:
# Use a matrix to define the services. This simplifies the job steps.
matrix:
service: [phoenix-rt-service, java-data-api, typescript-frontend]
# Add path filters. A job for a service only runs if its code has changed.
# This is a major optimization for monorepos.
if: |
(github.event_name == 'push' && contains(join(github.event.commits.*.message, ' '), '[ci build all]')) ||
(github.event_name == 'push' && (
(matrix.service == 'phoenix-rt-service' && (startsWith(github.ref, 'refs/tags/v') || contains(join(github.event.commits.*.modified, ''), 'apps/phoenix-rt-service/'))) ||
(matrix.service == 'java-data-api' && (startsWith(github.ref, 'refs/tags/v') || contains(join(github.event.commits.*.modified, ''), 'apps/java-data-api/'))) ||
(matrix.service == 'typescript-frontend' && (startsWith(github.ref, 'refs/tags/v') || contains(join(github.event.commits.*.modified, ''), 'apps/typescript-frontend/')))
))
steps:
- name: Checkout repository
uses: actions/checkout@v3
with:
fetch-depth: 0 # Needed to get all history for git tags
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Log in to Docker Hub
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Generate Docker image tags
id: meta
run: |
# Use Git SHA for pushes to main (staging)
# Use Git tag for version tags (production)
if [[ "${{ github.ref_type }}" == "tag" ]]; then
IMAGE_TAG=${{ github.ref_name }}
else
IMAGE_TAG=${GITHUB_SHA::7}
fi
echo "IMAGE_TAG=${IMAGE_TAG}" >> $GITHUB_ENV
echo "IMAGE_NAME=myorg/${{ matrix.service }}" >> $GITHUB_ENV
- name: Build and push Docker image
uses: docker/build-push-action@v4
with:
context: ./apps/${{ matrix.service }}
push: true
tags: ${{ env.IMAGE_NAME }}:${{ env.IMAGE_TAG }}
cache-from: type=registry,ref=${{ env.IMAGE_NAME }}:buildcache
cache-to: type=registry,ref=${{ env.IMAGE_NAME }}:buildcache,mode=max
update-manifests:
name: Update Kubernetes Manifests
runs-on: ubuntu-latest
needs: build-and-push # This job runs only after all images are successfully built
# Only run this for pushes to main or tag pushes, not on feature branches
if: github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/tags/v')
steps:
- name: Determine target environment
id: env
run: |
if [[ "${{ github.ref_type }}" == "tag" ]]; then
echo "TARGET_ENV=production" >> $GITHUB_ENV
else
echo "TARGET_ENV=staging" >> $GITHUB_ENV
fi
- name: Checkout manifests repository
uses: actions/checkout@v3
with:
repository: 'my-org/k8s-manifests' # The separate GitOps repo
token: ${{ secrets.MANIFEST_REPO_PAT }} # A Personal Access Token with write access
- name: Update image tags using yq
run: |
# We need to know which images were built in the previous job.
# A more robust solution would pass this information as artifacts.
# For simplicity, we assume we update all services on a main/tag push.
IMAGE_TAG=""
if [[ "${{ github.ref_type }}" == "tag" ]]; then
IMAGE_TAG=${{ github.ref_name }}
else
IMAGE_TAG=${GITHUB_SHA::7}
fi
sudo wget https://github.com/mikefarah/yq/releases/download/v4.30.8/yq_linux_amd64 -O /usr/bin/yq && sudo chmod +x /usr/bin/yq
# Update the YAML file for the target environment
yq e -i '.spec.template.spec.containers[0].image = "myorg/phoenix-rt-service:'"$IMAGE_TAG"'"' ./${{ env.TARGET_ENV }}/phoenix-service.yaml
yq e -i '.spec.template.spec.containers[0].image = "myorg/java-data-api:'"$IMAGE_TAG"'"' ./${{ env.TARGET_ENV }}/java-api.yaml
yq e -i '.spec.template.spec.containers[0].image = "myorg/typescript-frontend:'"$IMAGE_TAG"'"' ./${{ env.TARGET_ENV }}/frontend.yaml
- name: Commit and push changes
run: |
git config --global user.name 'GitHub Actions Bot'
git config --global user.email 'actions-bot@github.com'
git add .
git commit -m "Update image tags for ${{ env.TARGET_ENV }} to ${{ github.sha }}" || echo "No changes to commit"
git push
A few pitfalls worth calling out:
- Monorepo path filtering: the if expression is the crux. It ensures a service's build job runs only when that service's code actually changed (or a version tag was pushed), which saves substantial CI time in a large monorepo. A sturdier alternative is sketched right after this list.
- Image tag strategy: commits pushed to main are tagged with the short Git SHA and deployed to staging, while Git tags such as v1.2.0 become the image tag and go to production. This gives a clear, traceable mapping between code versions and deployment artifacts.
- Atomic manifest updates: the update-manifests job runs only after every build-and-push job has succeeded. It checks out a separate Git repository (k8s-manifests), edits the YAML files there, and pushes the result back. That push is the signal that triggers ArgoCD to sync.
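The hand-written if expression works, but it only inspects the commits carried in the current push payload and is easy to get subtly wrong. If it starts to hurt, a community action such as dorny/paths-filter can compute which services changed and feed them straight into the matrix. A sketch of that variant (filter names mirror our apps/ directories; the build step body is elided):
# Sketch: deriving the build matrix from changed paths with dorny/paths-filter.
jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      services: ${{ steps.filter.outputs.changes }}  # JSON array of matched filter names
    steps:
      - uses: actions/checkout@v3
      - uses: dorny/paths-filter@v2
        id: filter
        with:
          filters: |
            phoenix-rt-service: 'apps/phoenix-rt-service/**'
            java-data-api: 'apps/java-data-api/**'
            typescript-frontend: 'apps/typescript-frontend/**'
  build-and-push:
    needs: detect-changes
    if: needs.detect-changes.outputs.services != '[]'
    runs-on: ubuntu-latest
    strategy:
      matrix:
        service: ${{ fromJSON(needs.detect-changes.outputs.services) }}
    steps:
      - run: echo "Build and push ${{ matrix.service }} here"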
4. GitOps Deployment
Our k8s-manifests repository is structured as follows:
.
├── staging/
│ ├── phoenix-service.yaml
│ ├── java-api.yaml
│ └── frontend.yaml
└── production/
├── phoenix-service.yaml
├── java-api.yaml
└── frontend.yaml
An example staging/java-api.yaml might look like this:
apiVersion: apps/v1
kind: Deployment
metadata:
name: java-data-api-staging
namespace: staging
spec:
replicas: 2
selector:
matchLabels:
app: java-data-api
env: staging
template:
metadata:
labels:
app: java-data-api
env: staging
spec:
containers:
- name: java-data-api
# This image tag is what our CI pipeline updates
image: myorg/java-data-api:a1b2c3d
ports:
- containerPort: 8080
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: staging-db-credentials
key: url
# Production readiness probes are essential
readinessProbe:
httpGet:
path: /actuator/health/readiness
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
In ArgoCD we create two applications:
- staging-apps: watches the staging/ directory on the main branch of the k8s-manifests repository.
- production-apps: watches the production/ directory on the main branch of the same repository.
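Declaratively, the staging application can itself be expressed as an ArgoCD Application resource. A sketch follows; the repository URL, project, and sync policy shown are assumptions rather than our exact settings:
# staging-apps.yaml (sketch): the ArgoCD Application watching the staging/ manifests.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: staging-apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/my-org/k8s-manifests.git
    targetRevision: main
    path: staging
  destination:
    server: https://kubernetes.default.svc
    namespace: staging
  syncPolicy:
    automated:
      prune: true     # remove cluster resources that disappear from Git
      selfHeal: true  # revert manual drift back to the Git-declared state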
When the CI pipeline updates a YAML file under staging/ and pushes it to Git, ArgoCD detects the change and applies it, rolling the java-data-api deployment in staging to version a1b2c3d. The whole process needs no human intervention.
Limitations and Outlook
This approach brought three wildly different stacks into a single declarative, automated delivery flow and markedly improved release speed and reliability. It is not a silver bullet, though.
One obvious challenge: as services multiply, the monorepo's CI pipeline can get slow. Path filtering helps, but shared build environments and tangled dependencies can still become bottlenecks. Down the road we may need a smarter build system such as Bazel or Nx to analyze the dependency graph precisely and enable fine-grained incremental builds and tests.
Second, patching YAML manifests with ad-hoc scripting (yq) gets brittle as things scale. A more mature setup uses Kustomize or Helm: the CI pipeline only bumps the image tag in kustomization.yaml or the version in values.yaml, and Kustomize or Helm renders the final manifests, which also manages configuration differences between environments far more cleanly.
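For illustration, the staging overlay could be reduced to a kustomization.yaml like the sketch below, where CI's only write is a kustomize edit set image call; file names match our manifests repo, and the tag is a placeholder:
# staging/kustomization.yaml (sketch): instead of patching Deployments with yq,
# CI would run, e.g.:
#   kustomize edit set image myorg/java-data-api=myorg/java-data-api:a1b2c3d
# and commit the result.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: staging
resources:
  - phoenix-service.yaml
  - java-api.yaml
  - frontend.yaml
images:
  - name: myorg/phoenix-rt-service
    newTag: a1b2c3d
  - name: myorg/java-data-api
    newTag: a1b2c3d
  - name: myorg/typescript-frontend
    newTag: a1b2c3d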
Finally, this pipeline answers "what to deliver" and "how to deliver it". "How is it actually running", that is, observability, is the next fortress to take: building unified log collection, metrics, and distributed tracing across these three heterogeneous systems will be the key next step toward keeping the whole platform stable in production.