Official documentation: https://kubernetes.io/

Chinese documentation: http://docs.kubernetes.org.cn/

GitHub: https://github.com/kubernetes/kubernetes

[TOC]

API

From outside the cluster:

1. Using kubectl proxy

kubectl proxy --port=8080

2. Access over HTTP with curl

curl http://localhost:8080/api/

3. Access over HTTPS with curl, authenticating with a token

APISERVER=$(kubectl config view | grep server | cut -f 2- -d ":" | tr -d " ")

TOKEN=$(kubectl describe secret $(kubectl get secrets | grep ^default | cut -f1 -d ' ') | grep -E '^token' | cut -f2 -d':' | tr -d ' \t')

curl  --header "Authorization: Bearer $TOKEN" --insecure $APISERVER/api/v1/nodes

4. Access over HTTPS with kubectl, authenticating with a client certificate (mutual TLS)

# cd /etc/kubernetes/ssl

# openssl genrsa -out pwm.key 2048

# openssl req -new -key pwm.key -out pwm.csr -subj "/CN=pwm"

# openssl x509 -req -days 365 -in pwm.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out pwm.crt

# (Create the Role and RoleBinding that user pwm needs; RBAC sees the certificate CN, pwm, as the user name.)

# kubectl --server=https://192.168.61.100:6443 \
    --certificate-authority=ca.pem \
    --client-certificate=pwm.crt \
    --client-key=pwm.key \
    get nodes

From inside the cluster:

1. Inside a pod that uses a service account

curl -k -v -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://kubernetes.default/api/v1/namespaces/

curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -v -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://kubernetes.default/api/v1/namespaces/
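The same in-cluster call can be made from application code. Below is a minimal Python sketch using only the standard library, assuming it runs inside a pod with the default service account mount and that the service account is allowed to list namespaces; the helper names build_api_request and call_api are hypothetical.

```python
import os
import ssl
import urllib.request

# Where the kubelet mounts the pod's service account credentials (see the curl example above).
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def build_api_request(path, sa_dir=SA_DIR, host="kubernetes.default"):
    """Build a GET request for the API server carrying the pod's bearer token."""
    with open(os.path.join(sa_dir, "token")) as f:
        token = f.read().strip()
    req = urllib.request.Request(f"https://{host}{path}")
    req.add_header("Authorization", f"Bearer {token}")
    return req


def call_api(path, sa_dir=SA_DIR):
    """Send the request, verifying the API server certificate against the mounted ca.crt
    (the equivalent of curl --cacert rather than the insecure curl -k)."""
    ctx = ssl.create_default_context(cafile=os.path.join(sa_dir, "ca.crt"))
    with urllib.request.urlopen(build_api_request(path, sa_dir), context=ctx) as resp:
        return resp.read().decode("utf-8")


# Usage inside a pod whose service account may list namespaces:
#   print(call_api("/api/v1/namespaces/"))
```

Splitting request construction from sending keeps the token handling testable outside a cluster.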

ConfigMap

A ConfigMap is a Kubernetes resource object used to store configuration data; all of its contents are stored in etcd.

Official examples: https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/

Creating a ConfigMap

There are four ways to create a ConfigMap:

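From values given directly on the command line, i.e. --from-literal
From a file, turning one configuration file into a ConfigMap: --from-file=<file>
From a directory, turning every configuration file in the directory into one ConfigMap: --from-file=<directory>
From a standard ConfigMap YAML file written in advance, created with kubectl create -f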

Creating from command-line arguments with `--from-literal`

Command:

kubectl create configmap test-config1 --from-literal=db.host=10.5.10.116 --from-literal=db.port='3306'

Creating from a file

Contents of the configuration file app.properties:

[mysqld]
port=3306
socket=......

Command (--from-file can be given multiple times):

kubectl create configmap test-config2 --from-file=./app.properties

If you don't want the key in the ConfigMap to default to the file name, you can specify the key name at creation time:

kubectl create configmap game-config-3 --from-file=<my-key-name>=<path-to-file>

Creating from a directory

The files config-1 and config-2 in the configs directory contain the following:

config-1:
aaaaa
bbbbb
ccccc

config-2:
ddddd
eeeee

Command:

kubectl create configmap test-config3 --from-file=./configs

When a ConfigMap is created from a directory, each file in the directory becomes one key/value pair: the key is the file name and the value is the file content.

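What if the directory also contains subdirectories? Continuing the experiment:
create a subdirectory subconfigs under the configs directory from the previous step, put two configuration files in subconfigs, and then create a ConfigMap named test-config4 from the configs directory: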

kubectl create configmap test-config4 --from-file=./configs

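The result shows that when a directory is specified, only the files directly inside it are picked up; subdirectories are ignored.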
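To make the directory rule concrete, here is a small Python sketch of the data that ends up in the ConfigMap. It is an approximation written for illustration (configmap_data_from_dir is a hypothetical helper, not part of kubectl), and it does not check whether each file name is a valid ConfigMap key.

```python
import os


def configmap_data_from_dir(directory):
    """Approximate the data built by `kubectl create configmap NAME --from-file=<dir>`:
    one key per regular file directly inside `directory` (key = file name,
    value = file content). Subdirectories such as subconfigs/ and symlinks are skipped."""
    data = {}
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        if os.path.isfile(full) and not os.path.islink(full):
            with open(full, encoding="utf-8") as f:
                data[name] = f.read()
    return data
```

Run against the configs directory above, this yields only config-1 and config-2, matching the observation that subconfigs is ignored.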

Creating from a prewritten standard ConfigMap YAML file

The YAML file is shown below. Note how a key whose value spans multiple lines is written (with the | block scalar):

apiVersion: v1
kind: ConfigMap
metadata:
  name: test-cfg
  namespace: default
data:
  cache_host: memcached-gcxt
  cache_port: "11211"
  cache_prefix: gcxt
  my.cnf: |
    [mysqld]
    log-bin = mysql-bin
  app.properties: |
    property.1 = value-1
    property.2 = value-2
    property.3 = value-3

Create it:

kubectl create -f test-cfg.yml

Using a ConfigMap

There are three ways to use a ConfigMap:

First, as environment variables passed directly to the pod;
Second, on the pod's command line (in the startup command);
Third, mounted into the pod as a volume.

Using it through environment variables

Use valueFrom.configMapKeyRef, giving name and key, to select the key to use:

apiVersion: v1
kind: Pod
metadata:
  name: dapi-test-pod
spec:
  containers:
    - name: test-container
      image: k8s.gcr.io/busybox
      command: [ "/bin/sh", "-c", "env" ]
      env:
        - name: SPECIAL_LEVEL_KEY
          valueFrom:
            configMapKeyRef:
              name: special-config
              key: special.how
        - name: LOG_LEVEL
          valueFrom:
            configMapKeyRef:
              name: env-config
              key: log_level
  restartPolicy: Never

You can also use envFrom.configMapRef with name to turn every key/value pair in the ConfigMap into an environment variable automatically:

apiVersion: v1
kind: Pod
metadata:
  name: dapi-test-pod
spec:
  containers:
    - name: test-container
      image: k8s.gcr.io/busybox
      command: [ "/bin/sh", "-c", "env" ]
      envFrom:
      - configMapRef:
          name: special-config
  restartPolicy: Never

Referencing it in the startup command

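To reference ConfigMap values on the command line, first expose them as environment variables; you can then use $(VAR_NAME) to pass them as arguments of the container's startup command: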

apiVersion: v1
kind: Pod
metadata:
  name: dapi-test-pod
spec:
  containers:
    - name: test-container
      image: k8s.gcr.io/busybox
      command: [ "/bin/sh", "-c", "echo $(SPECIAL_LEVEL_KEY) $(SPECIAL_TYPE_KEY)" ]
      env:
        - name: SPECIAL_LEVEL_KEY
          valueFrom:
            configMapKeyRef:
              name: special-config
              key: SPECIAL_LEVEL
        - name: SPECIAL_TYPE_KEY
          valueFrom:
            configMapKeyRef:
              name: special-config
              key: SPECIAL_TYPE
  restartPolicy: Never
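The $(VAR_NAME) references are expanded by Kubernetes itself from the container's environment before the command runs, not by the shell. The Python sketch below models the documented rules: known variables are substituted, unresolvable references are left unchanged, and "$$" collapses to "$" so that "$$(VAR)" yields a literal "$(VAR)". expand_k8s_refs is a hypothetical name, and the values very and charm used in the test are hypothetical ConfigMap contents.

```python
import re

# Matches either an escaped "$$" or a "$(NAME)" reference.
_REF = re.compile(r"\$\$|\$\(([^)]*)\)")


def expand_k8s_refs(text, env):
    """Sketch of Kubernetes' $(VAR_NAME) expansion in command/args."""
    def repl(match):
        if match.group(0) == "$$":
            return "$"  # escape: "$$" becomes a single "$"
        # Unknown variables keep their original "$(NAME)" text.
        return env.get(match.group(1), match.group(0))
    return _REF.sub(repl, text)
```

With SPECIAL_LEVEL_KEY and SPECIAL_TYPE_KEY set from the ConfigMap, the command above becomes a plain echo of their two values.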

Mounting it as a volume

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-configmap
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-configmap
  template:
    metadata:
      labels:
        app: nginx-configmap
    spec:
      containers:
      - name: nginx-configmap
        image: nginx
        ports:
        - containerPort: 80
        volumeMounts:     
        - name: config-volume4
          mountPath: /tmp/config4   
      volumes:
      - name: config-volume4
        configMap:
          name: test-config

This creates one file per key under /tmp/config4, with the key as the file name and the value as the file content.

Note: anything that already existed under /tmp/config4 (files or directories) is hidden by the mount; only the key files from test-config remain.

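If you don't want the key names used as file names, add an items field and, for each key, specify the relative path to use in its place (only the keys listed in items are then mounted):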

volumes:
 - name: config-volume4
   configMap:
     name: test-config4
     items:
     - key: my.cnf
       path: path/to/mysql-key # produces the file /tmp/config4/path/to/mysql-key
     - key: cache_host
       path: cache-host

Notes:

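Deleting the ConfigMap does not affect pods that are already running; but if such a pod is then deleted, the events of the recreated pod report that the ConfigMap volume cannot be found.
If you modify the ConfigMap with kubectl edit configmap … after the pod is running, the configuration inside the pod is refreshed after a short while.
If you modify a mounted configuration file from inside the container, the content is reset to the original ConfigMap content after a while. (In newer Kubernetes versions ConfigMap volumes are mounted read-only, so such writes fail outright.)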

In depth: how mountPath, subPath, key, and path relate to each other

Conclusions:

key (pod.spec.volumes[0].configMap.items[0].key) selects which ConfigMap entries are made available for mounting.

path (pod.spec.volumes[0].configMap.items[0].path) renames the key, i.e. sets the file name (relative path) it appears under.

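subPath (pod.spec.containers[0].volumeMounts[0].subPath) decides whether anything is mounted in the container: it is looked up by name among the volume's entries (the path names when path is set, otherwise the key names).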

mountPath (pod.spec.containers[0].volumeMounts[0].mountPath) decides the name of the resulting file in the container.

Without subPath:

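mountPath is a directory. The ConfigMap entries are mounted under it as files, with the key as the file name and the value as the file content, and they hide whatever was originally at mountPath. If volumes.configMap.items[0].path is also set, that path renames the key.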

With subPath:

When subPath matches an entry, mountPath is a file name.

When subPath matches nothing, mountPath is a directory name (an empty directory is created).

How mountPath and subPath work together

When subPath is set and matches an entry, mountPath points to a file name:

[root@k8s-master k8s-objs]# cat pod-configmap-testvolume.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    purpose: test-configmap-volume
  name: testvolume
spec:
  containers:
    - name: test-configmap-volume
      image: tomcat:8
      imagePullPolicy: IfNotPresent
      #command: [ "/bin/sh", "-c", "echo $(MY_CACHE_HOST)" ]
      volumeMounts:
        - name: config-volume
          mountPath: /etc/config/app.properties # with subPath, app.properties is a file name: the container only gets the /etc/config directory, containing a single file named app.properties (subPath selects only the app.properties entry)
          subPath: app.properties
  volumes:
    - name: config-volume
      configMap:
         name: test-cfg
         items:
           - key: cache_host
             path: path/to/special-key-cache
           - key: app.properties
             path: app.properties

Check inside the container:

[root@k8s-master k8s-objs]# kubectl exec -it testvolume /bin/bash
root@testvolume:/usr/local/tomcat# cd /etc/config
root@testvolume:/etc/config# ls -l
total 4
-rw-r--r-- 1 root root 63 Apr 12 01:59 app.properties
root@testvolume:/etc/config# cat app.properties 
property.1 = value-1
property.2 = value-2
property.3 = value-3

With subPath set but no match:

An empty directory /etc/config/app.properties is created in the container, with no files in it.

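subPath is matched in the priority order pod.spec.volumes[0].configMap.items[0].path > pod.spec.volumes[0].configMap.items[0].key > the ConfigMap keys. In other words, subPath is looked up among the files the volume actually contains, and when items is set those are only the listed entries under their path names. In this example the volume contains "path/to/special-key-cache" and "app-properties" (note the hyphen, not a dot), so the subPath "app.properties" (with a dot) is not found and no file is mounted. The container then treats "/etc/config/app.properties" as a path to be created, and creates it as an empty directory.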

[root@k8s-master k8s-objs]# vi pod-configmap-testvolume.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    purpose: test-configmap-volume
  name: testvolume
spec:
  containers:
    - name: test-configmap-volume
      image: tomcat:8
      imagePullPolicy: IfNotPresent
      #command: [ "/bin/sh", "-c", "echo $(MY_CACHE_HOST)" ]
      volumeMounts:
        - name: config-volume
          mountPath: /etc/config/app.properties
          subPath: app.properties
  volumes:
    - name: config-volume
      configMap:
         name: test-cfg
         items:
           - key: cache_host
             path: path/to/special-key-cache
           - key: app.properties
             path: app-properties # here path effectively renames the file, like mv app.properties app-properties

Check inside the container:

[root@k8s-master k8s-objs]# kubectl exec -it testvolume /bin/bash
root@testvolume:/usr/local/tomcat# cd /etc/config
root@testvolume:/etc/config# ls -l
total 0
drwxrwxrwx 2 root root 6 Apr 12 02:11 app.properties
root@testvolume:/etc/config# cat app.properties
cat: app.properties: Is a directory
root@testvolume:/etc/config# cd app.properties
root@testvolume:/etc/config/app.properties# ls   # this directory is empty

Without subPath, path acts as a rename

[root@k8s-master k8s-objs]# cat pod-configmap-testvolume.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    purpose: test-configmap-volume
  name: testvolume
spec:
  containers:
    - name: test-configmap-volume
      image: tomcat:8
      imagePullPolicy: IfNotPresent
      #command: [ "/bin/sh", "-c", "echo $(MY_CACHE_HOST)" ]
      volumeMounts:
        - name: config-volume
          mountPath: /etc/config/app.properties # here app.properties is a directory
          #subPath: app.properties
  volumes:
    - name: config-volume
      configMap:
         name: test-cfg
         items:
           - key: cache_host
             path: path/to/special-key-cache
           - key: app.properties
             path: app-properties # here path acts as a rename

Check inside the container:

[root@k8s-master k8s-objs]# kubectl exec -it testvolume /bin/bash
root@testvolume:/usr/local/tomcat# cd /etc/config
root@testvolume:/etc/config# ls -l
total 0
drwxrwxrwx 3 root root 93 Apr 12 02:20 app.properties
root@testvolume:/etc/config# cd app.properties
root@testvolume:/etc/config/app.properties# ls -l
total 0
lrwxrwxrwx 1 root root 21 Apr 12 02:20 app-properties -> ..data/app-properties
lrwxrwxrwx 1 root root 11 Apr 12 02:20 path -> ..data/path
root@testvolume:/etc/config/app.properties# cat app-properties 
property.1 = value-1
property.2 = value-2
property.3 = value-3
root@testvolume:/etc/config/app.properties# cat path
cat: path: Is a directory
root@testvolume:/etc/config/app.properties# cd path
root@testvolume:/etc/config/app.properties/path# cd to
root@testvolume:/etc/config/app.properties/path/to# ls -l
total 4
-rw-r--r-- 1 root root 9 Apr 12 02:20 special-key-cache
root@testvolume:/etc/config/app.properties/path/to# cat special-key-cache 
mysql-k8s

With subPath set and a match found, mountPath gives the file name, which can differ from subPath

subPath decides whether the file is mounted at all; mountPath decides its name.

[root@k8s-master k8s-objs]# vi pod-configmap-testvolume.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    purpose: test-configmap-volume
  name: testvolume
spec:
  containers:
    - name: test-configmap-volume
      image: tomcat:8
      imagePullPolicy: IfNotPresent
      #command: [ "/bin/sh", "-c", "echo $(MY_CACHE_HOST)" ]
      volumeMounts:
        - name: config-volume
          mountPath: /etc/config/z.txt # subPath decides whether the file exists; mountPath names it z.txt
          subPath: app-properties
  volumes:
    - name: config-volume
      configMap:
         name: test-cfg
         items:
           - key: cache_host
             path: path/to/special-key-cache
           - key: app.properties
             path: app-properties

Check inside the container:

[root@k8s-master k8s-objs]# kubectl exec -it testvolume /bin/bash
root@testvolume:/usr/local/tomcat# cd /etc/config
root@testvolume:/etc/config# ls
z.txt
root@testvolume:/etc/config# pwd
/etc/config
root@testvolume:/etc/config# cat z.txt 
property.1 = value-1
property.2 = value-2
property.3 = value-3
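The four experiments above can be summarized in a small Python model. It is only a sketch of the behavior observed in this section, not kubelet code: volume_files and mount_result are hypothetical names, and the model ignores optional keys, file modes, and the ..data symlinks seen in the listings.

```python
def volume_files(data, items=None):
    """Files inside a configMap volume: every key under its own name, or, when
    `items` is given, only the listed keys under their `path` names."""
    if items is None:
        return dict(data)
    return {item["path"]: data[item["key"]] for item in items}


def mount_result(data, mount_path, items=None, sub_path=None):
    """Describe what the container sees at mount_path as (kind, path, content)."""
    files = volume_files(data, items)
    if sub_path is None:
        # No subPath: the whole volume is a directory at mount_path.
        return ("dir", mount_path, files)
    if sub_path in files:
        # subPath names a file in the volume: that single file appears at mount_path.
        return ("file", mount_path, files[sub_path])
    # Otherwise mount_path becomes a directory holding whatever lies under sub_path;
    # when nothing matches, it is an empty directory.
    prefix = sub_path.rstrip("/") + "/"
    nested = {p[len(prefix):]: v for p, v in files.items() if p.startswith(prefix)}
    return ("dir", mount_path, nested)
```

The model reproduces each case: path app.properties with subPath app.properties gives a file, path app-properties with the same subPath gives an empty directory, and subPath app-properties with mountPath /etc/config/z.txt gives a file named z.txt.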

Notes on ConfigMap hot updates

After a ConfigMap is updated:

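Environment variables populated from the ConfigMap are not updated.
Data in volumes mounted from the ConfigMap is updated after a delay (about 10 seconds as measured in practice); files mounted with subPath, however, are never updated.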

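Environment variables are injected when the container starts, and Kubernetes does not change their values afterwards; moreover, the environment variables of pods in the same namespace keep accumulating (see the article "Kubernetes中的服务发现与docker容器间的环境变量传递源码探究", a source-code study of service discovery and environment-variable passing between Docker containers). To make containers pick up a ConfigMap's new content, you can force the ConfigMap to be remounted with a rolling update of the pods, or, after updating the ConfigMap, scale the replicas down to 0 and then back up.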
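One common way to automate the rolling-update approach is to store a hash of the ConfigMap's data in a pod-template annotation: when the ConfigMap changes, the annotation changes, and the Deployment rolls out new pods that mount the new data. The sketch below assumes the annotation key checksum/config, a convention popularized by Helm charts rather than a Kubernetes API field; the helper names are hypothetical. For a one-off restart, kubectl 1.15 and later also offer kubectl rollout restart deployment/<name>.

```python
import hashlib
import json


def config_checksum(data):
    """Stable SHA-256 over a ConfigMap's data (keys sorted, so order doesn't matter)."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def patch_for_rollout(data):
    """Patch body for a Deployment that records the checksum in a pod-template
    annotation; a changed value changes the template and triggers a rollout."""
    return {"spec": {"template": {"metadata": {"annotations": {
        "checksum/config": config_checksum(data)}}}}}
```

The resulting dict, serialized with json.dumps, can be passed to kubectl patch deployment <name> -p '<json>'.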

ServiceAccount

Service accounts are designed so that processes in pods can conveniently call the Kubernetes API or other external services. They differ from user accounts:

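User accounts are designed for people, while service accounts are designed for processes in pods that call the Kubernetes API;
User accounts span all namespaces, while a service account is confined to its own namespace;
Every namespace automatically gets a default service account;
The token controller watches for service account creation and creates a secret for each one (before Kubernetes 1.24; since then, token secrets are no longer generated automatically);
With the ServiceAccount admission controller enabled:
- every pod has spec.serviceAccount set to default upon creation (unless another ServiceAccount is specified);
- the service account referenced by a pod must already exist, otherwise the pod is rejected;
- if the pod specifies no ImagePullSecrets, the service account's ImagePullSecrets are added to the pod;
- every container gets the service account's token and ca.crt mounted at /var/run/secrets/kubernetes.io/serviceaccount/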

$ kubectl exec nginx-3137573019-md1u2 ls /run/secrets/kubernetes.io/serviceaccount
ca.crt
namespace
token

Creating a service account

$ kubectl create serviceaccount jenkins
serviceaccount "jenkins" created
$ kubectl get serviceaccounts jenkins -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: 2017-05-27T14:32:25Z
  name: jenkins
  namespace: default
  resourceVersion: "45559"
  selfLink: /api/v1/namespaces/default/serviceaccounts/jenkins
  uid: 4d66eb4c-42e9-11e7-9860-ee7d8982865f
secrets:
- name: jenkins-token-l9v7v

The automatically created secret:

$ kubectl get secret jenkins-token-l9v7v -o yaml
apiVersion: v1
data:
  ca.crt: (APISERVER CA BASE64 ENCODED)
  namespace: ZGVmYXVsdA==
  token: (BEARER TOKEN BASE64 ENCODED)
kind: Secret
metadata:
  annotations:
    kubernetes.io/service-account.name: jenkins
    kubernetes.io/service-account.uid: 4d66eb4c-42e9-11e7-9860-ee7d8982865f
  creationTimestamp: 2017-05-27T14:32:25Z
  name: jenkins-token-l9v7v
  namespace: default
  resourceVersion: "45558"
  selfLink: /api/v1/namespaces/default/secrets/jenkins-token-l9v7v
  uid: 4d697992-42e9-11e7-9860-ee7d8982865f
type: kubernetes.io/service-account-token
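The values under data in a Secret are only base64-encoded, not encrypted. A short Python sketch (decode_secret_data is a hypothetical helper) shows that the namespace value above decodes to default; the token field decodes the same way to the bearer token used in the curl examples. From the shell, kubectl get secret jenkins-token-l9v7v -o jsonpath='{.data.token}' | base64 -d does the same for the token.

```python
import base64


def decode_secret_data(data):
    """Decode the base64-encoded values of a Secret's `data` field into text."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


# The namespace entry from the secret shown above:
print(decode_secret_data({"namespace": "ZGVmYXVsdA=="}))  # {'namespace': 'default'}
```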

Adding ImagePullSecrets

apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: 2015-08-07T22:02:39Z
  name: default
  namespace: default
  selfLink: /api/v1/namespaces/default/serviceaccounts/default
  uid: 052fb0f4-3d50-11e5-b066-42010af0d7b6
secrets:
- name: default-token-uudge
imagePullSecrets:
- name: myregistrykey

Authorization

A service account gives services a convenient authentication mechanism, but it does not deal with authorization. Combine it with RBAC to authorize service accounts:

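Start the API server with --authorization-mode=RBAC (on the old releases where RBAC was still alpha, also --runtime-config=rbac.authorization.k8s.io/v1alpha1);
Configure --authorization-rbac-super-user=admin (an early alpha-era flag that newer releases no longer provide);
Define Role, ClusterRole, RoleBinding, or ClusterRoleBinding objects.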

For example:

# This role allows reading pods in the namespace "default"
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default
  name: pod-reader
rules:
  - apiGroups: [""] # The API group "" indicates the core API Group.
    resources: ["pods"]
    verbs: ["get", "watch", "list"]
---
# This role binding allows the "default" service account to read pods in the namespace "default"
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: read-pods
  namespace: default
subjects:
  - kind: ServiceAccount # May be "User", "Group" or "ServiceAccount"
    name: default
    namespace: default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
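RBAC rules are purely additive: a request is allowed if any rule in a bound role covers its API group, resource, and verb. The simplified Python model below (hypothetical helpers; it ignores resourceNames, subresources, and nonResourceURLs) checks requests against the pod-reader rules above. On a real cluster, kubectl auth can-i list pods --as=system:serviceaccount:default:default -n default answers the same question.

```python
def rule_allows(rule, api_group, resource, verb):
    """True if one PolicyRule covers the request; "*" acts as a wildcard."""
    def covers(values, wanted):
        return "*" in values or wanted in values
    return (covers(rule.get("apiGroups", []), api_group)
            and covers(rule.get("resources", []), resource)
            and covers(rule.get("verbs", []), verb))


def role_allows(rules, api_group, resource, verb):
    """Additive semantics: allowed if any rule allows it; there are no deny rules."""
    return any(rule_allows(rule, api_group, resource, verb) for rule in rules)


# The rules of the pod-reader Role above ("" is the core API group).
POD_READER = [{"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "watch", "list"]}]
```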

Example

# Define the namespace test
cat > test.yaml << EOF
apiVersion: v1
kind: Namespace
metadata:
  name: test
  labels:
    name: test
EOF

# Create the namespace test
kubectl create -f ./test.yaml

# View the service accounts in namespace test
kubectl get sa -n test
NAME      SECRETS   AGE
default   1         3h
## Notes:
(1) When ServiceAccount support is enabled (--admission-control=…,ServiceAccount,…), every namespace gets a default service account named default; that is the default listed above.
(2) ServiceAccount is enabled by default.

# View the default service account generated in namespace test
kubectl get sa default -o yaml -n test
apiVersion: v1
kind: ServiceAccount
metadata:
    creationTimestamp: 2018-05-31T06:21:10Z
    name: default
    namespace: test
    resourceVersion: "45560"
    selfLink: /api/v1/namespaces/test/serviceaccounts/default
    uid: cf57c735-649a-11e8-adc5-000c290a7d06
secrets:
- name: default-token-ccf9m
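## Notes:
(1) Pods created in this namespace use this service account by default;
(2) every pod has spec.serviceAccount set to default upon creation (unless another ServiceAccount is specified);
(3) every container gets the corresponding token and ca.crt mounted at /var/run/secrets/kubernetes.io/serviceaccount/.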

# Create the Deployment
cat > nginx_deploy.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-test
  namespace: test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
EOF
kubectl create -f nginx_deploy.yaml

# View the generated pods
kubectl get po -n test
NAME                          READY     STATUS    RESTARTS   AGE
nginx-test-75675f5897-7l5bc   1/1       Running   0          1h
nginx-test-75675f5897-b7pcn   1/1       Running   0          1h

# View the details of one pod, e.g. nginx-test-75675f5897-7l5bc
kubectl describe po nginx-test-75675f5897-7l5bc -n test
## Note default-token-ccf9m in the output:
Environment:    <none>
Mounts:
  /var/run/secrets/kubernetes.io/serviceaccount from default-token-ccf9m (ro)
Conditions:
Type           Status
Initialized    True
Ready          True
PodScheduled   True
Volumes:
default-token-ccf9m:
Type:        Secret (a volume populated by a Secret)
SecretName:  default-token-ccf9m
## Notes:
(1) every pod has spec.serviceAccount set to default upon creation (unless another ServiceAccount is specified);
(2) every container gets the corresponding token and ca.crt mounted at /var/run/secrets/kubernetes.io/serviceaccount/.

# Enter the container of one pod, e.g. nginx-test-75675f5897-7l5bc
kubectl exec -it nginx-test-75675f5897-7l5bc -n test -- /bin/bash
## Run inside the container:
ls -l  /var/run/secrets/kubernetes.io/serviceaccount/
lrwxrwxrwx 1 root root 13 May 31 08:15 ca.crt -> ..data/ca.crt
lrwxrwxrwx 1 root root 16 May 31 08:15 namespace -> ..data/namespace
lrwxrwxrwx 1 root root 12 May 31 08:15 token -> ..data/token
## Notes:
ca.crt, namespace, and token have been placed in the container, so the container can access the API server over HTTPS.

Creating a ServiceAccount manually

# Write the heapster_test.yaml file
cat > heapster_test.yaml <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
    name: heapster
    namespace: test
EOF


# Create the service account heapster
kubectl create -f heapster_test.yaml
serviceaccount "heapster" created

# View the service account heapster
kubectl get sa heapster -o yaml -n test
## Relevant part of the output:
    secrets:
    - name: heapster-token-7xrlg

Service

Concepts

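Applications running in pods are daemons that serve clients, such as nginx, tomcat, and etcd. Pods are resource objects managed by controllers and have a lifecycle: after a voluntary or involuntary disruption, a pod can only be replaced by a newly created pod; it cannot be brought back. In this dynamic, elastic management model, a Service gives such pods a fixed, unified access point and load balancing.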

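In other words, pods are destroyed and recreated over their lifecycle, so they cannot give clients a fixed access point. To let a group of identical pods share the workload, Kubernetes provides the Service resource, which gives a group of pods a fixed access point and load balancing, similar to Alibaba Cloud's load balancer or LVS.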

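Keep in mind, though, that Service IPs (virtual addresses) and pod IPs are both reachable only from inside the cluster and cannot accept traffic from outside. Besides exposing a port on a single node (hostPort) or letting pods share the node's network namespace (hostNetwork), you can solve this with a Service of type NodePort or LoadBalancer, or with an Ingress resource, which provides layer-7 load balancing.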

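The Service is one of Kubernetes' core resource types. Based on a label selector, a Service groups a set of pods into one logical unit and, through its own IP address and port, proxies requests to the pods in that group, as shown in the figure below. It hides the pods that actually process the requests from clients, so from the client's point of view the Service itself appears to handle and answer the requests, much like a load balancer.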

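A Service's IP address is also called its Cluster IP. It is a virtual IP address drawn from a dedicated IP range configured for the Kubernetes cluster; it stays the same once the Service is created and is reachable from pods in the same cluster. The Service port accepts client requests and forwards them to the corresponding port of the backend pods. This proxying, also called port proxying, works at the transport layer of the TCP/IP stack.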

How Services are implemented

In a Kubernetes cluster, every node runs a kube-proxy process. kube-proxy is responsible for implementing a form of virtual IP (VIP) for Services of every type other than ExternalName. In Kubernetes v1.0 the proxy ran entirely in userspace. v1.1 added an iptables proxy, but it was not the default mode. Since v1.2, iptables has been the default. Kubernetes v1.8.0-beta.0 added an ipvs proxy. In v1.0 a Service was a layer-4 (TCP/UDP over IP) concept; v1.1 added the Ingress API (beta) to represent layer-7 (HTTP) services.

The kube-proxy component continuously watches the API server for changes to Services. Whenever a Service is created or changed, kube-proxy translates it into scheduling rules that the current node can enforce (for example iptables or ipvs rules).

userspace proxy mode

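In this mode, a client pod's request first hits the Service's iptables rules in kernel space, which redirect it to a port on which kube-proxy listens in user space. After processing it, kube-proxy sends the request back to the Service IP in kernel space, and the Service's iptables rules forward it to a backend pod on one of the nodes.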

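Clearly this mode has a serious problem: the request enters kernel space first, goes up into user space to reach kube-proxy, comes back down into kernel space for iptables, and only then is delivered to a pod. The performance cost of moving traffic in and out of the kernel is unacceptable. Until Kubernetes 1.2 made iptables the default, userspace was the default proxy mode.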

iptables proxy mode

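With iptables, a client's request goes straight to the Service IP in the local kernel, and iptables rules forward it directly to a pod. Because the forwarding uses iptables NAT, there is still a non-negligible performance cost. In addition, if the cluster has tens of thousands of Services/Endpoints, the iptables rules on each node become huge and performance degrades further. The iptables proxy mode was introduced in Kubernetes 1.1 and has been the default since 1.2.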

ipvs proxy mode

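Kubernetes added the ipvs proxy mode as alpha in 1.8, promoted it to beta in 1.9, and made it generally available in 1.11; iptables nevertheless remains the default mode. When a client request reaches kernel space, ipvs rules distribute it directly to a pod. kube-proxy watches Kubernetes Service and Endpoints objects, calls the netlink interface to create ipvs rules accordingly, and periodically resyncs the ipvs rules with the Service and Endpoints objects so that the ipvs state matches the desired state. When the Service is accessed, traffic is redirected to one of the backend pods.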

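Like iptables, ipvs is built on netfilter hooks, but it uses hash tables as its underlying data structure and works in kernel space. This lets ipvs redirect traffic faster and perform better when syncing proxy rules. ipvs also offers more load-balancing algorithms, for example: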

rr: round robin
lc: least connections
dh: destination hashing
sh: source hashing
sed: shortest expected delay
nq: never queue
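To illustrate how three of these schedulers differ, here is a toy Python sketch. It is not the kernel's implementation: in particular the hash used for sh is an arbitrary choice for illustration, and the helper names are hypothetical. The pod IPs in the test come from the headless Service example later in this document.

```python
import hashlib
from itertools import cycle


def round_robin(backends):
    """rr: hand out the backends in turn, forever."""
    return cycle(backends)


def least_connections(backends, active):
    """lc: pick the backend with the fewest active connections (first one wins ties)."""
    return min(backends, key=lambda b: active.get(b, 0))


def source_hash(backends, client_ip):
    """sh: map a client IP to a fixed backend, as long as the backend list is unchanged."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]
```

Note that sh gives a kind of stickiness as a side effect of hashing, which is different from the explicit ClientIP session affinity described later.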

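Note: ipvs mode assumes the IPVS kernel modules are installed on the node before kube-proxy runs. When kube-proxy starts in ipvs mode, it checks whether the IPVS modules are installed and falls back to iptables mode if they are not.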

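If a Service's set of backend pods changes, for example because one more pod now matches the label selector, the change is reflected in the API server immediately. kube-proxy picks it up through its watch on the API server (which persists the data in etcd) and immediately turns it into ipvs or iptables rules. All of this is dynamic and real-time; deleting a pod works the same way, as shown in the figure: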

Defining a Service

Service fields

[root@k8s-master ~]# kubectl explain svc
KIND:     Service
VERSION:  v1

DESCRIPTION:
     Service is a named abstraction of software service (for example, mysql)
     consisting of local port (for example 3306) that the proxy listens on, and
     the selector that determines which pods will answer requests sent through
     the proxy.

FIELDS:
   apiVersion    <string>
     APIVersion defines the versioned schema of this representation of an
     object. Servers should convert recognized schemas to the latest internal
     value, and may reject unrecognized values. More info:
     https://git.k8s.io/community/contributors/devel/api-conventions.md#resources

   kind    <string>
     Kind is a string value representing the REST resource this object
     represents. Servers may infer this from the endpoint the client submits
     requests to. Cannot be updated. In CamelCase. More info:
     https://git.k8s.io/community/contributors/devel/api-conventions.md#types-kinds

   metadata    <Object>
     Standard object's metadata. More info:
     https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata

   spec    <Object>
     Spec defines the behavior of a service.
     https://git.k8s.io/community/contributors/devel/api-conventions.md#spec-and-status

   status    <Object>
     Most recently observed status of the service. Populated by the system.
     Read-only. More info:
     https://git.k8s.io/community/contributors/devel/api-conventions.md#spec-and-status

The four important fields:

apiVersion:
kind:
metadata:
spec:
  clusterIP: can be set explicitly or allocated dynamically
  ports: (mapped to the backend container ports)
  selector: (which pods the Service targets)
  type: the Service type

Service types

For some parts of an application (such as the frontend), you may want to expose a Service on an external IP address, outside the Kubernetes cluster.

Kubernetes ServiceTypes let you specify the kind of Service you need; the default is ClusterIP.

The values of Type and their behavior are as follows:

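ClusterIP: exposes the Service on a cluster-internal IP. With this value the Service is reachable only from within the cluster. This is the default ServiceType.
NodePort: exposes the Service on each node's IP at a static port (the NodePort). A NodePort Service routes to a ClusterIP Service, which is created automatically. You can reach a NodePort Service from outside the cluster by requesting <NodeIP>:<NodePort>.
LoadBalancer: exposes the Service externally using a cloud provider's load balancer. The external load balancer routes to the NodePort and ClusterIP Services.
ExternalName: maps the Service to the contents of the externalName field (for example, foo.bar.example.com) by returning a CNAME record with its value. No proxying of any kind is set up. This requires kube-dns 1.7 or later.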

ClusterIP Service demo

[root@k8s-master mainfests]# cat redis-svc.yaml 
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: default
spec:
  selector:    # label selector; must match the labels on the pods themselves
    app: redis
    role: logstor
  type: ClusterIP    # Service type ClusterIP
  ports:    # ports
  - port: 6379    # port exposed by the Service
    targetPort: 6379    # container port
[root@k8s-master mainfests]# kubectl apply -f redis-svc.yaml 
service/redis created
[root@k8s-master mainfests]# kubectl get svc
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
kubernetes   ClusterIP   10.96.0.1        <none>        443/TCP    36d
redis        ClusterIP   10.107.238.182   <none>        6379/TCP   1m

[root@k8s-master mainfests]# kubectl describe svc redis
Name:              redis
Namespace:         default
Labels:            <none>
Annotations:       kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"redis","namespace":"default"},"spec":{"ports":[{"port":6379,"targetPort":6379}...
Selector:          app=redis,role=logstor
Type:              ClusterIP
IP:                10.107.238.182  # Service IP
Port:              <unset>  6379/TCP
TargetPort:        6379/TCP
Endpoints:         10.244.1.16:6379  # this IP:port is the pod's IP:port
Session Affinity:  None
Events:            <none>

[root@k8s-master mainfests]# kubectl get pod redis-5b5d6fbbbd-v82pw -o wide
NAME                     READY     STATUS    RESTARTS   AGE       IP            NODE
redis-5b5d6fbbbd-v82pw   1/1       Running   0          20d       10.244.1.16   k8s-node01

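The demo above shows that a Service does not point to pods directly: it points to an Endpoints resource, i.e. addresses plus ports, and the Endpoints in turn correspond to the pods.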

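Once a Service is created, a resource record for it is added to the cluster DNS, and the name resolves as soon as the record exists. The record format is SVC_NAME.NS_NAME.DOMAIN.LTD.
Default cluster Service domain suffix: svc.cluster.local.
A record created for the redis Service: redis.default.svc.cluster.local.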

NodePort Service demo

NodePort means a port on the node. When a Kubernetes cluster is deployed, a port range is normally reserved for NodePorts, 30000-32767 by default. When defining a NodePort Service, set .spec.type explicitly.

[root@k8s-master mainfests]# kubectl get pods --show-labels |grep myapp-deploy
myapp-deploy-69b47bc96d-4hxxw   1/1       Running   0          12m       app=myapp,pod-template-hash=2560367528,release=canary
myapp-deploy-69b47bc96d-95bc4   1/1       Running   0          12m       app=myapp,pod-template-hash=2560367528,release=canary
myapp-deploy-69b47bc96d-hwbzt   1/1       Running   0          12m       app=myapp,pod-template-hash=2560367528,release=canary
myapp-deploy-69b47bc96d-pjv74   1/1       Running   0          12m       app=myapp,pod-template-hash=2560367528,release=canary
myapp-deploy-69b47bc96d-rf7bs   1/1       Running   0          12m       app=myapp,pod-template-hash=2560367528,release=canary

[root@k8s-master mainfests]# cat myapp-svc.yaml # create a Service for myapp
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: default
spec:
  selector:
    app: myapp
    release: canary
  type: NodePort
  ports: 
  - port: 80
    targetPort: 80
    nodePort: 30080
[root@k8s-master mainfests]# kubectl apply -f myapp-svc.yaml 
service/myapp created
[root@k8s-master mainfests]# kubectl get svc
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP   10.96.0.1        <none>        443/TCP        36d
myapp        NodePort    10.101.245.119   <none>        80:30080/TCP   5s
redis        ClusterIP   10.107.238.182   <none>        6379/TCP       28m

[root@k8s-master mainfests]# while true;do curl http://192.168.56.11:30080/hostname.html;sleep 1;done
myapp-deploy-69b47bc96d-95bc4
myapp-deploy-69b47bc96d-4hxxw
myapp-deploy-69b47bc96d-pjv74
myapp-deploy-69b47bc96d-rf7bs
myapp-deploy-69b47bc96d-95bc4
myapp-deploy-69b47bc96d-rf7bs
myapp-deploy-69b47bc96d-95bc4

[root@k8s-master mainfests]# while true;do curl http://192.168.56.11:30080/;sleep 1;done
  Hello MyApp | Version: v1 | <a href="hostname.html">Pod Name</a>
  Hello MyApp | Version: v1 | <a href="hostname.html">Pod Name</a>
  Hello MyApp | Version: v1 | <a href="hostname.html">Pod Name</a>
  Hello MyApp | Version: v1 | <a href="hostname.html">Pod Name</a>
  Hello MyApp | Version: v1 | <a href="hostname.html">Pod Name</a>
  Hello MyApp | Version: v1 | <a href="hostname.html">Pod Name</a>

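The example above shows that NodePort allows access from outside the cluster through a node port, at http://192.168.56.11:30080/. In practice, choosing the node port yourself is discouraged because it can easily conflict with other existing Services; it is better to let the system assign one.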
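A rough Python sketch of why leaving the choice to the system avoids conflicts: an allocator only hands out ports from the configured range that no other Service already holds. This is an illustration, not the API server's actual allocator; allocate_node_port is a hypothetical name, and the range assumes the default --service-node-port-range.

```python
import random

# Default --service-node-port-range of kube-apiserver: 30000-32767 inclusive.
NODE_PORT_RANGE = range(30000, 32768)


def allocate_node_port(in_use, rng=random):
    """Pick a free port from the default NodePort range, skipping ports already
    used by other Services."""
    free = [port for port in NODE_PORT_RANGE if port not in in_use]
    if not free:
        raise RuntimeError("NodePort range exhausted")
    return rng.choice(free)
```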

Session persistence to pods

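Services also support session affinity (sticky sessions), which always forwards requests from the same client to the same backend pod. This interferes with how the scheduling algorithm distributes traffic and therefore weakens load balancing. Enable session affinity only when the application keeps private information tied to the client's identity and uses it, for example, to track the user's activity.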

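Session affinity only holds for a limited time, 10800 seconds by default; after that, the client's next request is scheduled afresh. The mechanism can identify clients only by their IP address, so all clients behind the same source-NAT server are treated as one client, which makes the resulting distribution far from ideal. A Service configures sticky sessions with the .spec.sessionAffinity and .spec.sessionAffinityConfig fields. The .spec.sessionAffinity field defines the type of sticky session and accepts only two values, "None" and "ClientIP", as shown below: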

[root@k8s-master mainfests]# kubectl explain svc.spec.sessionAffinity
KIND:     Service
VERSION:  v1

FIELD:    sessionAffinity <string>

DESCRIPTION:
     Supports "ClientIP" and "None". Used to maintain session affinity. Enable
     client IP based session affinity. Must be ClientIP or None. Defaults to
     None. More info:
     https://kubernetes.io/docs/concepts/services-networking/service/#virtual-ips-and-service-proxies

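sessionAffinity supports two modes, ClientIP and None. The default is None (requests are distributed randomly); with ClientIP, requests from the same client are scheduled to the same pod.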
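The ClientIP behavior can be pictured with a toy Python model: a client IP keeps going to the backend it was first sent to until it has been idle longer than the timeout, and is then placed afresh. This is a sketch, not kube-proxy code; the class name is hypothetical, and the injectable clock exists only to make the model testable. On a real Service the timeout is set with spec.sessionAffinityConfig.clientIP.timeoutSeconds.

```python
import random
import time


class ClientIPAffinity:
    """Toy model of sessionAffinity: ClientIP with an idle timeout."""

    def __init__(self, backends, timeout_seconds=10800, rng=None, clock=time.monotonic):
        self.backends = list(backends)
        self.timeout = timeout_seconds
        self.rng = rng if rng is not None else random.Random()
        self.clock = clock
        self.table = {}  # client IP -> (backend, time of last request)

    def pick(self, client_ip):
        now = self.clock()
        entry = self.table.get(client_ip)
        if entry is not None and now - entry[1] > self.timeout:
            entry = None  # affinity expired: schedule the client afresh
        # A new or expired client is placed at random, as with sessionAffinity: None.
        backend = entry[0] if entry is not None else self.rng.choice(self.backends)
        self.table[client_ip] = (backend, now)
        return backend
```

Because the key is only the source IP, every client behind one NAT address lands in the same table entry, which is exactly the weakness described above.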

[root@k8s-master mainfests]# vim myapp-svc.yaml 
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: default
spec:
  selector:
    app: myapp
    release: canary
  sessionAffinity: ClientIP
  type: NodePort
  ports: 
  - port: 80
    targetPort: 80
    nodePort: 30080
[root@k8s-master mainfests]# kubectl apply -f myapp-svc.yaml 
service/myapp configured
[root@k8s-master mainfests]# kubectl describe svc myapp
Name:                     myapp
Namespace:                default
Labels:                   <none>
Annotations:              kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"myapp","namespace":"default"},"spec":{"ports":[{"nodePort":30080,"port":80,"ta...
Selector:                 app=myapp,release=canary
Type:                     NodePort
IP:                       10.101.245.119
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  30080/TCP
Endpoints:                10.244.1.18:80,10.244.1.19:80,10.244.2.15:80 + 2 more...
Session Affinity:         ClientIP
External Traffic Policy:  Cluster
Events:                   <none>
[root@k8s-master mainfests]# while true;do curl http://192.168.56.11:30080/hostname.html;sleep 1;done
myapp-deploy-69b47bc96d-hwbzt
myapp-deploy-69b47bc96d-hwbzt
myapp-deploy-69b47bc96d-hwbzt
myapp-deploy-69b47bc96d-hwbzt
myapp-deploy-69b47bc96d-hwbzt
myapp-deploy-69b47bc96d-hwbzt
myapp-deploy-69b47bc96d-hwbzt
myapp-deploy-69b47bc96d-hwbzt

You can also change the setting by patching the Service, as follows:

kubectl patch svc myapp -p '{"spec":{"sessionAffinity":"ClientIP"}}'  # enable session affinity: requests from the same IP go to the same pod

kubectl patch svc myapp -p '{"spec":{"sessionAffinity":"None"}}'    # disable session affinity

Headless Service

Sometimes you don't need or want load balancing and a separate Service IP. In that case, create a headless Service by setting the Cluster IP (spec.clusterIP) to "None".

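This option lets developers use their own discovery mechanism and reduces coupling with the Kubernetes system. Applications can still use a self-registration pattern, and adapters for other systems that need discovery can easily be built on top of this API.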

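Kubernetes allocates no Cluster IP for such Services, kube-proxy does not handle them, and the platform does no load balancing or routing for them. How DNS is configured automatically depends on whether the Service defines a selector.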
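From inside a pod, the difference between a headless and a normal Service shows up directly in name resolution. The Python sketch below (resolve_ipv4 is a hypothetical helper; the commented names assume the Services created in this section and the default cluster.local domain) returns every IPv4 address a name resolves to.

```python
import ipaddress
import socket


def resolve_ipv4(name):
    """Return the sorted, de-duplicated IPv4 addresses that `name` resolves to."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos}, key=ipaddress.IPv4Address)


# Inside a pod, a headless Service returns one address per ready backend pod,
# while a normal Service returns only its ClusterIP:
#   resolve_ipv4("myapp-headless.default.svc.cluster.local")
#   resolve_ipv4("myapp.default.svc.cluster.local")
```

This mirrors the dig output below, where the headless name yields five A records and the ClusterIP name yields one.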

(1) Write the headless Service manifest
[root@k8s-master mainfests]# cp myapp-svc.yaml myapp-svc-headless.yaml 
[root@k8s-master mainfests]# vim myapp-svc-headless.yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-headless
  namespace: default
spec:
  selector:
    app: myapp
    release: canary
  clusterIP: "None"  # a headless Service has clusterIP None
  ports: 
  - port: 80
    targetPort: 80

(2) Create the headless Service
[root@k8s-master mainfests]# kubectl apply -f myapp-svc-headless.yaml 
service/myapp-headless created
[root@k8s-master mainfests]# kubectl get svc
NAME             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
kubernetes       ClusterIP   10.96.0.1        <none>        443/TCP        36d
myapp            NodePort    10.101.245.119   <none>        80:30080/TCP   1h
myapp-headless   ClusterIP   None             <none>        80/TCP         5s
redis            ClusterIP   10.107.238.182   <none>        6379/TCP       2h

(3) Verify name resolution with CoreDNS
[root@k8s-master mainfests]# dig -t A myapp-headless.default.svc.cluster.local. @10.96.0.10

; <<>> DiG 9.9.4-RedHat-9.9.4-61.el7 <<>> -t A myapp-headless.default.svc.cluster.local. @10.96.0.10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62028
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;myapp-headless.default.svc.cluster.local. IN A

;; ANSWER SECTION:
myapp-headless.default.svc.cluster.local. 5 IN A 10.244.1.18
myapp-headless.default.svc.cluster.local. 5 IN A 10.244.1.19
myapp-headless.default.svc.cluster.local. 5 IN A 10.244.2.15
myapp-headless.default.svc.cluster.local. 5 IN A 10.244.2.16
myapp-headless.default.svc.cluster.local. 5 IN A 10.244.2.17

;; Query time: 4 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Thu Sep 27 04:27:15 EDT 2018
;; MSG SIZE  rcvd: 349

[root@k8s-master mainfests]# kubectl get svc -n kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP   36d

[root@k8s-master mainfests]# kubectl get pods -o wide -l app=myapp
NAME                            READY     STATUS    RESTARTS   AGE       IP            NODE
myapp-deploy-69b47bc96d-4hxxw   1/1       Running   0          1h        10.244.1.18   k8s-node01
myapp-deploy-69b47bc96d-95bc4   1/1       Running   0          1h        10.244.2.16   k8s-node02
myapp-deploy-69b47bc96d-hwbzt   1/1       Running   0          1h        10.244.1.19   k8s-node01
myapp-deploy-69b47bc96d-pjv74   1/1       Running   0          1h        10.244.2.15   k8s-node02
myapp-deploy-69b47bc96d-rf7bs   1/1       Running   0          1h        10.244.2.17   k8s-node02

(4) Compare with the resolution of a Service that has a ClusterIP
[root@k8s-master mainfests]# dig -t A myapp.default.svc.cluster.local. @10.96.0.10

; <<>> DiG 9.9.4-RedHat-9.9.4-61.el7 <<>> -t A myapp.default.svc.cluster.local. @10.96.0.10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50445
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;myapp.default.svc.cluster.local. IN    A

;; ANSWER SECTION:
myapp.default.svc.cluster.local. 5 IN    A    10.101.245.119

;; Query time: 1 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Thu Sep 27 04:31:16 EDT 2018
;; MSG SIZE  rcvd: 107

[root@k8s-master mainfests]# kubectl get svc
NAME             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
kubernetes       ClusterIP   10.96.0.1        <none>        443/TCP        36d
myapp            NodePort    10.101.245.119   <none>        80:30080/TCP   1h
myapp-headless   ClusterIP   None             <none>        80/TCP         11m
redis            ClusterIP   10.107.238.182   <none>        6379/TCP       2h

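The two demonstrations above show the difference. The DNS name of a headless Service resolves directly to the Pod IPs, while the name of a regular Service resolves to its ClusterIP. What is a headless Service good for, then? StatefulSets use it, as covered later. For now it is enough to know what a headless Service is and how to create one.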

Ingress

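Overview

There are several ways to expose services inside the cluster to the outside network so that clients can reach them. This article focuses on Ingress.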

LoadBalancer

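A LoadBalancer is usually provided by a cloud provider or set up by the user, and it runs outside the cluster. You configure the LoadBalancer parameters when creating the Service. When a client accesses the Service from outside, it connects directly to the load balancer, which forwards the traffic to the Service inside the cluster. How a LoadBalancer is configured and used depends on the cloud provider, so it is not described in detail here.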
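For reference, here is a minimal sketch of a LoadBalancer Service. The name web-lb and the selector app: web are hypothetical. Provider-specific annotations are omitted because they differ between cloud providers.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-lb            # hypothetical name
spec:
  type: LoadBalancer      # asks the cloud provider to provision an external load balancer
  selector:
    app: web              # hypothetical label on the backend Pods
  ports:
  - port: 80              # port exposed by the load balancer
    targetPort: 8080      # container port on the backend Pods
```

On a cluster without a cloud load-balancer integration, the EXTERNAL-IP column of such a Service typically stays at `<pending>`.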

NodePort

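This approach requires that some nodes in the cluster be reachable from the outside network. For every NodePort Service, Kubernetes allocates at least one host port, which is opened on every node in the cluster. Clients access the Service via the IP of an externally reachable node plus that node port. In most cases services are not exposed outside the cluster this way, for four reasons.

First: in most cases, for security, the cluster's nodes sit in a fully internal network and should not be directly reachable from outside. External access to nodes normally goes through edge servers such as gateways and jump hosts, and those edge servers have to be hardened in various ways.
Second: if the nodes can be reached directly from outside, node addresses, service names, port numbers and similar information are exposed, which is very insecure.
Third: service ports are usually allocated automatically rather than fixed, and service names may change. External clients then have to track these changes and adapt, which is tight coupling.
Fourth: each service exposes at least one port to the outside network, which becomes hard to manage when there are many services.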

Ingress

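Ingress is not the name of a particular product or component. It is Kubernetes' approach, or technique, for exposing services outside the cluster. Users are free to build their own Ingress implementation following this approach. However, the Kubernetes project provides an implementation and third-party implementations also exist, so there is usually no need to write one.

The idea is as follows. Run something inside the cluster (a Service, a Pod, or just a container) that has at least one externally reachable IP and opens at least one port to the outside, and let it act as a reverse proxy. When an external client wants to reach a Service inside the cluster, it accesses this reverse proxy with the relevant parameters. The proxy then forwards the request to the Service based on those parameters and its internal rules.

This differs from LoadBalancer in that the proxy lives inside the cluster, while a LoadBalancer sits outside it. It differs from NodePort in that the cluster exposes only one service or Pod, while NodePort exposes every Service.

The Kubernetes-maintained implementation (ingress-nginx) uses nginx as the reverse proxy. This proxy is called the Ingress Controller and runs as Pods. Kubernetes also provides the Ingress resource type: you configure the forwarding rules of the nginx reverse proxy by creating Ingress objects. When nginx receives a request from outside, it uses the request URL and request header fields (such as Host) to tell the target Services apart, then forwards the request.

Installation

GitHub: https://github.com/kubernetes/ingress-nginx/tree/nginx-0.20.0/deploy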

mandatory.yml

apiVersion: v1
kind: Namespace
metadata:
  name: ingress-nginx

---

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: default-http-backend
  labels:
    app.kubernetes.io/name: default-http-backend
    app.kubernetes.io/part-of: ingress-nginx
  namespace: ingress-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: default-http-backend
      app.kubernetes.io/part-of: ingress-nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/name: default-http-backend
        app.kubernetes.io/part-of: ingress-nginx
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: default-http-backend
          # Any image is permissible as long as:
          # 1. It serves a 404 page at /
          # 2. It serves 200 on a /healthz endpoint
          image: k8s.gcr.io/defaultbackend-amd64:1.5
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 30
            timeoutSeconds: 5
          ports:
            - containerPort: 8080
          resources:
            limits:
              cpu: 10m
              memory: 20Mi
            requests:
              cpu: 10m
              memory: 20Mi

---
apiVersion: v1
kind: Service
metadata:
  name: default-http-backend
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: default-http-backend
    app.kubernetes.io/part-of: ingress-nginx
spec:
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app.kubernetes.io/name: default-http-backend
    app.kubernetes.io/part-of: ingress-nginx

---

kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx

---

kind: ConfigMap
apiVersion: v1
metadata:
  name: tcp-services
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx

---

kind: ConfigMap
apiVersion: v1
metadata:
  name: udp-services
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx

---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: nginx-ingress-serviceaccount
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: nginx-ingress-clusterrole
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
      - endpoints
      - nodes
      - pods
      - secrets
    verbs:
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - services
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "extensions"
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - create
      - patch
  - apiGroups:
      - "extensions"
    resources:
      - ingresses/status
    verbs:
      - update

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: nginx-ingress-role
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
      - pods
      - secrets
      - namespaces
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - configmaps
    resourceNames:
      # Defaults to "<election-id>-<ingress-class>"
      # Here: "<ingress-controller-leader>-<nginx>"
      # This has to be adapted if you change either parameter
      # when launching the nginx-ingress-controller.
      - "ingress-controller-leader-nginx"
    verbs:
      - get
      - update
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - create
  - apiGroups:
      - ""
    resources:
      - endpoints
    verbs:
      - get

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: nginx-ingress-role-nisa-binding
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: nginx-ingress-role
subjects:
  - kind: ServiceAccount
    name: nginx-ingress-serviceaccount
    namespace: ingress-nginx

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: nginx-ingress-clusterrole-nisa-binding
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nginx-ingress-clusterrole
subjects:
  - kind: ServiceAccount
    name: nginx-ingress-serviceaccount
    namespace: ingress-nginx

---

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-ingress-controller
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/part-of: ingress-nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/part-of: ingress-nginx
      annotations:
        prometheus.io/port: "10254"
        prometheus.io/scrape: "true"
    spec:
      serviceAccountName: nginx-ingress-serviceaccount
      containers:
        - name: nginx-ingress-controller
          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.20.0
          args:
            - /nginx-ingress-controller
            - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
            - --configmap=$(POD_NAMESPACE)/nginx-configuration
            - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
            - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
            - --publish-service=$(POD_NAMESPACE)/ingress-nginx
            - --annotations-prefix=nginx.ingress.kubernetes.io
          securityContext:
            capabilities:
              drop:
                - ALL
              add:
                - NET_BIND_SERVICE
            # www-data -> 33
            runAsUser: 33
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          ports:
            - name: http
              containerPort: 80
            - name: https
              containerPort: 443
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
---

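Changing how the controller is exposed

A problem appears at this point. The Deployment and Services created from this YAML do not map nginx's ports onto the hosts, so requests to a host's IP get no response. This can be solved with hostNetwork/hostPort plus a DaemonSet.
Modify the YAML file as follows:
1. Change the kind of the nginx controller from Deployment to DaemonSet.
2. Comment out replicas: 1 (a DaemonSet has no replicas field).
3. Add hostNetwork: true to the Pod template's spec: section.
4. Add hostPort to each entry in the ports section.
(In the version below, the controller image was also switched to the mirror siriuszg/nginx-ingress-controller:0.20.0.)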

---

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: nginx-ingress-controller
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
spec:
 # replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/part-of: ingress-nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/part-of: ingress-nginx
      annotations:
        prometheus.io/port: "10254"
        prometheus.io/scrape: "true"
    spec:
      serviceAccountName: nginx-ingress-serviceaccount
      hostNetwork: true
      containers:
        - name: nginx-ingress-controller
          image: siriuszg/nginx-ingress-controller:0.20.0
          args:
            - /nginx-ingress-controller
            - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
            - --configmap=$(POD_NAMESPACE)/nginx-configuration
            - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
            - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
            - --publish-service=$(POD_NAMESPACE)/ingress-nginx
            - --annotations-prefix=nginx.ingress.kubernetes.io
          securityContext:
            capabilities:
              drop:
                - ALL
              add:
                - NET_BIND_SERVICE
            # www-data -> 33
            runAsUser: 33
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          ports:
            - name: http
              containerPort: 80
              hostPort: 80
            - name: https
              containerPort: 443
              hostPort: 443
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
---
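An alternative to hostNetwork is a NodePort Service in front of the controller Pods. The ingress-nginx deploy directory provides one for bare-metal clusters. The sketch below follows that approach and assumes the labels used in mandatory.yml above. Its name, ingress-nginx, also matches the controller's `--publish-service=$(POD_NAMESPACE)/ingress-nginx` argument.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx          # matches --publish-service=$(POD_NAMESPACE)/ingress-nginx
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
spec:
  type: NodePort               # node ports are allocated automatically from the NodePort range
  ports:
  - name: http
    port: 80
    targetPort: 80
    protocol: TCP
  - name: https
    port: 443
    targetPort: 443
    protocol: TCP
  selector:                    # selects the nginx-ingress-controller Pods only
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
```

With this approach, clients connect to `<node IP>:<allocated nodePort>` rather than to port 80/443 on the host. That is the trade-off against the hostNetwork DaemonSet.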

Create a Tomcat service and proxy it at layer 7 through the Ingress

Create

cat tomcat-ingress.yaml 

apiVersion: v1
kind: Service
metadata:
  name: tomcat
  namespace: default
spec:
  type: ClusterIP
  selector:
    app: tomcat
    release: canary
  ports:
  - name: http
    port: 8080
    targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata: 
  name: tomcat-deploy
spec:
  replicas: 1
  selector: 
    matchLabels:
      app: tomcat
      release: canary
  template:
    metadata:
      labels:
        app: tomcat
        release: canary
    spec:
      containers:
      - name: tomcat
        image: tomcat:7-alpine
        ports:
        - name: httpd
          containerPort: 8080

Check

kubectl get pod | grep tomcat
tomcat-deploy-64b488b68-wk45q   1/1     Running   0          29m
kubectl get svc | grep tomcat
tomcat        ClusterIP   10.0.0.183   <none>        8080/TCP       29m

Create the Ingress

cat ingress-tomcat.yaml 
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-tomcat
  namespace: default
  annotations: 
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: www.aa.com  # host name used to route requests
    http:
      paths:
      - path: 
        backend:
          serviceName: tomcat   # name of the Service in the cluster
          servicePort: 8080     # port exposed by that Service
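The extensions/v1beta1 Ingress API used here was removed in Kubernetes 1.22. On newer clusters the same rule would be written against networking.k8s.io/v1, roughly as in the sketch below. It assumes an IngressClass named nginx exists, which newer ingress-nginx installations usually create. In v1, pathType is mandatory and the backend fields are nested.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-tomcat
  namespace: default
spec:
  ingressClassName: nginx      # replaces the kubernetes.io/ingress.class annotation
  rules:
  - host: www.aa.com
    http:
      paths:
      - path: /
        pathType: Prefix       # required in v1: Exact, Prefix or ImplementationSpecific
        backend:
          service:
            name: tomcat       # Service name
            port:
              number: 8080     # Service port
```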

Test access

curl -H "host:www.aa.com" http://10.167.130.206:80  # the IP is a host running ingress-nginx-controller; only hosts running that container listen on port 80

<!DOCTYPE html>


<html lang="en">
    <head>
        <title>Apache Tomcat/7.0.91</title>

The Ingress list can be viewed with:

kubectl get ingress
NAME             HOSTS        ADDRESS   PORTS   AGE
ingress-tomcat   www.aa.com             80      34m

kubectl describe ingress ingress-tomcat
Name:             ingress-tomcat
Namespace:        default
Address:          
Default backend:  default-http-backend:80 (<none>)
Rules:
 Host        Path  Backends
 ----        ----  --------
 www.aa.com  
                tomcat:8080 (<none>)

Proxying layer-4 requests with Ingress

Create MySQL

cat mysql.yaml

apiVersion: v1
kind: Service
metadata:
 name: mysql
 namespace: default
spec:
 type: ClusterIP
 selector:
   app: mysql
   release: canary
 ports:
 - name: mysql
   port: 3306
   targetPort: 3306
---
apiVersion: apps/v1
kind: DaemonSet  # one Pod per node; I have exactly two nodes, which is handy for testing load balancing
metadata: 
 name: mysql-daemonset
spec:
# replicas: 1
 selector: 
   matchLabels:
     app: mysql
     release: canary
 template:
   metadata:
     labels:
       app: mysql
       release: canary
   spec:
     containers:
     - name: mysql
       image: mysql
       env:
         - name: MYSQL_ROOT_PASSWORD  # required by the mysql image; without it the container will not start
           value: "mysql"
       ports:
       - name: mysql
         containerPort: 3306
         
         
kubectl apply -f mysql.yaml  # deploy the mysql Pods
kubectl get pod
mysql-daemonset-2xdr7           1/1     Running   0          63m
mysql-daemonset-stvhf           1/1     Running   0          63m

Modify the ConfigMap

cat configmap.yaml 

kind: ConfigMap
apiVersion: v1
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
 3306: "default/mysql:3306"  # <namespace>/<service>:<port>; our mysql Service is in the default namespace, adjust as needed

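To add or remove a forwarding rule, add or remove the entry in this ConfigMap file and re-apply it with kubectl apply. Try not to run kubectl delete -f configmap.yaml directly. That deletes the whole tcp-services ConfigMap, and once it is gone the controller on the nodes no longer sees any data and stops updating its rules.

This differs from layer-7 proxying. At layer 7 you can create a separately named Ingress for each service. At layer 4, the tcp-services and udp-services ConfigMaps are fixed when the ingress controller is created: see the --tcp-services-configmap and --udp-services-configmap arguments, which are lines 281-282 of the author's ingress.yaml.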
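UDP forwarding works the same way through the udp-services ConfigMap. The sketch below is modeled on the ingress-nginx documentation and forwards external UDP port 53 to the cluster DNS Service. The key is the port the controller listens on, and the value is `<namespace>/<service name>:<service port>`.

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: udp-services           # must match --udp-services-configmap
  namespace: ingress-nginx
data:
  "53": "kube-system/kube-dns:53"   # external UDP port 53 -> Service kube-dns, port 53
```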

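One more point. Once the Ingress controller is created, the nodes listen on ports 80 and 443. While troubleshooting this MySQL setup, I found that port 3306 is not listened on by default. A node only starts listening on 3306 once the Ingress controller can successfully reach the mysql Service, so use this as a troubleshooting clue. I hit many problems during configuration, and the troubleshooting order that worked was: container > Pod IP > cluster IP > Ingress.

Disconnect and reconnect a few times; the balancing appears to be round-robin. To observe the load-balancing effect, a database named a was created in one Pod's MySQL and a database named b in the other: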

mysql> show databases; 
+--------------------+
| Database           |
+--------------------+
| a                  |
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+

mysql> show databases; 
+--------------------+
| Database           |
+--------------------+
| b                  |
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+

Other examples

Single Service Ingress

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test-ingress
spec:
  backend:
    serviceName: testsvc
    servicePort: 80

After creating the object:

$ kubectl get ing
NAME                RULE          BACKEND        ADDRESS
test-ingress        -             testsvc:80     107.178.254.228

This configuration has no specific rules, so every request such as http(s)://107.178.254.228/xxx is forwarded to port 80 of testsvc.
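In the networking.k8s.io/v1 API, which is the only Ingress version served since Kubernetes 1.22, the rule-less backend is spelled `spec.defaultBackend`. A sketch of the same test-ingress:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test-ingress
spec:
  defaultBackend:              # was spec.backend in extensions/v1beta1
    service:
      name: testsvc
      port:
        number: 80
```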

URL-based forwarding

foo.bar.com -> 178.91.123.132 -> / foo    s1:80
                                  / bar    s2:80

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - path: /foo
        backend:
          serviceName: s1
          servicePort: 80
      - path: /bar
        backend:
          serviceName: s2
          servicePort: 80

After creating the object:

$ kubectl get ing
NAME      RULE          BACKEND   ADDRESS
test      -
          foo.bar.com
          /foo          s1:80
          /bar          s2:80

Name-based virtual hosting

The goal is the following:

foo.bar.com --|                 |-> foo.bar.com s1:80
              | 178.91.123.132  |
bar.foo.com --|                 |-> bar.foo.com s2:80

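The core idea here is to tell services apart by the Host field of the HTTP request rather than by URL. Requests with host: foo.bar.com are forwarded to port 80 of service s1, and requests with host: bar.foo.com are forwarded to port 80 of service s2.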

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - backend:
          serviceName: s1
          servicePort: 80
  - host: bar.foo.com
    http:
      paths:
      - backend:
          serviceName: s2
          servicePort: 80

TLS

Use a Secret object to provide the Ingress Controller with the private key and certificate, so that the connection is encrypted.

Secret configuration:

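apiVersion: v1
data:
  tls.crt: base64 encoded cert   # placeholder: base64-encoded PEM certificate
  tls.key: base64 encoded key    # placeholder: base64-encoded PEM private key
kind: Secret
metadata:
  name: testsecret
  namespace: default
type: kubernetes.io/tls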
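Instead of base64-encoding the files by hand, you can put the PEM text into `stringData`. The API server encodes it into `data` when the Secret is written. In the sketch below, the certificate and key bodies are placeholders to be replaced with real PEM content.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: testsecret
  namespace: default
type: kubernetes.io/tls        # requires the keys tls.crt and tls.key
stringData:                    # plain text; stored base64-encoded under data
  tls.crt: |
    -----BEGIN CERTIFICATE-----
    ...placeholder...
    -----END CERTIFICATE-----
  tls.key: |
    -----BEGIN PRIVATE KEY-----
    ...placeholder...
    -----END PRIVATE KEY-----
```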

Reference it in the Ingress object:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: no-rules-map
spec:
  tls:
  - hosts:
    - foo.bar.com
    secretName: testsecret
  rules:
  - host: foo.bar.com
    http:
      paths:
      - path: /
        backend:
          serviceName: s1
          servicePort: 80

StatefulSet

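Overview

Use a StatefulSet when the application needs any of the following:

Stable, unique network identifiers.
Stable, persistent storage.
Ordered deployment and scaling.
Ordered deletion and termination.
Ordered, automated rolling updates.

Some applications need none of these: no stable identifiers and no ordered deployment, deletion or scaling. Such applications should be deployed with a controller that manages a set of stateless replicas; Deployment or ReplicaSet is better suited to stateless services.

RC, Deployment and DaemonSet are all aimed at stateless services: the IPs, names and start/stop order of the Pods they manage are random. A StatefulSet, as the name suggests, is a set with state, used to manage stateful services such as MySQL or MongoDB clusters.
A StatefulSet can be thought of as a variant of a Deployment, and it became GA in v1.9. It exists to solve the problems of stateful services. The Pods it manages have fixed names and a fixed start/stop order, and each Pod's name serves as its network identity (hostname). Each Pod also needs persistent storage of its own.
A Deployment is paired with a regular Service, while a StatefulSet is paired with a headless Service. A headless Service differs from a regular Service in that it has no Cluster IP. Resolving its name returns the list of endpoints of all Pods behind it.
On top of the headless Service, the StatefulSet also creates a DNS name for each Pod replica it controls, in the following format: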

$(podname).$(headless service name)
FQDN: $(podname).$(headless service name).$(namespace).svc.cluster.local

Example

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"  # declares which headless Service this StatefulSet belongs to
  replicas: 3 # by default is 1
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:   # can be thought of as a template for PVCs
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "gluster-heketi"  # storage class name; change it to one that exists in your cluster
      resources:
        requests:
          storage: 1Gi

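This configuration shows the three components of a StatefulSet:

Headless Service: named nginx, used to define the Pods' network identity (DNS domain).
StatefulSet: defines the actual application. It is named web, has three Pod replicas, and gives each Pod its own DNS name.
volumeClaimTemplates: a storage claim template that specifies the PVC name and size. A PVC is created automatically for each Pod, and here the PVCs are provisioned by a storage class.

Why is a headless Service needed?
With a Deployment, Pod names are random strings and have no order. A StatefulSet requires order: Pods cannot be replaced arbitrarily, and a rebuilt Pod keeps the same name. Pod IPs change, so Pods are identified by name instead. The Pod name is the Pod's unique identifier and must remain stable. The StatefulSet assigns these stable names, and the headless Service gives each of them a resolvable DNS record.
Why is volumeClaimTemplate needed?
Stateful replica sets almost always use persistent storage. In a distributed system the data on each node differs, so the nodes cannot share one volume; each needs its own dedicated storage. A volume defined in a Deployment's Pod template is shared by all replicas, because every replica is created from the same template and therefore sees the same data. Each Pod in a StatefulSet needs its own volume, so its storage cannot be created from the Pod template. Instead, the StatefulSet uses a volumeClaimTemplate (volume claim template). It generates a separate PVC for each Pod and binds it to a PV, so every Pod gets dedicated storage. That is why volumeClaimTemplate is needed.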

$ kubectl create -f nginx.yaml 
service "nginx" created
statefulset "web" created

# web-0 is created first
$ kubectl get pod
web-0                     1/1       ContainerCreating   0          51s

# once web-0 is Running and Ready, web-1 is created
$ kubectl get pod
web-0                     1/1       Running             0          51s
web-1                     0/1       ContainerCreating   0          42s

# once web-1 is Running and Ready, web-2 is created
$ kubectl get pod
web-0                     1/1       Running             0          1m
web-1                     1/1       Running             0          45s
web-2                     1/1       ContainerCreating   0          36s

# finally, all three Pods are Running and Ready
$ kubectl get pod
NAME                      READY     STATUS    RESTARTS   AGE
web-0                     1/1       Running   0          4m
web-1                     1/1       Running   0          3m
web-2                     1/1       Running   0          1m

$ kubectl get pvc
NAME              STATUS    VOLUME                                  CAPACITY   ACCESS MODES   STORAGECLASS     AGE
www-web-0         Bound     pvc-ecf003f3-828d-11e8-8815-000c29774d39   2G        RWO          gluster-heketi   7m
www-web-1         Bound     pvc-0615e33e-828e-11e8-8815-000c29774d39   2G        RWO          gluster-heketi   6m
www-web-2         Bound     pvc-43a97acf-828e-11e8-8815-000c29774d39   2G        RWO          gluster-heketi   4m

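If the cluster has no StorageClass for dynamic PVC provisioning, you can create several PVs and PVCs manually in advance. The manually created PVCs must be named according to the naming rule of the StatefulSet created later: (volumeClaimTemplates.name)-(pod_name).

Take a StatefulSet named web with three Pod replicas (web-0, web-1, web-2) and a volumeClaimTemplates entry named www. The automatically created PVCs are named www-web-[0-2], one per Pod.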
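Below is a minimal sketch of a pre-created PV/PVC pair for web-0. It assumes there is no StorageClass and uses a hostPath volume purely for testing; hostPath ties the data to one node, so it is not suitable for production. The PV name pv-web-0, the class name manual and the path /data/web-0 are hypothetical. Repeat the pair for www-web-1 and www-web-2.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-web-0               # hypothetical PV name
spec:
  capacity:
    storage: 1Gi
  accessModes: [ "ReadWriteOnce" ]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual     # hypothetical class name, used only to match PV and PVC
  hostPath:
    path: /data/web-0          # hypothetical path; test use only
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: www-web-0              # (volumeClaimTemplates.name)-(pod name)
  namespace: default           # same namespace as the StatefulSet
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: manual
  volumeName: pv-web-0         # bind explicitly to the PV above
  resources:
    requests:
      storage: 1Gi
```

When web-0 is created, the StatefulSet finds the existing PVC www-web-0 by name and uses it instead of creating a new one.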

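Summary of the patterns

Pod names (network identities) follow the pattern $(statefulset name)-$(ordinal), for example web-0, web-1 and web-2 above.
The StatefulSet creates a DNS name for each Pod replica in the format $(podname).$(headless service name). Services therefore communicate via Pod DNS names rather than Pod IPs. When a Pod's node fails, the Pod moves to another node and its IP changes, but its DNS name stays the same.
The StatefulSet uses the headless Service to control the Pods' domain, whose FQDN is $(service name).$(namespace).svc.cluster.local, where "cluster.local" is the cluster domain.
A PVC is created for each Pod from volumeClaimTemplates, following the pattern (volumeClaimTemplates.name)-(pod_name). In the example above, volumeClaimTemplates.metadata.name=www and the Pod names are web-[0-2], so the PVCs created are www-web-0, www-web-1 and www-web-2.
Deleting a Pod does not delete its PVC. Deleting a PVC manually releases its PV; what happens to the PV after that depends on its reclaim policy.
The following examples show how the cluster domain, the headless Service name and the StatefulSet name determine the DNS names of the StatefulSet's Pods: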

Cluster Domain	Service (ns/name)	StatefulSet (ns/name)	StatefulSet Domain	Pod DNS	Pod Hostname
cluster.local	default/nginx	default/web	nginx.default.svc.cluster.local	web-{0..N-1}.nginx.default.svc.cluster.local	web-{0..N-1}
cluster.local	foo/nginx	foo/web	nginx.foo.svc.cluster.local	web-{0..N-1}.nginx.foo.svc.cluster.local	web-{0..N-1}
kube.local	foo/nginx	foo/web	nginx.foo.svc.kube.local	web-{0..N-1}.nginx.foo.svc.kube.local	web-{0..N-1}

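StatefulSet start/stop order:

Ordered deployment: when a StatefulSet with multiple Pod replicas is deployed, the Pods are created sequentially, from 0 to N-1. Before the next Pod starts, all previous Pods must be Running and Ready.
Ordered deletion: when Pods are deleted, they are terminated in order from N-1 to 0.
Ordered scaling: when Pods are scaled up, all preceding Pods must be Running and Ready, just as during deployment.

StatefulSet Pod management policy:
Since v1.7, the .spec.podManagementPolicy field lets you relax the ordering guarantees while keeping the uniqueness and identity guarantees.

OrderedReady: the start/stop order described above; this is the default.
Parallel: tells the StatefulSet controller to launch or terminate all Pods in parallel. It does not wait for a Pod to become Running and Ready, or to terminate completely, before launching or terminating another.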
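Below is a sketch of where the field goes, based on the web StatefulSet from the example above. Only the relevant part of the spec is shown, and the rest stays unchanged. As far as I know, the field cannot be changed on an existing StatefulSet, so it has to be set at creation time. It also affects scaling only, not rolling updates.

```yaml
# Fragment of the web StatefulSet's spec (other fields unchanged)
spec:
  serviceName: "nginx"
  replicas: 3
  podManagementPolicy: Parallel   # start/stop web-0..web-2 together (default: OrderedReady)
```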

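StatefulSet use cases:

Stable persistent storage: after a Pod is rescheduled, it can still access the same persistent data. This is implemented with PVCs.
Stable network identifiers: after a Pod is rescheduled, its PodName and HostName stay the same.
Ordered deployment and ordered scaling, enforced by the StatefulSet controller: Pod N starts only after Pods 0 to N-1 are Running and Ready.
Ordered scale-down.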

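Update strategies

In Kubernetes 1.7 and later, the .spec.updateStrategy field lets you configure or disable automated rolling updates of a StatefulSet's Pods: containers, labels, resource requests/limits and annotations.
OnDelete: when .spec.updateStrategy.type is set to OnDelete, the StatefulSet controller does not update the Pods in the StatefulSet automatically. Users must delete Pods manually to make the controller create new ones.
RollingUpdate: when .spec.updateStrategy.type is set to RollingUpdate, Pods are updated automatically in a rolling fashion. This is the default strategy when .spec.updateStrategy is not specified.
The StatefulSet controller deletes and recreates each Pod in the StatefulSet, one Pod at a time, in the same order as Pod termination (from the largest ordinal to the smallest). It waits for the updated Pod to be Running and Ready before updating the next one.
Partitions: a RollingUpdate can be partitioned by specifying .spec.updateStrategy.rollingUpdate.partition. If a partition is specified, then when the StatefulSet's .spec.template is updated, all Pods with an ordinal greater than or equal to the partition are updated.
Pods with an ordinal less than the partition are not updated; even if they are deleted, they are recreated at the previous version. If .spec.updateStrategy.rollingUpdate.partition is greater than .spec.replicas, updates to .spec.template are not propagated to the Pods. In most cases a partition is not needed.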
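For example, in the three-replica web StatefulSet, setting partition to 2 makes a template change roll out only to web-2. This is a common way to canary an update. Lowering the partition to 0 afterwards rolls the change out to web-1 and web-0 as well. A sketch of the relevant fields:

```yaml
# Fragment of the web StatefulSet's spec (other fields unchanged)
spec:
  replicas: 3
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2     # only Pods with ordinal >= 2 (here: web-2) receive the new template
```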

DNS

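Official documentation: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/

Introduction

Kubernetes provides DNS as an add-on. It usually runs as a Service in the kube-system namespace with a fixed IP address. Once the add-on is running, the kubelet on each node is configured with the IP address of the cluster DNS service. When the kubelet starts a container, it passes this DNS server address to the container, which then uses it for name resolution.

What things get DNS names?

Every Service in the cluster, including the DNS service itself, is assigned a DNS name when it is created. By default, a client Pod's DNS search list includes the Pod's own namespace and the cluster's default domain. For example:

Suppose there is a Service named foo in the namespace bar. Other Pods running in namespace bar can look up its DNS record simply as foo. Pods in other namespaces must use foo.bar to look it up.

The following subsections describe in detail the record types and layout supported by Kubernetes DNS.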

SERVICE

A records

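A regular (non-headless) Service is assigned a DNS A record. For example, the Service my-svc in the namespace my-namespace gets the A record "my-svc.my-namespace.svc.cluster.local", which resolves to the Service's cluster virtual IP.

If my-svc is a headless Service, it is also assigned the A record "my-svc.my-namespace.svc.cluster.local". Unlike a regular Service, if the headless Service has a label selector, this A record resolves to the Pod IPs of all Pods selected by the label selector. Clients can consume the returned set in some way, for example in round-robin order.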

SRV records

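SRV records are created for the named ports of regular or headless Services, in the form:

"_my-port-name._my-port-protocol.my-svc.my-namespace.svc.cluster.local". A Service with several named ports gets several records. For a regular Service, the record resolves to the port number of my-port-name and the domain name "my-svc.my-namespace.svc.cluster.local". For a headless Service with a label selector, it resolves to one answer per Pod. Each answer contains that Pod's port number for my-port-name and the Pod's domain name:
pod-name.my-svc.my-namespace.svc.cluster.local.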
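To illustrate, here is a sketch of a Service with one named port. It reuses the placeholder names my-svc and my-namespace from above; the port name http and the label app: my-app are hypothetical. Assuming the default cluster domain cluster.local, this Service gets the A record my-svc.my-namespace.svc.cluster.local and the SRV record _http._tcp.my-svc.my-namespace.svc.cluster.local, which points at port 80.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-svc
  namespace: my-namespace
spec:
  selector:
    app: my-app            # hypothetical Pod label
  ports:
  - name: http             # named port -> SRV record _http._tcp.my-svc.my-namespace.svc.cluster.local
    protocol: TCP
    port: 80
    targetPort: 8080
```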

Pods

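The Pods discussed in this section presumably mean Pods created directly by the user, not Pods created by replica controllers such as ReplicaSet.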

A records

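If the feature is enabled, Pods are assigned an A record of the form "pod-ip-address.my-namespace.pod.cluster.local". For example, a Pod with IP 1.2.3.4 in the namespace default gets the A record "1-2-3-4.default.pod.cluster.local", which resolves to the Pod's IP address.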

Pod’s hostname and subdomain fields

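By default, a Pod's hostname is the same as its name. The Pod spec has an optional hostname field; when it is set, its value takes precedence over the Pod name as the hostname. The Pod spec also has an optional subdomain field for setting the Pod's subdomain. For example, a Pod with hostname foo and subdomain bar in the namespace my-namespace has the fully qualified domain name "foo.bar.my-namespace.svc.cluster.local", which resolves to the Pod's IP address.

In most cases users do not create Pods directly; they create replica controllers of various kinds. A common case where users do create Pods directly is this: create a headless Service with a selector, then create Pods that the selector will match. If you want A records for the Pods you create yourself, you must set the hostname field in the Pod spec. Example: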

apiVersion: v1
kind: Service
metadata:
  name: default-subdomain
spec:
  selector:
    name: busybox
  clusterIP: None
  ports:
  - name: foo # Actually, no port is needed.
    port: 1234
    targetPort: 1234
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox1
  labels:
    name: busybox
spec:
  hostname: busybox-1
  subdomain: default-subdomain
  containers:
  - image: busybox
    command:
      - sleep
      - "3600"
    name: busybox
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox2
  labels:
    name: busybox
spec:
  hostname: busybox-2
  subdomain: default-subdomain
  containers:
  - image: busybox
    command:
      - sleep
      - "3600"
    name: busybox

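As a result, the headless Service default-subdomain gets the A record "default-subdomain.my-namespace.svc.cluster.local". In addition, each Pod gets its own record, such as "busybox-1.default-subdomain.my-namespace.svc.cluster.local" and "busybox-2.default-subdomain.my-namespace.svc.cluster.local", each resolving to that Pod's IP address. As mentioned above, if the hostname field is not set in the Pod spec, the last two records are not created.

Roughly, these records are generated as follows: the Pods are selected, Endpoints objects are generated from them, and the records are generated from those Endpoints. If the headless Service has no label selector, you can create the Endpoints manually. If you want a separate record for a manually created endpoint, you must set the hostname field on that endpoint address; it has the same effect as setting hostname in a Pod.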
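Below is a sketch of the selector-less case. The Service name manual-headless, the IP 10.244.1.50 and the hostname db-1 are hypothetical. The Endpoints object must have the same name as the Service. The hostname on the address is what produces a per-address record such as db-1.manual-headless.default.svc.cluster.local, assuming the default namespace and cluster domain.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: manual-headless        # hypothetical name
spec:
  clusterIP: None              # headless
  ports:                       # no selector: Endpoints are managed by hand
  - name: foo
    port: 1234
---
apiVersion: v1
kind: Endpoints
metadata:
  name: manual-headless        # must equal the Service name
subsets:
- addresses:
  - ip: 10.244.1.50            # hypothetical backend IP
    hostname: db-1             # yields db-1.manual-headless.<namespace>.svc.cluster.local
  ports:
  - name: foo                  # must match the Service port name
    port: 1234
```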

Pod’s DNS Policy

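The sections above describe how Kubernetes creates DNS records for Services and Pods. How are the resolution rules inside a Pod defined? Through the dnsPolicy field of the Pod spec, which accepts the following values:

"Default": inherits the DNS configuration from the node the Pod runs on, so it depends heavily on the node.
"ClusterFirst": any DNS query that does not match the configured cluster domain suffix is forwarded to the upstream nameserver inherited from the node.
"ClusterFirstWithHostNet": for Pods running with hostNetwork, set dnsPolicy explicitly to "ClusterFirstWithHostNet". Otherwise a hostNetwork Pod with "ClusterFirst" falls back to the behavior of "Default".
"None": a new feature introduced in 1.9 (beta in v1.10). It ignores the DNS settings provided by Kubernetes entirely and uses the dnsConfig in the Pod spec instead.

If dnsPolicy is not set, the default policy is "ClusterFirst".

The following example uses "ClusterFirstWithHostNet" because the Pod runs on the host network: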


apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
  restartPolicy: Always
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet

Pod’s DNS Config

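DNS Config was introduced in 1.9 and became usable (beta, enabled by default) in 1.10. The feature gives users more control over a Pod's DNS. In 1.9, first enable the feature gate on the API server and the kubelet, e.g. "--feature-gates=CustomPodDNS=true,…". Then set dnsPolicy to None in the Pod spec and add the dnsConfig field.

dnsConfig fields:

nameservers: a list of DNS server IP addresses, at most three. When dnsPolicy is None, it must contain at least one IP address; for other policies it is optional. These addresses are merged with the nameservers generated by the DNS policy, with duplicates removed.
searches: a list of DNS search domains, optional. They are merged with the search domains generated by the policy, with duplicates removed.
options: an optional list of objects. Each object must have a name property; the value property is optional.

Example: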

apiVersion: v1
kind: Pod
metadata:
  namespace: default
  name: dns-example
spec:
  containers:
    - name: test
      image: nginx
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 1.2.3.4
    searches:
      - ns1.svc.cluster.local
      - my.dns.search.suffix
    options:
      - name: ndots
        value: "2"
      - name: edns0

After the Pod is created, its /etc/resolv.conf contains:

nameserver 1.2.3.4
search ns1.svc.cluster.local my.dns.search.suffix
options ndots:2 edns0
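dnsConfig is not limited to dnsPolicy None: with other policies, its entries are merged into the generated resolv.conf. The sketch below assumes the Pod keeps the default ClusterFirst policy and only wants to lower ndots; the Pod name dns-ndots is hypothetical. With the ndots:5 that ClusterFirst normally generates, a name such as www.example.com (two dots) is first tried against every search domain. With ndots:1, it is queried as an absolute name first.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-ndots              # hypothetical name
  namespace: default
spec:
  containers:
  - name: test
    image: nginx
  dnsPolicy: ClusterFirst      # keep cluster DNS and the generated search list
  dnsConfig:
    options:
    - name: ndots
      value: "1"               # overrides the ndots:5 generated for ClusterFirst
```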

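Scheduling: node affinity

Affinity has a counterpart, Anti-Affinity (mutual exclusion). The terms are apt. You can think of the way a Pod chooses a node as magnets attracting and repelling each other. The difference is that the attraction and repulsion between Pods and nodes can be configured flexibly, rather than being a simple matter of north and south poles.

Advantages of affinity:

Matching supports richer logical combinations, not just exact string equality.
Scheduling rules can be soft or hard. Under a soft rule, if no node satisfies the condition, the Pod ignores the rule and is scheduled anyway.

The node affinity types are:

requiredDuringSchedulingIgnoredDuringExecution
The Pod must be scheduled onto a node that satisfies the conditions; if there is none, the scheduler keeps retrying. IgnoredDuringExecution means that the Pod keeps running even if the node's labels change after scheduling and no longer satisfy the Pod's conditions.
requiredDuringSchedulingRequiredDuringExecution
The Pod must be scheduled onto a node that satisfies the conditions; if there is none, the scheduler keeps retrying. RequiredDuringExecution means that if the node's labels change after scheduling and no longer satisfy the Pod's conditions, a new node that meets the requirements is chosen. (This type has been planned but is not implemented in Kubernetes.)
preferredDuringSchedulingIgnoredDuringExecution
The Pod is preferably scheduled onto a node that satisfies the conditions; if there is none, the conditions are ignored and the Pod is scheduled normally.
preferredDuringSchedulingRequiredDuringExecution
The Pod is preferably scheduled onto a node that satisfies the conditions; if there is none, the conditions are ignored and the Pod is scheduled normally. RequiredDuringExecution means that if node labels later change so that a node satisfies the conditions, the Pod is rescheduled onto it. (This type is not implemented either; only the two IgnoredDuringExecution types are available.)

The distinction between soft and hard rules is useful.
Hard rules suit cases where a Pod must run on a certain kind of node or problems will occur. For example, the cluster may have nodes with different architectures, and the service may depend on features that only one architecture provides.
Soft rules suit cases where the Pod works either way but works better when the condition is met. For example, the service should preferably run in a certain zone to reduce network traffic.
The choice is driven by the user's specific needs; there is no absolute technical dependency.

Here is an official example: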

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: gcr.io/google_containers/pause:2.0

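This Pod defines both requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution node affinity. The first requires the Pod to run on a node in a specific availability zone (AZ); the second prefers nodes that carry the another-node-label-key=another-node-label-value label.

Here the match checks whether a label's value is in a list. The available operators are:

In: the label's value is in the given list
NotIn: the label's value is not in the given list
Exists: the label exists
DoesNotExist: the label does not exist
Gt: the label's value is greater than the given value (both are parsed as integers)
Lt: the label's value is less than the given value (both are parsed as integers)

If nodeSelectorTerms contains several terms, a node only needs to satisfy any one of them; if a term's matchExpressions contains several expressions, the node must satisfy all of them for the Pod to run there.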
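A minimal Python sketch of these matching rules, assuming node labels and terms are plain dicts shaped like the YAML above. It leaves out matchFields and the scheduler's weighting of preferred terms, and the function names are hypothetical. Note how Gt compares integers: "16" > "8" is false as strings but true as numbers.

```python
from typing import Dict, List


def match_expression(labels: Dict[str, str], expr: dict) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values: List[str] = expr.get("values", [])
    present = key in labels
    if op == "In":
        return present and labels[key] in values
    if op == "NotIn":
        # A node that lacks the label also satisfies NotIn.
        return not present or labels[key] not in values
    if op == "Exists":
        return present
    if op == "DoesNotExist":
        return not present
    if op in ("Gt", "Lt"):
        # Gt/Lt take a single value and compare both sides as integers.
        if not present:
            return False
        try:
            node_value, wanted = int(labels[key]), int(values[0])
        except ValueError:
            return False
        return node_value > wanted if op == "Gt" else node_value < wanted
    raise ValueError(f"unsupported operator: {op}")


def term_matches(labels: Dict[str, str], term: dict) -> bool:
    """All matchExpressions in one term must hold (AND); an empty term matches nothing."""
    expressions = term.get("matchExpressions", [])
    return bool(expressions) and all(match_expression(labels, e) for e in expressions)


def node_matches(labels: Dict[str, str], node_selector_terms: List[dict]) -> bool:
    """A node qualifies if any nodeSelectorTerm matches (OR)."""
    return any(term_matches(labels, t) for t in node_selector_terms)


terms = [{"matchExpressions": [{"key": "kubernetes.io/e2e-az-name",
                                "operator": "In", "values": ["e2e-az1", "e2e-az2"]}]}]
print(node_matches({"kubernetes.io/e2e-az-name": "e2e-az1"}, terms))  # True
print(node_matches({"kubernetes.io/e2e-az-name": "e2e-az3"}, terms))  # False
```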

Rolling Updates

Upgrading a service

Change the image:

kubectl set image deployment/demoservice demoservice=lib/demoservicelib:1.1.0 --namespace=demospace

Or:

kubectl edit deployment demoservice -n demospace

List the Deployment's revisions:

kubectl rollout history deployments demoservice -n demospace

Show the details of a specific revision:

kubectl rollout history deployments demoservice -n demospace --revision=2

Roll back to the previous revision:

kubectl rollout undo deployment/demoservice --namespace=demospace

Roll back to a specific revision:

kubectl rollout undo deployment/demoservice --to-revision=2 --namespace=demospace

View the Deployment's details, including recent events:

kubectl describe deployment/demoservice --namespace=demospace

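Setting Quotas

Configuring namespace resource limits

Chinese documentation: http://docs.kubernetes.org.cn/746.html

Configuring container resource limits

For a Pod, the two most basic resource metrics are CPU and memory.
Kubernetes uses two kinds of parameters, requests and limits, to reserve resources and to cap their use.
limits constrain a Pod's resource usage:

When a container's memory usage exceeds its limit, the container is OOM-killed.
When a container's CPU usage reaches its limit, it is not killed, but it is throttled so that it cannot exceed the limit.

Testing the memory limit

Deploy a stress-test container that allocates 250M of memory, while the Pod's memory limit is only 100Mi: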

apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
  namespace: example
spec:
  containers:
  - name: memory-demo-2-ctr
    image: polinux/stress
    resources:
      requests:
        memory: "50Mi"
      limits:
        memory: "100Mi"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]
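Kubernetes memory quantities use decimal suffixes (k, M, G, T) and binary suffixes (Ki, Mi, Gi, Ti). A minimal Python sketch that converts the values used here to bytes; parse_memory is a hypothetical name and it handles only the suffixes listed, not the full quantity syntax. Since the stress tool parses its own --vm-bytes argument, the sketch checks both readings of 250M; either way it is well above the limit.

```python
# Kubernetes quantity suffixes: decimal (k, M, G, T) and binary (Ki, Mi, Gi, Ti).
SUFFIXES = {
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '100Mi' or '250M' to bytes (sketch only)."""
    # Check two-letter suffixes first so that 'Mi' is not read as 'M'.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * SUFFIXES[suffix])
    return int(quantity)  # no suffix: plain bytes


limit = parse_memory("100Mi")
print(limit)                          # 104857600
# Whether 250M means 250 * 10**6 or 250 * 2**20 bytes, it exceeds the limit.
print(parse_memory("250M") > limit)   # True
print(250 * 2**20 > limit)            # True
```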

After deployment, check the Pod's status; the Pod has been OOM-killed:


  kubectl -n example get po
NAME             READY     STATUS        RESTARTS   AGE
memory-demo      0/1       OOMKilled     1          11s

Testing the CPU limit

apiVersion: v1
kind: Pod
metadata:
  name: cpu-demo
  namespace: example
spec:
  containers:
  - name: cpu-demo-ctr
    image: vish/stress
    resources:
      limits:
        cpu: "1"
      requests:
        cpu: "0.5"
    args:
    - -cpus
    - "2"

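Checking the container shows that the Pod is not killed, but its actual CPU usage is capped at 1000m (one CPU), even though stress asks for two CPUs.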


 kubectl -n example top po cpu-demo
NAME       CPU(cores)   MEMORY(bytes)
cpu-demo   1000m        0Mi
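The throttling comes from the Linux CFS bandwidth controller: the CPU limit is turned into a quota of CPU time per scheduling period. A minimal sketch of that conversion, assuming the default 100 ms period (100000 µs) and ignoring the small minimum quota the kubelet enforces; cpu_to_millicores and cfs_quota_us are hypothetical names. A limit of "1" allows 100000 µs of CPU time per 100000 µs period, i.e. one full CPU, which matches the 1000m shown above.

```python
def cpu_to_millicores(quantity: str) -> int:
    """'1' -> 1000, '0.5' -> 500, '250m' -> 250 (sketch of CPU quantity parsing)."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def cfs_quota_us(limit: str, period_us: int = 100_000) -> int:
    """CPU time (in microseconds) the container may use in each CFS period."""
    return cpu_to_millicores(limit) * period_us // 1000


for cpu_limit in ["1", "0.5", "250m"]:
    print(cpu_limit, cpu_to_millicores(cpu_limit), cfs_quota_us(cpu_limit))
# 1 1000 100000
# 0.5 500 50000
# 250m 250 25000
```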

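Container Quality of Service (QoS)

Kubernetes provides quality-of-service management: based on its containers' resource settings, each Pod is placed in one of three classes, Guaranteed, Burstable and BestEffort. When resources are tight, the class determines scheduling and eviction behavior. The three classes are:

Guaranteed: every container in the Pod sets both CPU and memory limits and requests, and each request equals its limit (if a limit is set without a request, the request defaults to the limit).
Burstable: the Pod does not qualify as Guaranteed, but at least one container sets a CPU or memory request or limit (for example, a limit is missing or differs from the request). When such Pods are scheduled, a node's resources may end up overcommitted.
BestEffort: no container in the Pod sets any request or limit.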

QoS calculation code: https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/helper/qos/qos.go
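The classification rules above can be sketched in Python as follows. This is a simplified illustration of the logic in qos.go, assuming quantities are already converted to integers (millicores, bytes) and ignoring init containers and Pod overhead; qos_class is a hypothetical name. The examples use the resource settings of memory-demo-qos-1 and memory-demo-qos-3 from the hands-on part of this section.

```python
from typing import Dict, List

RESOURCES = ("cpu", "memory")


def qos_class(containers: List[dict]) -> str:
    """Classify a Pod as Guaranteed, Burstable or BestEffort (simplified sketch).

    Each container is {"requests": {...}, "limits": {...}} with quantities
    already converted to integers (e.g. millicores and bytes).
    """
    anything_set = False
    guaranteed = True
    for container in containers:
        # Only CPU and memory count; zero values are treated as unset.
        limits: Dict[str, int] = {k: v for k, v in container.get("limits", {}).items()
                                  if k in RESOURCES and v}
        requests: Dict[str, int] = {k: v for k, v in container.get("requests", {}).items()
                                    if k in RESOURCES and v}
        # A limit without a request implies request == limit.
        for name, value in limits.items():
            requests.setdefault(name, value)
        if limits or requests:
            anything_set = True
        # Guaranteed needs both CPU and memory limits, each equal to its request.
        if set(limits) != set(RESOURCES) or any(requests[k] != limits[k] for k in limits):
            guaranteed = False
    if not anything_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


mi = 2**20
print(qos_class([{"requests": {"memory": 200 * mi}}]))               # Burstable
print(qos_class([{"requests": {"memory": 200 * mi, "cpu": 2000},
                  "limits": {"memory": 200 * mi, "cpu": 2000}}]))    # Guaranteed
print(qos_class([{}]))                                              # BestEffort
```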

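How QoS affects containers

OOM:

Kubernetes sets each container's OOM score adjustment, oom_score_adj, according to its QoS class. The oom_killer computes an oom_score from a process's memory usage and combines it with oom_score_adj; when an OOM occurs, the process with the highest score is killed first.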

QoS	oom_score_adj
Guaranteed	-998
BestEffort	1000
Burstable	min(max(2, 1000 - (1000 * memoryRequestBytes) / machineMemoryCapacityBytes), 999)

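When a node runs out of memory, BestEffort Pods are killed first, Burstable Pods next, and Guaranteed Pods last. By the formula, a Burstable container's oom_score_adj lies between 2 and 999: the larger its memory request, the lower its oom_score_adj and the better it is protected when an OOM occurs.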
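A Python sketch of the Burstable row of the table, assuming memory values in bytes and integer division (the calculation in policy.go uses integer arithmetic); burstable_oom_score_adj is a hypothetical name. Plugging in the node from the hands-on example below reproduces the observed 975 and 949.

```python
GUARANTEED_ADJ = -998
BESTEFFORT_ADJ = 1000


def burstable_oom_score_adj(memory_request: int, memory_capacity: int) -> int:
    """oom_score_adj for a Burstable container (bytes in, integer arithmetic)."""
    adj = 1000 - (1000 * memory_request) // memory_capacity
    # Keep Burstable strictly between Guaranteed (-998) and BestEffort (1000).
    if adj < 1000 + GUARANTEED_ADJ:
        return 1000 + GUARANTEED_ADJ        # lower bound: 2
    if adj >= BESTEFFORT_ADJ:
        return BESTEFFORT_ADJ - 1           # upper bound: 999
    return adj


capacity = 8010196 * 1024  # the example node's 8010196Ki, in bytes
for request_mi in (200, 400):
    print(f"request {request_mi}Mi ->", burstable_oom_score_adj(request_mi * 2**20, capacity))
# request 200Mi -> 975
# request 400Mi -> 949
```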

Hands-on

Node information:
# kubectl describe no cn-beijing.i-2zeavb11mttnqnnicwj9 | grep -A 3 Capacity
Capacity:
 cpu:     4
 memory:  8010196Ki
 pods:    110

apiVersion: v1
kind: Pod
metadata:
  name: memory-demo-qos-1
  namespace: example
spec:
  containers:
  - name: memory-demo-qos-1
    image: polinux/stress
    resources:
      requests:
        memory: "200Mi"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "50M", "--vm-hang", "1"]

---
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo-qos-2
  namespace: example
spec:
  containers:
  - name: memory-demo-qos-2
    image: polinux/stress
    resources:
      requests:
        memory: "400Mi"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "50M", "--vm-hang", "1"]

---
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo-qos-3
  namespace: example
spec:
  containers:
  - name: memory-demo-qos-3
    image: polinux/stress
    resources:
      requests:
        memory: "200Mi"
        cpu: "2"
      limits:
        memory: "200Mi"
        cpu: "2"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "50M", "--vm-hang", "1"]

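The node's memory capacity is 8010196Ki, about 7822.46Mi.
Applying the Burstable formula (the kubelet uses integer division, so the fractional part is dropped):

request 200Mi: 1000 - 1000*200/7822.46 = 1000 - 25 (25.57 truncated) = 975

request 400Mi: 1000 - 1000*400/7822.46 = 1000 - 51 (51.13 truncated) = 949

Now check the oom_score_adj of each of the three Pods: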

// request 200Mi
  kubectl -n example exec memory-demo-qos-1 -- cat /proc/1/oom_score_adj
975

// request 400Mi
  kubectl -n example exec memory-demo-qos-2 -- cat /proc/1/oom_score_adj
949

// Guaranteed
  kubectl -n example exec memory-demo-qos-3 -- cat /proc/1/oom_score_adj
-998

Code that sets the OOM rules: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/qos/policy.go

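Pod eviction:

When a node runs short of resources such as memory (the kubelet does not evict Pods because of CPU pressure) and the kubelet starts evicting Pods, QoS also affects eviction priority, in this order:

The kubelet first evicts BestEffort Pods and Burstable Pods whose actual usage exceeds their requests.

Next it evicts Burstable Pods whose actual usage is below their requests.
Guaranteed Pods are evicted last; the kubelet ensures that a Guaranteed Pod is not evicted because of other Pods' resource consumption.
Among Pods of the same QoS class, the kubelet uses Pod Priority to decide the eviction order.

ResourceQuota

Kubernetes provides the ResourceQuota object to cap, per namespace, the number of objects of each type and the total amount of compute resources (CPU, memory).

A namespace can have one or more ResourceQuota objects.
If a namespace has a ResourceQuota that covers compute resources such as requests.cpu or limits.memory, every new Pod must specify the corresponding requests and limits, or the creation request is rejected.
You can configure a LimitRange to give each Pod default requests and limits and so avoid the problem above.
Since 1.10, quotas also support extended resources; see: https://kubernetes.io/docs/tasks/configure-pod-container/extended-resource/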

apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-demo
  namespace: example
spec:
  hard:
    requests.cpu: "3"
    requests.memory: 1Gi
    limits.cpu: "5"
    limits.memory: 2Gi
    pods: "5"
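A simplified Python sketch of the admission check that the mem-cpu-demo quota implies. It assumes a single quota, quantities pre-converted (CPU in millicores, memory in MiB), and a Pod given as totals of its containers' requests and limits; the real quota admission controller handles many more resource names and scopes, and admit is a hypothetical name.

```python
from typing import Dict

# Hard limits from the mem-cpu-demo ResourceQuota above
# (CPU in millicores, memory in MiB).
HARD = {"requests.cpu": 3000, "requests.memory": 1024,
        "limits.cpu": 5000, "limits.memory": 2048, "pods": 5}


def admit(used: Dict[str, int], pod: Dict[str, int]) -> bool:
    """Return True if adding `pod` keeps every tracked total within HARD.

    A Pod that omits a quota-tracked request or limit is rejected,
    as described in the list above.
    """
    for resource, hard in HARD.items():
        if resource == "pods":
            need = 1
        elif resource in pod:
            need = pod[resource]
        else:
            return False  # missing request/limit for a tracked resource
        if used.get(resource, 0) + need > hard:
            return False
    return True


pod = {"requests.cpu": 500, "requests.memory": 256,
       "limits.cpu": 1000, "limits.memory": 512}
used = {"requests.cpu": 2500, "requests.memory": 512,
        "limits.cpu": 3000, "limits.memory": 1024, "pods": 3}
print(admit(used, pod))                            # True: exactly 3000m requested
print(admit({**used, "requests.cpu": 2600}, pod))  # False: 3100m > 3000m
print(admit(used, {"requests.cpu": 500}))          # False: memory and limits missing
```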

LimitRange

LimitRange sets the default request and limit values for Pods in a namespace, as well as the range of allowed values.

apiVersion: v1
kind: LimitRange
metadata:
  name: mem-limit-range
  namespace: example
spec:
  limits:
  - default:  # default limit
      memory: 512Mi
      cpu: 2
    defaultRequest:  # default request
      memory: 256Mi
      cpu: 0.5
    max:  # max limit
      memory: 800Mi
      cpu: 3
    min:  # min request
      memory: 100Mi
      cpu: 0.3
    maxLimitRequestRatio:  # max value for limit / request
      memory: 2
      cpu: 2
    type: Container # limit type, support: Container / Pod / PersistentVolumeClaim

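LimitRange supports the following parameters:

default: the default limit
defaultRequest: the default request
max: the maximum limit
min: the minimum request
maxLimitRequestRatio: the maximum value of limit / request. Because nodes schedule Pods by their requests, node resources can be overcommitted; maxLimitRequestRatio caps how far a Pod may overcommit.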
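A rough Python sketch of how these fields act on a single container. It assumes quantities pre-converted (CPU in millicores, memory in MiB) and the defaulting behavior described in the Kubernetes docs, where a container that sets a limit but no request gets a request equal to its limit; the real LimitRanger admission plugin covers more cases, and apply_limit_range is a hypothetical name.

```python
from typing import Dict, Optional

# The mem-limit-range values above, with CPU in millicores and memory in MiB.
LIMIT_RANGE = {
    "default": {"cpu": 2000, "memory": 512},
    "defaultRequest": {"cpu": 500, "memory": 256},
    "max": {"cpu": 3000, "memory": 800},
    "min": {"cpu": 300, "memory": 100},
    "maxLimitRequestRatio": {"cpu": 2, "memory": 2},
}


def apply_limit_range(requests: Dict[str, int], limits: Dict[str, int]) -> Optional[str]:
    """Default missing values in place, then validate; return an error or None."""
    for res in ("cpu", "memory"):
        limit_given = res in limits
        limits.setdefault(res, LIMIT_RANGE["default"][res])
        if res not in requests:
            # A container that sets a limit but no request gets request == limit;
            # otherwise defaultRequest applies.
            requests[res] = limits[res] if limit_given else LIMIT_RANGE["defaultRequest"][res]
        if requests[res] < LIMIT_RANGE["min"][res]:
            return f"{res}: request {requests[res]} is below min"
        if limits[res] > LIMIT_RANGE["max"][res]:
            return f"{res}: limit {limits[res]} is above max"
        if limits[res] / requests[res] > LIMIT_RANGE["maxLimitRequestRatio"][res]:
            return f"{res}: limit/request ratio exceeds maxLimitRequestRatio"
    return None


reqs, lims = {"cpu": 1000}, {"cpu": 2000}
print(apply_limit_range(reqs, lims), reqs, lims)
# None {'cpu': 1000, 'memory': 256} {'cpu': 2000, 'memory': 512}
print(apply_limit_range({}, {"cpu": 4000}))
# cpu: limit 4000 is above max
print(apply_limit_range({"cpu": 1000, "memory": 200}, {"cpu": 1000, "memory": 600}))
# memory: limit/request ratio exceeds maxLimitRequestRatio
```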

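Summary

Kubernetes offers two ways to set container resources: requests and limits.
To improve utilization, the scheduler places Pods according to their requests, which allows node resources to be overcommitted.
Kubernetes enforces limits: exceeding the memory limit triggers an OOM kill, and CPU usage is throttled so that it cannot exceed the limit.
From each Pod's requests and limits, Kubernetes computes a QoS class: Guaranteed, Burstable or BestEffort. When a node runs short of resources and eviction or OOM kills occur, Guaranteed Pods are protected first, Burstable Pods next (the larger the request and the lower the actual usage, the better the protection), and BestEffort Pods are evicted first.
Kubernetes provides ResourceQuota and LimitRange to bound the resource ranges and overall scale of Pods in a namespace. ResourceQuota caps the number of objects of each type and the total CPU and memory. LimitRange sets the default, maximum and minimum requests and limits for Pods or containers, as well as the overcommit ratio (limit / request).
For important production services, set limits and requests carefully and make them equal; when resources run short, Kubernetes then gives priority to keeping these Pods running.
To improve utilization, for non-critical services that do not hold resources for long, you can lower the Pod's request so that it can be scheduled onto nodes with less spare capacity. The trade-off is that when the node runs short, these Pods are evicted or OOM-killed first.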

PV & PVC

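At its core, a Kubernetes Volume is a directory, much like a Docker Volume. Once a Volume is mounted into a Pod, every container in the Pod can access it. Kubernetes Volumes also support many backend types, including emptyDir, hostPath, GCE Persistent Disk, AWS Elastic Block Store, NFS and Ceph; for the full list see https://kubernetes.io/docs/concepts/storage/volumes/#types-of-volumes

emptyDir

emptyDir is the most basic Volume type. As its name suggests, an emptyDir Volume is an empty directory on the host.

An emptyDir Volume is persistent with respect to containers but not to the Pod. When the Pod is removed from the node, the Volume's contents are deleted too; if only a container is destroyed while the Pod remains, the Volume is unaffected.

In other words, an emptyDir Volume has the same lifecycle as its Pod.

All containers in a Pod can share a Volume, each mounting it at its own path. The following example puts emptyDir into practice; the configuration file is: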

apiVersion: v1
kind: Pod
metadata:
  name: producer-consumer
spec:
  containers:
  - name: producer
    image: busybox
    volumeMounts:
    - name: shared-volume
      mountPath: /producer_dir
    args:
    - /bin/sh
    - -c
    - echo "hello world" > /producer_dir/hello; sleep 30000
  - name: consumer
    image: busybox
    volumeMounts:
    - name: shared-volume
      mountPath: /consumer_dir
    args:
    - /bin/sh
    - -c
    - cat /consumer_dir/hello; sleep 30000
  volumes:
  - name: shared-volume
    emptyDir: {}

This simulates a producer-consumer scenario. The Pod has two containers, producer and consumer, which share one Volume: producer writes data into the Volume and consumer reads it.

The volumes section at the bottom of the file defines an emptyDir Volume named shared-volume.

The producer container mounts shared-volume at /producer_dir.

producer writes data into the file hello with echo.

The consumer container mounts shared-volume at /consumer_dir.

consumer reads the file hello with cat.

Create the Pod with the following commands:

[root@master ~]# kubectl apply -f emptydir.yaml 
pod/producer-consumer created
[root@master ~]# kubectl get pods
NAME                READY   STATUS    RESTARTS   AGE
producer-consumer   2/2     Running   0          8s
[root@master ~]# kubectl logs producer-consumer consumer
hello world

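kubectl logs shows that the consumer container successfully read the data written by producer, confirming that the two containers share the emptyDir Volume.

emptyDir is a temporary directory created on the host. Its advantage is that it gives the containers in a Pod shared storage conveniently, with no extra configuration. But it is not persistent: once the Pod is gone, so is the emptyDir. This makes emptyDir a good fit when containers in a Pod need temporary shared storage, as in the producer-consumer example above.

hostPath

A hostPath Volume mounts a directory that already exists on the Docker host's file system into the Pod's containers. Most applications do not use hostPath, because it couples the Pod to the node and limits where the Pod can run. Applications that need access to Kubernetes or Docker internals (configuration files and binaries), however, do need hostPath.

In the example below, the host directory /data/pod/v1 is mounted into the Pod's container at /usr/share/nginx/html/.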

apiVersion: v1
kind: Pod
metadata:
  name: pod-vol-hostpath
spec:
  containers:
  - name: mytest
    image: wangzan18/mytest:v1
    volumeMounts:
    - name: html
      mountPath: /usr/share/nginx/html/
  volumes:
  - name: html
    hostPath:
      path: /data/pod/v1
      type: DirectoryOrCreate

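If the Pod is destroyed, the hostPath directory is kept, so hostPath is more durable than emptyDir. But if the host crashes, the hostPath can no longer be accessed.

Introduction to PV & PVC

PersistentVolume (PV) and PersistentVolumeClaim (PVC) are two API resources that Kubernetes provides to abstract away storage details. Administrators focus on providing storage through PVs without caring how users consume it; likewise, users simply mount a PVC into a container without caring which technology implements the volume.
The PVC-PV relationship resembles the Pod-node relationship: the former consumes the latter's resources. A PVC requests a specific amount of storage and an access mode from a PV, so storage can be controlled through a Provision -> Claim workflow.

Lifecycle

PVs and PVCs follow this lifecycle:

Provisioning. Persistent storage is provided by a storage system outside the cluster or by a cloud platform.
- Static provisioning: the administrator manually creates a number of PVs for PVCs to use.
- Dynamic provisioning: a PV for a specific PVC is created on demand and bound to it.
Binding. The user creates a PVC specifying the required resources and access mode. Until a suitable PV is found, the PVC stays unbound.
Using. The user uses the PVC in a Pod just like a volume.
Releasing. The user deletes the PVC to give back the storage, and the PV enters the "Released" state. Because the previous data is still there, it must be handled according to the reclaim policy before the storage can be used by another PVC.
Reclaiming. A PV can have one of three reclaim policies: Retain, Recycle and Delete.
- Retain: lets the retained data be handled manually.
- Delete: deletes the PV together with the associated external storage asset; requires plugin support.
- Recycle: scrubs the volume so that a new PVC can use it; requires plugin support.

Currently only NFS and HostPath volumes support the Recycle policy (Recycle is deprecated in newer Kubernetes versions in favor of dynamic provisioning), while AWS EBS, GCE PD, Azure Disk and Cinder support the Delete policy.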

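Provisioning

PV resources can be supplied in two ways:

static

The cluster administrator creates a number of PVs, which give cluster users storage while hiding the details of the real storage. They exist in the Kubernetes API and can be used directly.
dynamic

Dynamic volume provisioning is a feature unique to Kubernetes that allows storage volumes to be created on demand. Without it, cluster administrators had to create volumes in advance through the storage or cloud provider outside the cluster, and then create PersistentVolume objects before the volumes could be used in Kubernetes. Dynamic provisioning frees administrators from pre-creating volumes; volumes are created as users need them. Version 1.5 improved the flexibility and availability of dynamic volumes. Version 1.4 had already added a new API object, StorageClass: administrators can define several StorageClass objects, each specifying a storage plugin and its parameters, to offer different kinds of volumes. This lets administrators define and offer volumes of different types and parameters (on the same or different storage systems) within one cluster, and lets end users choose among storage options without needing to know much about them.

PV types

PVs support the following types: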

GCEPersistentDisk
AWSElasticBlockStore
NFS
iSCSI
RBD (Ceph Block Device)
Glusterfs
AzureFile
AzureDisk
CephFS
cinder
FC
FlexVolume
Flocker
PhotonPersistentDisk
Quobyte
VsphereVolume
HostPath (single node testing only – local storage is not supported in any way and WILL NOT WORK in a multi-node cluster)

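PV attributes:

Access modes: a claim's access modes have the same semantics as a PV's; a claim requests a particular mode.

accessModes specifies the access mode, for example ReadWriteOnce. The supported modes are:

ReadWriteOnce – the PV can be mounted read-write by a single node.

ReadOnlyMany – the PV can be mounted read-only by many nodes.

ReadWriteMany – the PV can be mounted read-write by many nodes.

Resources: the amount of storage requested.

PV phases:

Available – the volume is not yet bound to a claim
Bound – the volume is bound to a claim
Released – the claim has been deleted, but the volume has not yet been reclaimed by the cluster
Failed – automatic reclamation of the volume failed

Example

Create the PV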

apiVersion: v1
kind: PersistentVolume
metadata:
  name: ebs-pv
  labels:
    type: amazonEBS
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore:
    volumeID: vol-079c492115a7be6e1
    fsType: ext4

Create the PVC

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nginx-pvc
  labels:
    type: amazonEBS
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

Create the Deployment

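apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-with-pvc
spec:
  replicas: 1
  # apps/v1 requires a selector that matches the template labels
  selector:
    matchLabels:
      service: nginx
      app: test
  template:
    metadata:
      labels:
        service: nginx
        app: test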
    spec:
      containers:
      - image: nginx
        name: nginx-with-pvc
        volumeMounts:
        - mountPath: /test-ebs
          name: my-pvc
      volumes:
      - name: my-pvc
        persistentVolumeClaim:
          claimName: nginx-pvc

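Reclaim policy

PersistentVolumes can have one of several reclaim policies: "Retain", "Recycle" and "Delete". For dynamically provisioned PersistentVolumes the default is "Delete", meaning the volume is deleted automatically when the user deletes the corresponding PersistentVolumeClaim. This automatic behavior may be inappropriate if the volume holds important data, in which case the "Retain" policy is more suitable. With "Retain", deleting the PersistentVolumeClaim does not delete the corresponding PersistentVolume; instead the PV moves to the Released phase, and all of its data can be recovered manually.

Example:

pvc.yaml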

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-test
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ceph-rbd 
  resources:
    requests:
      storage: 1Gi

nginx.yaml

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-rbd
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 80
          volumeMounts:
            - name: ceph-rbd-volume
              mountPath: "/usr/share/nginx/html"
      volumes:
      - name: ceph-rbd-volume
        persistentVolumeClaim:
          claimName: pvc-test

Create the PVC and the Deployment, write some data, then delete the PVC:

[root@lab1 test]# ll
total 8
-rw-r--r-- 1 root root 533 Oct 24 17:54 nginx.yaml
-rw-r--r-- 1 root root 187 Oct 24 17:55 pvc.yaml
[root@lab1 test]# kubectl apply -f pvc.yaml 
persistentvolumeclaim/pvc-test created
[root@lab1 test]# kubectl get pvc 
NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc-test           Bound    pvc-069c4486-d773-11e8-bd12-000c2931d938   1Gi        RWO            ceph-rbd       7s
[root@lab1 test]# kubectl apply -f nginx.yaml 
deployment.extensions/nginx-rbd created
[root@lab1 test]# kubectl get pod |grep nginx-rbd
nginx-rbd-7c6449886-thv25           1/1     Running   0          33s
[root@lab1 test]# kubectl exec -it nginx-rbd-7c6449886-thv25 -- /bin/bash -c 'echo ygqygq2 > /usr/share/nginx/html/ygqygq2.html'        
[root@lab1 test]# kubectl exec -it nginx-rbd-7c6449886-thv25 -- cat /usr/share/nginx/html/ygqygq2.html
ygqygq2
[root@lab1 test]# kubectl delete -f nginx.yaml 
deployment.extensions "nginx-rbd" deleted
[root@lab1 test]# kubectl get pvc pvc-test     
NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc-test   Bound    pvc-069c4486-d773-11e8-bd12-000c2931d938   1Gi        RWO            ceph-rbd       4m10s
[root@lab1 test]# kubectl delete pvc pvc-test  # delete the PVC
persistentvolumeclaim "pvc-test" deleted
[root@lab1 test]# kubectl get pv pvc-069c4486-d773-11e8-bd12-000c2931d938
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM              STORAGECLASS   REASON   AGE
pvc-069c4486-d773-11e8-bd12-000c2931d938   1Gi        RWO            Retain           Released   default/pvc-test   ceph-rbd                4m33s
[root@lab1 test]# kubectl get pv pvc-069c4486-d773-11e8-bd12-000c2931d938 -o yaml > /tmp/pvc-069c4486-d773-11e8-bd12-000c2931d938.yaml  # keep a copy for later

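As shown above, once the PVC is deleted, the PV moves to the Released phase.

Create a PVC with the same name again and check whether it gets the original PV: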

[root@lab1 test]# kubectl apply -f pvc.yaml 
persistentvolumeclaim/pvc-test created
[root@lab1 test]# kubectl get pvc  # list the newly created PVC
NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc-test           Bound    pvc-f2df48ea-d773-11e8-b6c8-000c29ea3e30   1Gi        RWO            ceph-rbd       19s
[root@lab1 test]# kubectl get pv pvc-069c4486-d773-11e8-bd12-000c2931d938  # check the original PV
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM              STORAGECLASS   REASON   AGE
pvc-069c4486-d773-11e8-bd12-000c2931d938   1Gi        RWO            Retain           Released   default/pvc-test   ceph-rbd                7m18s

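As shown above, the PVC was bound to a new PV, because the original PV is not in the Available phase.

So how can the PV be made Available again? Let's look at the saved copy of the old PV: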

[root@lab1 test]# cat /tmp/pvc-069c4486-d773-11e8-bd12-000c2931d938.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: ceph.com/rbd
    rbdProvisionerIdentity: ceph.com/rbd
  creationTimestamp: 2018-10-24T09:56:06Z
  finalizers:
  - kubernetes.io/pv-protection
  name: pvc-069c4486-d773-11e8-bd12-000c2931d938
  resourceVersion: "11752758"
  selfLink: /api/v1/persistentvolumes/pvc-069c4486-d773-11e8-bd12-000c2931d938
  uid: 06b57ef7-d773-11e8-bd12-000c2931d938
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: pvc-test
    namespace: default
    resourceVersion: "11751559"
    uid: 069c4486-d773-11e8-bd12-000c2931d938
  persistentVolumeReclaimPolicy: Retain
  rbd:
    fsType: ext4
    image: kubernetes-dynamic-pvc-06a25bd3-d773-11e8-8c3e-0a580af400d5
    keyring: /etc/ceph/keyring
    monitors:
    - 192.168.105.92:6789
    - 192.168.105.93:6789
    - 192.168.105.94:6789
    pool: kube
    secretRef:
      name: ceph-secret
      namespace: kube-system
    user: kube
  storageClassName: ceph-rbd
status:
  phase: Released

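As shown above, the spec.claimRef section still holds the old PVC's binding information.

We boldly delete the spec.claimRef section with kubectl edit, then check the PV again:

kubectl edit pv pvc-069c4486-d773-11e8-bd12-000c2931d938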

[root@lab1 test]# kubectl get pv pvc-069c4486-d773-11e8-bd12-000c2931d938 
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
pvc-069c4486-d773-11e8-bd12-000c2931d938   1Gi        RWO            Retain           Available           ceph-rbd                10m

As shown above, the original PV pvc-069c4486-d773-11e8-bd12-000c2931d938 is now Available again.

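Summary

In the current version of Kubernetes, storage size is the only resource a PVC can set or request. Because we did not change the PVC's size, when the PV is Available and a PVC requests the same size, the PV is handed out and bound successfully.
The key to making the PV Available again is its spec.claimRef field, which records the binding to the original PVC; deleting that binding information releases the PV so that it becomes Available.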
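The binding condition can be modelled with a small Python sketch. It is a simplification, not the PV controller's algorithm: it assumes sizes in whole Gi, treats any PV that still carries a claimRef as unavailable, and ignores label selectors, volume modes and the claim UID that a real claimRef stores. The class and function names are hypothetical. In the transcript above, when no PV matched, the ceph-rbd provisioner created a new one instead.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PV:
    name: str
    capacity_gi: int
    access_modes: List[str]
    storage_class: str
    claim_ref: Optional[str] = None  # "namespace/name" of the claim it is tied to


@dataclass
class PVC:
    name: str
    request_gi: int
    access_modes: List[str]
    storage_class: str


def find_match(claim: PVC, volumes: List[PV]) -> Optional[PV]:
    """Return the smallest PV that can satisfy the claim, or None (simplified)."""
    candidates = [
        pv for pv in volumes
        if pv.claim_ref is None                      # a PV with a claimRef is not Available
        and pv.storage_class == claim.storage_class
        and set(claim.access_modes) <= set(pv.access_modes)
        and pv.capacity_gi >= claim.request_gi
    ]
    return min(candidates, key=lambda pv: pv.capacity_gi, default=None)


old = PV("pvc-069c4486-d773-11e8-bd12-000c2931d938", 1, ["ReadWriteOnce"], "ceph-rbd",
         claim_ref="default/pvc-test")
claim = PVC("pvc-test", 1, ["ReadWriteOnce"], "ceph-rbd")
print(find_match(claim, [old]))        # None: the Released PV still has a claimRef
old.claim_ref = None                   # what deleting spec.claimRef does
print(find_match(claim, [old]).name)   # pvc-069c4486-d773-11e8-bd12-000c2931d938
```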

Context

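Sets a context entry in the kubeconfig file. If an existing name is given, the new fields are merged in and overwrite the old ones.

kubectl config set-context NAME [--cluster=cluster_nickname] [--user=user_nickname] [--namespace=namespace]

Example

# Set the user, namespace and cluster fields of the gce context; fields that are not specified are left unchanged.
$ kubectl config set-context gce --user=cluster-admin --namespace=test --cluster=test

Options

--cluster="": sets the cluster of the context entry in the kubeconfig file.
--namespace="": sets the namespace of the context entry in the kubeconfig file.
--user="": sets the user of the context entry in the kubeconfig file.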
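The merge semantics can be modelled on the contexts list of a kubeconfig loaded into Python (each entry has a name and a context with cluster, user and namespace). This is a hypothetical sketch of the behavior described above, not kubectl's code: only the fields that are passed in are overwritten, and an unknown name creates a new entry.

```python
from typing import List, Optional


def set_context(contexts: List[dict], name: str, cluster: Optional[str] = None,
                user: Optional[str] = None, namespace: Optional[str] = None) -> None:
    """Create or update a kubeconfig context entry in place."""
    updates = {k: v for k, v in
               {"cluster": cluster, "user": user, "namespace": namespace}.items()
               if v is not None}
    for entry in contexts:
        if entry["name"] == name:
            entry["context"].update(updates)  # merge: untouched fields survive
            return
    contexts.append({"name": name, "context": updates})


contexts = [{"name": "gce", "context": {"cluster": "prod", "user": "alice"}}]
set_context(contexts, "gce", user="cluster-admin")
print(contexts[0]["context"])  # {'cluster': 'prod', 'user': 'cluster-admin'}
```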

StorageClass

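About storage classes

By offering different storage classes, Kubernetes cluster administrators can meet users' storage needs for different quality-of-service levels, backup policies or arbitrary policies. Dynamic volume provisioning is implemented with StorageClass and lets volumes be created on demand. Without dynamic provisioning, cluster administrators would have to create new volumes by hand; with it, Kubernetes automatically creates the storage that users need.

The overall flow of StorageClass-based dynamic provisioning is:

1) The cluster administrator creates a StorageClass in advance;

2) The user creates a PersistentVolumeClaim (PVC) that uses the storage class;

3) The PVC signals to the system that it needs a PersistentVolume (PV);

4) The system reads the storage class's information;

5) Based on that information, the system automatically creates the PV the PVC needs in the background;

6) The user creates a Pod that uses the PVC;

7) The application in the Pod persists data through the PVC;

8) The PVC in turn relies on the PV for the actual persistence.

Defining a storage class

Every storage class contains the fields provisioner, parameters and reclaimPolicy, which are used when a PersistentVolume belonging to the class needs to be dynamically provisioned.

The name of a StorageClass object matters, because users request a particular class by name. Administrators set the class name and other parameters when creating the object; once created, a StorageClass cannot be updated. Administrators can designate a default storage class for PVCs.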

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard
# the provisioner (volume plugin) that creates PVs for this class
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
# the reclaim policy for PVs created by this class
reclaimPolicy: Retain
mountOptions:
  - debug

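Provisioner

Every storage class has a provisioner field that determines which volume plugin is used to provision PVs. This field must be specified:

Volume plugin	Internal provisioner	Config example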
AWSElasticBlockStore	✓	AWS
AzureFile	✓	Azure File
AzureDisk	✓	Azure Disk
CephFS	–	–
Cinder	✓	OpenStack Cinder
FC	–	–
FlexVolume	–	–
Flocker	✓	–
GCEPersistentDisk	✓	GCE
Glusterfs	✓	Glusterfs
iSCSI	–	–
PhotonPersistentDisk	✓	–
Quobyte	✓	Quobyte
NFS	–	–
RBD	✓	Ceph RBD
VsphereVolume	✓	vSphere
PortworxVolume	✓	Portworx Volume
ScaleIO	✓	ScaleIO
StorageOS	✓	StorageOS
Local	–	Local

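Kubernetes storage classes are not limited to the "internal" provisioners listed in the table, whose names carry the "kubernetes.io" prefix; you can also specify external provisioners, which are implemented as independent programs. Authors of external provisioners have full discretion over where their code lives, how it is shipped, how it runs and which volume plugin it uses (including Flex). The kubernetes-incubator/external-storage repository hosts a library for writing external provisioners. For example, NFS is not an internal provisioner but can still be used. The kubernetes-incubator/external-storage repository lists a number of external provisioners, and some third-party vendors supply their own.

Provisioner parameters

Storage classes have parameters that describe the volumes belonging to the class, and the available parameters depend on the provisioner. For example, the type parameter may take the value io1. When a parameter is omitted, its default value is used.

Reclaim policy

PersistentVolumes created dynamically by a storage class get the reclaim policy given in the class's reclaimPolicy field, which can be Delete or Retain and defaults to Delete. PersistentVolumes that are created manually and managed through a storage class keep whatever reclaim policy they were assigned when they were created.

Mount options

PersistentVolumes created dynamically by a storage class get the mount options given in the class's mountOptions field. If the volume plugin does not support mount options but some are specified, provisioning fails. Mount options are not validated on either the storage class or the PV, so check them yourself when you set them.

Using storage classes

Dynamic volume provisioning is based on the StorageClass API object. Cluster administrators can define StorageClass objects as needed, each specifying a volume plugin (that is, a provisioner). Administrators can define various kinds of provisioned storage in one cluster, and users can choose storage that fits their needs without having to understand its details and complexity.

Enabling dynamic provisioning

To enable dynamic provisioning, the cluster administrator must create one or more storage class objects for users in advance. A storage class object defines which provisioner is used and the parameters passed to it. The following example creates a storage class named slow that uses the GCE provisioner: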

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: slow
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard

The following creates a storage class named "fast" that provisions SSD-like disks:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd

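Using dynamic provisioning

Users request dynamically provisioned storage by including a storage class in their PersistentVolumeClaim. Before Kubernetes v1.6 this was done with the volume.beta.kubernetes.io/storage-class annotation; since v1.6, users should use the storageClassName field of the PersistentVolumeClaim instead.

Below is an example YAML file for a claim that requests the fast storage class: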

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim1
spec:
  accessModes:
    - ReadWriteOnce
  # the storage class to use; it automatically creates a matching PV
  storageClassName: fast
  resources:
    requests:
      storage: 30Gi

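This claim results in an SSD-like persistent disk being provisioned; when the claim is deleted, the volume is destroyed.

Default behavior

Dynamic provisioning can also be enabled for claims that do not specify any storage class. The cluster administrator does this by:

marking one StorageClass object as the default;
making sure the DefaultStorageClass admission controller is enabled on the API server.

An administrator marks a particular StorageClass as the default by adding the storageclass.kubernetes.io/is-default-class annotation with the value "true". When a default StorageClass exists, a user can create a PersistentVolumeClaim without specifying storageClassName, and the DefaultStorageClass admission controller automatically sets storageClassName to the default class. Note: a cluster should have at most one default storage class. If there is no default storage class, a PersistentVolumeClaim that does not explicitly specify storageClassName will not trigger dynamic provisioning of a PersistentVolume.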
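A minimal Python sketch of what the DefaultStorageClass admission step does to a claim, assuming StorageClass and PVC objects are plain dicts shaped like their YAML. The function names are hypothetical, and raising an error when several classes are marked default is a simplification chosen here. An explicit storageClassName of "" means "no class" and is left alone.

```python
from typing import List, Optional

DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def default_class(storage_classes: List[dict]) -> Optional[str]:
    """Name of the StorageClass annotated as default, or None if there is none."""
    defaults = [sc["metadata"]["name"] for sc in storage_classes
                if sc["metadata"].get("annotations", {}).get(DEFAULT_ANNOTATION) == "true"]
    if len(defaults) > 1:
        # Simplification: treat several defaults as a configuration error.
        raise ValueError(f"more than one default StorageClass: {defaults}")
    return defaults[0] if defaults else None


def admit_pvc(pvc: dict, storage_classes: List[dict]) -> dict:
    """Fill in spec.storageClassName only when the claim leaves it unset."""
    spec = pvc.setdefault("spec", {})
    if "storageClassName" not in spec:  # "" is an explicit choice and is kept
        name = default_class(storage_classes)
        if name is not None:
            spec["storageClassName"] = name
    return pvc


classes = [{"metadata": {"name": "slow"}},
           {"metadata": {"name": "fast", "annotations": {DEFAULT_ANNOTATION: "true"}}}]
print(admit_pvc({"spec": {}}, classes))  # {'spec': {'storageClassName': 'fast'}}
```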

NFS storage class example

Deploying nfs-provisioner

Choose a volume in which the nfs-provisioner instance stores its state and data, and mount it into the container at /export:

...
  volumeMounts:
    - name: export-volume
      mountPath: /export
volumes:
  - name: export-volume
    hostPath:
      path: /tmp/nfs-provisioner
...

Choose a provisioner name for the StorageClass and set it in deploy/kubernetes/deployment.yaml:

args:
  - "-provisioner=example.com/nfs"
...

The complete deployment.yaml file:

kind: Service
apiVersion: v1
metadata:
  name: nfs-provisioner
  labels:
    app: nfs-provisioner
spec:
  ports:
    - name: nfs
      port: 2049
    - name: mountd
      port: 20048
    - name: rpcbind
      port: 111
    - name: rpcbind-udp
      port: 111
      protocol: UDP
  selector:
    app: nfs-provisioner
---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-provisioner
spec:
  replicas: 1
  # apps/v1 requires a selector that matches the template labels
  selector:
    matchLabels:
      app: nfs-provisioner
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nfs-provisioner
    spec:
      containers:
        - name: nfs-provisioner
          image: quay.io/kubernetes_incubator/nfs-provisioner:v1.0.8
          ports:
            - name: nfs
              containerPort: 2049
            - name: mountd
              containerPort: 20048
            - name: rpcbind
              containerPort: 111
            - name: rpcbind-udp
              containerPort: 111
              protocol: UDP
          securityContext:
            capabilities:
              add:
                - DAC_READ_SEARCH
                - SYS_RESOURCE
          args:
            # the provisioner name; the StorageClass refers to this provisioner by this name
            - "-provisioner=nfs-provisioner"
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: SERVICE_NAME
              value: nfs-provisioner
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          imagePullPolicy: "IfNotPresent"
          volumeMounts:
            - name: export-volume
              mountPath: /export
      volumes:
        - name: export-volume
          hostPath:
            path: /srv

After setting up deploy/kubernetes/deployment.yaml, deploy nfs-provisioner to the Kubernetes cluster with kubectl create:

$ kubectl create -f {path}/deployment.yaml

Creating the StorageClass

Below is the StorageClass configuration file. It defines a storage class named nfs-storageclass whose provisioner is nfs-provisioner.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-storageclass
# provisioner is a top-level field, not part of metadata
provisioner: nfs-provisioner

Create it from the configuration file above with kubectl create -f:

$ kubectl create -f deploy/kubernetes/class.yaml

storageclass "nfs-storageclass" created

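Once the storage class has been created, you can create a PersistentVolumeClaim that requests it, and the StorageClass automatically creates a usable PersistentVolume for the claim.

Creating the PersistentVolumeClaim

A PersistentVolumeClaim is a claim on a PersistentVolume: the PersistentVolume is the storage provider and the PersistentVolumeClaim is the consumer. Below is the PersistentVolumeClaim YAML file; its spec.storageClassName field specifies the storage class to use.

In this file, the nfs-storageclass storage class creates a PersistentVolume for the claim. The requested size is 1Mi, and the volume can be mounted read-write by many nodes (ReadWriteMany).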

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: nfs-storageclass
  resources:
    requests:
      storage: 1Mi

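Create the claim above with kubectl create:

$ kubectl create -f {path}/claim.yaml

Creating a Deployment that uses the PersistentVolumeClaim

Here we define a Deployment named busybox-deployment that uses the busybox image. The busybox containers need to persist the data under /mnt, so the YAML file mounts a volume named nfs, backed by the PersistentVolumeClaim nfs-pvc, to persist the containers' data.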

# This mounts the nfs volume claim into /mnt and continuously
# overwrites /mnt/index.html with the time and hostname of the pod. 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      name: busybox-deployment
  template:
    metadata:
      labels:
        name: busybox-deployment
    spec:
      containers:
      - image: busybox
        command:
        - sh
        - -c
        - 'while true; do date > /mnt/index.html; hostname >> /mnt/index.html; sleep $(($RANDOM % 5 + 5)); done'
        imagePullPolicy: IfNotPresent
        name: busybox
        volumeMounts:
        # name must match the volume name below
        - name: nfs
          mountPath: "/mnt"
      volumes:
      - name: nfs
        persistentVolumeClaim:
          claimName: nfs-pvc

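Create the busybox-deployment Deployment with kubectl create:

$ kubectl create -f {path}/nfs-busybox-deployment.yaml

Liveness and readiness probes

Official documentation: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/

When using Kubernetes, have you ever seen a Pod crash shortly after starting, restart, and crash again in a vicious cycle? Have you wondered how Kubernetes checks whether a Pod is still alive? And even once a container has started, how does Kubernetes know whether its process is ready to serve traffic?

The kubelet uses a liveness probe to decide when to restart a container. For example, when an application is running but can make no further progress, a liveness probe can catch the deadlock and restart the container, so that the application keeps running despite the bug (and whose program doesn't have a few bugs?).

The kubelet uses a readiness probe to decide whether a container is ready to accept traffic. A Pod is considered ready only when all of its containers are ready. This signal controls which Pods serve as backends for a Service: Pods that are not ready are removed from the Service's load balancer.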

livenessProbe

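Defining a liveness command

Many applications that run for a long time eventually enter a broken state from which they cannot recover unless restarted. Kubernetes provides liveness probes to detect and remedy such situations.

In this exercise you create a Pod that runs one container based on the gcr.io/google_containers/busybox image. Here is the Pod's configuration file, exec-liveness.yaml: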

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    image: gcr.io/google_containers/busybox
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5

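The configuration file defines a single container. periodSeconds tells the kubelet to run the liveness probe every 5 seconds, and initialDelaySeconds tells it to wait 5 seconds before the first probe. The probe runs the command cat /tmp/healthy in the container. If the command succeeds it returns 0, and the kubelet considers the container alive and healthy. If it returns a non-zero value, the kubelet kills the container and restarts it (by default after 3 consecutive failures, the default failureThreshold).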
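The exec-probe rule (exit status 0 means healthy, anything else unhealthy) can be sketched in Python. This runs the command locally rather than inside a container, as the kubelet does through the container runtime; exec_probe is a hypothetical name, and the 1-second default mirrors the probe's default timeoutSeconds.

```python
import subprocess
from typing import List


def exec_probe(command: List[str], timeout: float = 1.0) -> bool:
    """Run a probe command; exit status 0 means healthy, anything else unhealthy."""
    try:
        result = subprocess.run(command, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary or a command that runs too long counts as a failure.
        return False
    return result.returncode == 0
```

For example, exec_probe(["cat", "/tmp/healthy"]) returns True while the file exists and False once it has been removed, which is exactly the transition the Pod above goes through after 30 seconds.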

When the container starts, it runs this command:

/bin/sh -c "touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600"

For the first 30 seconds of the container's life there is a /tmp/healthy file, so cat /tmp/healthy returns a success code. After 30 seconds, cat /tmp/healthy returns a failure code.

Create the Pod:

kubectl create -f https://k8s.io/docs/tasks/configure-pod-container/exec-liveness.yaml

Within 30 seconds, view the Pod's events:

kubectl describe pod liveness-exec

The output shows that no liveness probe has failed yet:

FirstSeen    LastSeen    Count   From            SubobjectPath           Type        Reason      Message
--------- --------    -----   ----            -------------           --------    ------      -------
24s       24s     1   {default-scheduler }                    Normal      Scheduled   Successfully assigned liveness-exec to worker0
23s       23s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Pulling     pulling image "gcr.io/google_containers/busybox"
23s       23s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Pulled      Successfully pulled image "gcr.io/google_containers/busybox"
23s       23s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Created     Created container with docker id 86849c15382e; Security:[seccomp=unconfined]
23s       23s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Started     Started container with docker id 86849c15382e

After 35 seconds, view the Pod's events again:

kubectl describe pod liveness-exec

The last line shows that the liveness probe has failed; the container will be killed and recreated.

FirstSeen LastSeen    Count   From            SubobjectPath           Type        Reason      Message
--------- --------    -----   ----            -------------           --------    ------      -------
37s       37s     1   {default-scheduler }                    Normal      Scheduled   Successfully assigned liveness-exec to worker0
36s       36s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Pulling     pulling image "gcr.io/google_containers/busybox"
36s       36s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Pulled      Successfully pulled image "gcr.io/google_containers/busybox"
36s       36s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Created     Created container with docker id 86849c15382e; Security:[seccomp=unconfined]
36s       36s     1   {kubelet worker0}   spec.containers{liveness}   Normal      Started     Started container with docker id 86849c15382e
2s        2s      1   {kubelet worker0}   spec.containers{liveness}   Warning     Unhealthy   Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory

Wait another 30 seconds and confirm that the container has been restarted:

kubectl get pod liveness-exec

The output shows that RESTARTS has been incremented:

NAME            READY     STATUS    RESTARTS   AGE
liveness-exec   1/1       Running   1          1m

Defining a liveness HTTP request

You can also use an HTTP GET request as a liveness probe. The following example, http-liveness.yaml, defines a Pod that runs one container based on the gcr.io/google_containers/liveness image:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    args:
    - /server
    image: gcr.io/google_containers/liveness
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
          - name: X-Custom-Header
            value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3

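The configuration file defines a single container. In livenessProbe, periodSeconds tells the kubelet to run a liveness probe every 3 seconds, and initialDelaySeconds tells it to wait 3 seconds before the first probe. The probe sends an HTTP GET request to the server in the container on port 8080. If the handler for the server's /healthz path returns a success code, the kubelet considers the container alive and healthy. If it returns a failure code, the kubelet kills the container and restarts it.

Any code greater than or equal to 200 and less than 400 indicates success. Any other code indicates failure.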
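
The Go sketch below illustrates that rule. It performs a single HTTP GET and treats any status from 200 to 399 as a pass. It is not the kubelet's prober code. The function names `isSuccessCode` and `httpGetProbe` are hypothetical. The sketch also assumes redirects are not followed, so a 3xx answer is judged on its own status.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// isSuccessCode reports whether a status code counts as a passing probe:
// 200 <= code < 400.
func isSuccessCode(code int) bool {
	return code >= http.StatusOK && code < http.StatusBadRequest
}

// httpGetProbe sends one GET request to url and applies isSuccessCode.
// Redirects are not followed, and any transport error (connection refused,
// timeout) counts as a failure.
func httpGetProbe(url string, timeout time.Duration) bool {
	client := &http.Client{
		Timeout: timeout,
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse // keep the 3xx response as-is
		},
	}
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return isSuccessCode(resp.StatusCode)
}

func main() {
	for _, code := range []int{199, 200, 302, 399, 400, 500} {
		fmt.Printf("%d -> %v\n", code, isSuccessCode(code))
	}
	// Hypothetical target: the example container's /healthz on port 8080.
	fmt.Println(httpGetProbe("http://127.0.0.1:8080/healthz", time.Second))
}
```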

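See the server's source code in server.go.

For the first 10 seconds the container is alive, and the /healthz handler returns status 200. After that, the handler returns status 500.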

// Excerpt from server.go; started is the time recorded when the server started.
http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
    duration := time.Now().Sub(started)
    if duration.Seconds() > 10 {
        w.WriteHeader(500)
        w.Write([]byte(fmt.Sprintf("error: %v", duration.Seconds())))
    } else {
        w.WriteHeader(200)
        w.Write([]byte("ok"))
    }
})

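Three seconds after the container starts, the kubelet begins performing health checks. The first checks succeed. After 10 seconds the checks start to fail, and the kubelet kills and restarts the container.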

Create a Pod to test the HTTP liveness check:

kubectl create -f https://k8s.io/docs/tasks/configure-pod-container/http-liveness.yaml

After 10 seconds, view the Pod's events to verify that the liveness probe has failed and the container has been restarted:

kubectl describe pod liveness-http

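Defining a TCP liveness probe

A third type of liveness probe uses a TCP socket. With this configuration, the kubelet tries to open a socket to your container on the specified port. If it can establish a connection, the container is considered healthy. If it cannot, the check counts as a failure.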

apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: k8s.gcr.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20

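As you can see, the configuration for a TCP check is quite similar to that of an HTTP check. This example uses both a readiness probe and a liveness probe. The kubelet sends the first readiness probe 5 seconds after the container starts. This probe tries to connect to the goproxy container on port 8080. If the probe succeeds, the Pod is marked as ready. The kubelet then repeats this check every 10 seconds.

In addition to the readiness probe, this configuration includes a liveness probe. The kubelet runs the first liveness probe 15 seconds after the container starts. Like the readiness probe, it tries to connect to the goproxy container on port 8080. If the liveness probe fails, the container is restarted.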

Using a named port

You can use a named ContainerPort for HTTP or TCP liveness checks:

ports:
- name: liveness-port
  containerPort: 8080
  hostPort: 8080

livenessProbe:
  httpGet:
    path: /healthz
    port: liveness-port
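
The Go sketch below shows how a named port can be turned into a number by looking it up in the container's `ports` list. The types and the function `resolvePort` are hypothetical simplifications. In particular, this sketch ignores the port protocol.

```go
package main

import (
	"fmt"
	"strconv"
)

// ContainerPort mirrors the two fields of a ports entry used here.
type ContainerPort struct {
	Name          string
	ContainerPort int
}

// resolvePort turns a probe's port value into a number: a numeric string is
// parsed and range-checked, anything else is looked up by name in ports.
func resolvePort(port string, ports []ContainerPort) (int, error) {
	if n, err := strconv.Atoi(port); err == nil {
		if n < 1 || n > 65535 {
			return 0, fmt.Errorf("port %d out of range 1-65535", n)
		}
		return n, nil
	}
	for _, p := range ports {
		if p.Name == port {
			return p.ContainerPort, nil
		}
	}
	return 0, fmt.Errorf("no container port named %q", port)
}

func main() {
	ports := []ContainerPort{{Name: "liveness-port", ContainerPort: 8080}}
	n, err := resolvePort("liveness-port", ports)
	fmt.Println(n, err) // 8080 <nil>
}
```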

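Defining readiness probes

Sometimes an application is temporarily unable to serve traffic. For example, it might need to load large amounts of data or configuration files during startup. In that case you don't want to kill the application, but you don't want to send it requests either. Kubernetes provides readiness probes to detect and handle these situations. A container in a Pod can report that it is not yet ready to handle traffic sent by Kubernetes Services.

Readiness probes are configured much like liveness probes. The only difference is that you use the readinessProbe field instead of livenessProbe.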

readinessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5

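HTTP and TCP readiness probes are configured the same way as their liveness counterparts.

Readiness and liveness probes can be used in parallel on the same container. Using both ensures that traffic does not reach a container that is not ready, and that a container is restarted when it fails.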

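Configuring probes

Probes have several fields that give you precise control over liveness and readiness checks:

initialDelaySeconds: number of seconds after the container starts before the first probe runs. Default 0.
periodSeconds: how often, in seconds, to perform the probe. Default 10, minimum 1.
timeoutSeconds: number of seconds after which the probe times out. Default 1, minimum 1.
successThreshold: minimum number of consecutive successes for the probe to be considered successful after having failed. Default 1; must be 1 for liveness probes. Minimum 1.
failureThreshold: minimum number of consecutive failures for the probe to be considered failed after having succeeded. Default 3, minimum 1.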

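For HTTP probes, you can set additional fields on httpGet:

host: host name to connect to. Defaults to the Pod IP. You probably want to set "Host" in httpHeaders instead.
scheme: scheme to use when connecting. Defaults to HTTP.
path: path to access on the HTTP server.
httpHeaders: custom headers to set in the request. HTTP allows repeated headers.
port: name or number of the port to access on the container. The number must be between 1 and 65535.

For an HTTP probe, the kubelet sends an HTTP request to the specified path and port to perform the check. The kubelet sends the probe to the container's IP address, unless the address is overridden by the optional host field in httpGet. In most cases you do not want to set the host field. Here is one case where you would: suppose the container listens on 127.0.0.1 and the Pod's hostNetwork field is true. Then host, under httpGet, should be set to 127.0.0.1. If your Pod relies on virtual hosts, which is probably the more common case, you should not use host. Instead, set the Host header in httpHeaders.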
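
The host/Host distinction can be sketched by building the probe request in Go. The structs and `buildProbeRequest` are hypothetical. The sketch dials the host field, or the Pod IP when that field is empty, and moves a custom Host header into `req.Host`, because Go's HTTP client takes the Host header from that field rather than from `req.Header`. The Pod IP shown is made up.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strconv"
)

// HTTPHeader is one httpHeaders entry; the same name may appear more than once.
type HTTPHeader struct {
	Name  string
	Value string
}

// HTTPGetAction holds the httpGet fields described above (hypothetical type).
type HTTPGetAction struct {
	Host    string // optional; empty means "use the Pod IP"
	Scheme  string // "HTTP" or "HTTPS"; empty is treated as HTTP
	Path    string
	Port    int
	Headers []HTTPHeader
}

// buildProbeRequest returns the GET request a probe would send: it connects to
// Host if set, otherwise to podIP, and sends any "Host" header as the virtual
// host name instead of using it as the connection target.
func buildProbeRequest(podIP string, a HTTPGetAction) (*http.Request, error) {
	host := a.Host
	if host == "" {
		host = podIP
	}
	scheme := "http"
	if a.Scheme == "HTTPS" {
		scheme = "https"
	}
	target := scheme + "://" + net.JoinHostPort(host, strconv.Itoa(a.Port)) + a.Path
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return nil, err
	}
	for _, h := range a.Headers {
		if http.CanonicalHeaderKey(h.Name) == "Host" {
			req.Host = h.Value // Go's client sends req.Host, not req.Header["Host"]
			continue
		}
		req.Header.Add(h.Name, h.Value) // Add keeps repeated headers
	}
	return req, nil
}

func main() {
	req, err := buildProbeRequest("10.244.1.5", HTTPGetAction{
		Path:    "/healthz",
		Port:    8080,
		Headers: []HTTPHeader{{Name: "Host", Value: "www.example.com"}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL.String(), req.Host) // http://10.244.1.5:8080/healthz www.example.com
}
```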

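Init containers

Understanding init containers

A Pod can run multiple containers, and it can also run one or more init containers, which run before the app containers. Init containers are exactly like regular containers, except for two points:

They always run to completion.
Each init container must complete successfully before the next one can run.

If an init container of a Pod fails, Kubernetes restarts the Pod repeatedly until the init container succeeds. However, if the Pod's restartPolicy is Never, it is not restarted.

To specify init containers, add the initContainers field to the Pod spec. Their statuses are returned as an array in .status.initContainerStatuses, similar to .status.containerStatuses for regular containers.

Differences between init containers and regular containers:

Init containers support all the features of regular containers, including resource limits, volumes and security settings. However, resource requests and limits are handled slightly differently for init containers, as described below. Also, init containers do not support readiness probes, because they must run to completion before the Pod can be ready.

If multiple init containers are specified for a Pod, they run one at a time, in order, whereas the Pod's regular containers start in parallel. Each init container must succeed before the next one can start. Only when all init containers have completed successfully does Kubernetes start the regular app containers.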

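What init containers can do

Because init containers use images separate from the app containers, they have some advantages for initialization work:

They can contain and run utilities that, for security reasons, should not be packaged with the application.
They can use utilities or custom code for setup, so there is no need to use sed, awk, python or dig in the regular app container to do initialization.
Application image builders and deployers can work independently, without having to jointly build a single Pod.
They use Linux namespaces, so they have a different view of the filesystem from the app containers. Consequently, they can be given Secrets that app containers cannot access.
They run before the app containers start, so they can block or delay the startup of app containers until a set of preconditions is met.

Examples:

Wait for a Service to be created, using a shell command such as:

for i in {1..100}; do sleep 1; if dig myservice; then exit 0; fi; done; exit 1

Register the current Pod with a remote server using the downward API, with a command such as:

curl -X POST http://$MANAGEMENT_SERVICE_HOST:$MANAGEMENT_SERVICE_PORT/register -d 'instance=$(<POD_NAME>)&ip=$(<POD_IP>)'

Wait for some time before starting the app container, for example with sleep 60.
Clone a git repository into a volume.
Use a template tool to dynamically write values into the main application's configuration file.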

For more detailed examples, see the Pod setup guide.

Using init containers

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox
    command: ['sh', '-c', 'until nslookup myservice; do echo waiting for myservice; sleep 2; done;']
  - name: init-mydb
    image: busybox
    command: ['sh', '-c', 'until nslookup mydb; do echo waiting for mydb; sleep 2; done;']

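The Pod above defines two init containers. The first waits for the myservice Service, and the second waits for the mydb Service. Once both init containers have completed, the app container starts.

Below is the YAML for the myservice and mydb Services: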

kind: Service
apiVersion: v1
metadata:
  name: myservice
spec:
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9376
---
kind: Service
apiVersion: v1
metadata:
  name: mydb
spec:
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9377

You can start and debug the Pod defined above with the following commands:

kubectl create -f myapp.yaml
pod/myapp-pod created

kubectl get -f myapp.yaml

NAME        READY     STATUS     RESTARTS   AGE
myapp-pod   0/1       Init:0/2   0          6m

kubectl describe -f myapp.yaml

Name:          myapp-pod
Namespace:     default
[...]
Labels:        app=myapp
Status:        Pending
[...]
Init Containers:
  init-myservice:
[...]
    State:         Running
[...]
  init-mydb:
[...]
    State:         Waiting
      Reason:      PodInitializing
    Ready:         False
[...]
Containers:
  myapp-container:
[...]
    State:         Waiting
      Reason:      PodInitializing
    Ready:         False
[...]
Events:
  FirstSeen    LastSeen    Count    From                      SubObjectPath                           Type          Reason        Message
  ---------    --------    -----    ----                      -------------                           --------      ------        -------
  16s          16s         1        {default-scheduler }                                              Normal        Scheduled     Successfully assigned myapp-pod to 172.17.4.201
  16s          16s         1        {kubelet 172.17.4.201}    spec.initContainers{init-myservice}     Normal        Pulling       pulling image "busybox"
  13s          13s         1        {kubelet 172.17.4.201}    spec.initContainers{init-myservice}     Normal        Pulled        Successfully pulled image "busybox"
  13s          13s         1        {kubelet 172.17.4.201}    spec.initContainers{init-myservice}     Normal        Created       Created container with docker id 5ced34a04634; Security:[seccomp=unconfined]
  13s          13s         1        {kubelet 172.17.4.201}    spec.initContainers{init-myservice}     Normal        Started       Started container with docker id 5ced34a04634

kubectl logs myapp-pod -c init-myservice # Inspect the first init container
kubectl logs myapp-pod -c init-mydb      # Inspect the second init container

Once we create the mydb and myservice Services, the init containers complete and myapp-pod moves to the Running state:

kubectl create -f services.yaml

service/myservice created
service/mydb created


kubectl get -f myapp.yaml
NAME        READY     STATUS    RESTARTS   AGE
myapp-pod   1/1       Running   0          9m

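These examples are very simple, but they should give you some ideas for creating your own init containers.

Detailed behavior

During Pod startup, init containers start in order after the network and volumes are initialized. Each one must exit successfully before the next can start. If an init container fails to start because of a runtime error, or exits with a failure, it is retried according to the Pod's restartPolicy. However, if restartPolicy is set to Always, init containers use the OnFailure policy.
If the Pod is restarted, all init containers must run again.
Changes to an init container's spec are limited to the image field. Changing an init container's image is equivalent to restarting the Pod.
Because init containers can be restarted, retried or re-executed, their code should be idempotent. In particular, code that writes files to EmptyDirs should be prepared for the possibility that an output file already exists.
The names of all init containers and app containers in a Pod must be unique.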

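Resources

Because of the order in which init containers run, the following rules for resources apply:

For a given resource, the highest request among all init containers is the effective init request.
For the Pod, the effective request for a resource is the higher of:

1) the sum of the requests of all app containers;
2) the effective init request for that resource (as described above, the largest request among all init containers).
Scheduling is based on the effective requests. This means init containers can reserve resources for initialization that are not used for the rest of the Pod's lifetime.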
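
The Go sketch below works through these rules for a single resource. `effectiveRequest` is a hypothetical helper, the values are invented CPU requests in millicores, and the sketch ignores Pod overhead.

```go
package main

import "fmt"

// effectiveRequest applies the rules above to one resource: take the largest
// single init-container request, the sum of the app-container requests, and
// return whichever is higher.
func effectiveRequest(initReqs, appReqs []int64) int64 {
	var maxInit int64
	for _, r := range initReqs {
		if r > maxInit {
			maxInit = r
		}
	}
	var sumApps int64
	for _, r := range appReqs {
		sumApps += r
	}
	if maxInit > sumApps {
		return maxInit
	}
	return sumApps
}

func main() {
	// Hypothetical CPU requests in millicores.
	fmt.Println(effectiveRequest([]int64{200, 500}, []int64{100, 250})) // 500
	fmt.Println(effectiveRequest([]int64{200}, []int64{300, 250}))      // 550
}
```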

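Pod restart reasons

A Pod can restart, causing its init containers to run again, for the following reasons:

A user updates the Pod spec in a way that changes an init container's image. Changes to an app container's image only restart that app container.
All containers in the Pod are terminated while restartPolicy is set to Always, forcing a restart, and the init containers' completion records have been lost to garbage collection.

Defining a Pod postStart or preStop hook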

[root@k8s-master01 manifests]# cat poststart-pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: poststart-pod
spec:
  containers:
  - name: busybox-httpd
    image: busybox
    imagePullPolicy: IfNotPresent
    lifecycle:
      postStart:
        exec:
          command: ["mkdir", "-p", "/data/web/html"]
    command: ["/bin/sh","-c","sleep 3600"]

Creating with the `--from-literal` command-line flag