// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

/*
Package wal provides an implementation of a write-ahead log that is used by
etcd.

A WAL is created in a particular directory and is made up of a number of
segmented WAL files. Raft state and entries are appended to the current file
with the Save method:

	metadata := []byte{}
	w, err := wal.Create(zap.NewExample(), "/var/lib/etcd", metadata)
	...
	err = w.Save(s, ents)

After saving a raft snapshot to disk, the SaveSnapshot method should be called
to record it, so that the WAL can be matched with the saved snapshot when
restarting.

	err := w.SaveSnapshot(walpb.Snapshot{Index: 10, Term: 2})

When a user has finished using a WAL it must be closed:

	w.Close()

Each WAL file is a stream of WAL records. A WAL record consists of a length
field followed by a WAL record protobuf. The record protobuf contains a CRC, a
type, and a data payload. The length field is a 64-bit packed structure that
holds the length of the remaining logical record data in its lower 56 bits and
the amount of physical padding in the low three bits of its most significant
byte. Each record is 8-byte aligned so that the length field is never torn. The
CRC contains the CRC32 value of all record protobufs preceding the current
record.
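
The packed length field described above can be illustrated with a small,
simplified sketch. The helper names packLength and unpackLength and the
constants are hypothetical and are not part of this package; the sketch relies
only on the layout stated above (length in the lower 56 bits, padding in three
bits of the most significant byte), and the package's actual encoder may differ
in detail, for example by setting additional flag bits.

```go
package main

import "fmt"

// Hypothetical constants derived from the description of the length field.
const (
	lenBits = 56                       // the low 56 bits carry the record length
	lenMask = uint64(1)<<lenBits - 1   // mask selecting the low 56 bits
	padMask = uint64(0x7)              // three bits of the top byte carry the padding
)

// packLength is a hypothetical helper. It computes the padding needed to bring
// a record of dataBytes up to 8-byte alignment and packs both values into a
// single 64-bit field.
func packLength(dataBytes uint64) (field uint64, padBytes uint64) {
	padBytes = (8 - dataBytes%8) % 8
	field = dataBytes&lenMask | padBytes<<lenBits
	return field, padBytes
}

// unpackLength reverses packLength, returning the record length and padding.
func unpackLength(field uint64) (dataBytes uint64, padBytes uint64) {
	return field & lenMask, (field >> lenBits) & padMask
}

func main() {
	for _, n := range []uint64{0, 5, 8, 13} {
		field, pad := packLength(n)
		gotLen, gotPad := unpackLength(field)
		fmt.Printf("data=%d pad=%d field=%#x -> data=%d pad=%d\n", n, pad, field, gotLen, gotPad)
	}
}
```

Because the payload plus padding is always a multiple of 8 bytes, each 8-byte
length field starts on an 8-byte boundary, which is the property the alignment
rule above is meant to preserve.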

WAL files are placed inside the directory and named in the following format:

	$seq-$index.wal

The first WAL file to be created will be 0000000000000000-0000000000000000.wal,
indicating an initial sequence of 0 and an initial raft index of 0. The first
entry written to the WAL MUST have raft index 0.

The WAL cuts its current tail file when the file's size exceeds 64MB. A cut
increments an internal sequence number and causes a new file to be created. If
the last raft index saved was 0x20 and this is the first time cut has been
called on this WAL, then the sequence will increment from 0x0 to 0x1. The new
file will be 0000000000000001-0000000000000021.wal. If a second cut occurs
after 0x10 more entries with consecutive indexes have been saved, then the next
file will be called 0000000000000002-0000000000000031.wal.
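
The naming scheme and the two cuts described above can be reproduced with a
short sketch. The helpers segmentName and nextSegment are hypothetical names
introduced for illustration only; the sketch assumes that both fields are
zero-padded, 16-digit hexadecimal numbers, as the example file names suggest.

```go
package main

import "fmt"

// segmentName is a hypothetical helper that formats $seq-$index.wal with both
// fields rendered as zero-padded, 16-digit hexadecimal numbers.
func segmentName(seq, index uint64) string {
	return fmt.Sprintf("%016x-%016x.wal", seq, index)
}

// nextSegment models a cut: the sequence number goes up by one and the new
// file is named after the index that follows the last saved raft index.
func nextSegment(seq, lastSavedIndex uint64) (uint64, string) {
	next := seq + 1
	return next, segmentName(next, lastSavedIndex+1)
}

func main() {
	// The first file: sequence 0, raft index 0.
	fmt.Println(segmentName(0, 0)) // 0000000000000000-0000000000000000.wal

	// First cut after the last saved index 0x20.
	seq, name := nextSegment(0, 0x20)
	fmt.Println(name) // 0000000000000001-0000000000000021.wal

	// Second cut after 0x10 more entries (indexes 0x21 through 0x30).
	_, name = nextSegment(seq, 0x20+0x10)
	fmt.Println(name) // 0000000000000002-0000000000000031.wal
}
```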

At a later time, a WAL can be opened at a particular snapshot. If there is no
snapshot, an empty snapshot should be passed in.

	w, err := wal.Open(zap.NewExample(), "/var/lib/etcd", walpb.Snapshot{Index: 10, Term: 2})
	...

The snapshot must have been written to the WAL.

Additional items cannot be saved to this WAL until all of the items from the
given snapshot to the end of the WAL have been read:

	metadata, state, ents, err := w.ReadAll()

This returns the metadata, the last raftpb.HardState, and the slice of
raftpb.Entry items in the log.

*/
package wal