// Copyright (c) 2012-2018 Ugorji Nwoke. All rights reserved.
// Use of this source code is governed by a MIT license found in the LICENSE file.

/*
Package codec provides a
High Performance, Feature-Rich Idiomatic Go 1.4+ codec/encoding library
for binc, msgpack, cbor, json and simple.

Supported serialization formats are:

  - msgpack: https://github.com/msgpack/msgpack
  - binc:    http://github.com/ugorji/binc
  - cbor:    http://cbor.io http://tools.ietf.org/html/rfc7049
  - json:    http://json.org http://tools.ietf.org/html/rfc7159
  - simple:  a very simple binary format defined by this package (see SimpleHandle)

To install:

    go get github.com/ugorji/go/codec

This package carefully uses 'unsafe' for performance reasons in specific places.
You can build without unsafe by passing the safe or appengine tag,
i.e. 'go install -tags=safe ...'. Note that unsafe is only supported for the last 3
Go SDK versions, e.g. if the current Go release is go 1.9, unsafe is used only from
go 1.7+. This is because supporting unsafe requires knowledge of implementation details.

For detailed usage information, read the primer at http://ugorji.net/blog/go-codec-primer .

The idiomatic Go support is as seen in other encoding packages in
the standard library (e.g. json, xml, gob).

Rich Feature Set includes:

  - Simple but extremely powerful and feature-rich API
  - Support for go1.4 and above, while selectively using newer APIs for later releases
  - Excellent code coverage (> 90%)
  - Very High Performance.
    Our extensive benchmarks show us outperforming Gob, Json, Bson, etc by 2-4X.
  - Carefully selected use of 'unsafe' for targeted performance gains.
    A 100% safe mode exists where 'unsafe' is not used at all.
  - Lock-free (sans mutex) concurrency for scaling to hundreds of cores
  - Coerce types where appropriate,
    e.g. decode an int in the stream into a float, decode numbers from formatted strings, etc
  - Corner Cases:
    Overflows, nil maps/slices, nil values in streams are handled correctly
  - Standard field renaming via tags
  - Support for omitting empty fields during an encoding
  - Encoding from any value and decoding into pointer to any value
    (struct, slice, map, primitives, pointers, interface{}, etc)
  - Extensions to support efficient encoding/decoding of any named types
  - Support encoding.(Binary|Text)(M|Unm)arshaler interfaces
  - Support IsZero() bool to determine if a value is a zero value.
    Analogous to time.Time.IsZero() bool.
  - Decoding without a schema (into an interface{}).
    Includes options to configure what specific map or slice type to use
    when decoding an encoded list or map into a nil interface{}
  - Mapping a non-interface type to an interface, so we can decode appropriately
    into any interface type with a correctly configured non-interface value.
  - Encode a struct as an array, and decode struct from an array in the data stream
  - Option to encode struct keys as numbers (instead of strings)
    (to support structured streams with fields encoded as numeric codes)
  - Comprehensive support for anonymous fields
  - Fast (no-reflection) encoding/decoding of common maps and slices
  - Code-generation for faster performance.
  - Support binary (e.g. messagepack, cbor) and text (e.g. json) formats
  - Support indefinite-length formats to enable true streaming
    (for formats which support it e.g. json, cbor)
  - Support canonical encoding, where a value is ALWAYS encoded as the same sequence of bytes.
    This mostly applies to maps, where iteration order is non-deterministic.
  - NIL in data stream decoded as zero value
  - Never silently skip data when decoding.
    User decides whether to return an error or silently skip data when keys or indexes
    in the data stream do not map to fields in the struct.
  - Detect and error when encoding a cyclic reference (instead of stack overflow shutdown)
  - Encode/Decode from/to chan types (for iterative streaming support)
  - Drop-in replacement for encoding/json. `json:` key in struct tag supported.
  - Provides an RPC Server and Client Codec for net/rpc communication protocol.
  - Handle unique idiosyncrasies of codecs e.g.
      - For messagepack, configure how ambiguities in handling raw bytes are resolved
      - For messagepack, provide rpc server/client codec to support
        msgpack-rpc protocol defined at:
        https://github.com/msgpack-rpc/msgpack-rpc/blob/master/spec.md
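
A minimal sketch of the canonical-encoding idea, assuming a hypothetical length-prefixed wire layout (a big-endian uint16 count or length followed by raw bytes) and string-only maps. It is not the codec package's format or API. Go randomizes map iteration order, so the sketch sorts the keys before writing, and that is what makes the output byte-for-byte repeatable:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// writeString appends s in a hypothetical length-prefixed form:
// a big-endian uint16 length followed by the raw bytes.
func writeString(buf *bytes.Buffer, s string) {
	var n [2]byte
	binary.BigEndian.PutUint16(n[:], uint16(len(s)))
	buf.Write(n[:])
	buf.WriteString(s)
}

// encodeCanonical writes an entry count and then each key/value pair in
// ascending key order, so equal maps always produce identical bytes even
// though Go randomizes map iteration order.
func encodeCanonical(m map[string]string) []byte {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var buf bytes.Buffer
	var n [2]byte
	binary.BigEndian.PutUint16(n[:], uint16(len(keys)))
	buf.Write(n[:])
	for _, k := range keys {
		writeString(&buf, k)
		writeString(&buf, m[k])
	}
	return buf.Bytes()
}

func main() {
	m := map[string]string{"b": "2", "a": "1", "c": "3"}
	first := encodeCanonical(m)
	// Re-encoding many times never changes the output.
	for i := 0; i < 100; i++ {
		if !bytes.Equal(first, encodeCanonical(m)) {
			panic("non-deterministic output")
		}
	}
	fmt.Printf("%x\n", first)
}
```

Without the sort.Strings call, two encodings of the same map could differ from run to run.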

Extension Support

Users can register a function to handle the encoding or decoding of
their custom types.

There are no restrictions on what the custom type can be. Some examples:

    type BitSet   []int
    type BitSet64 uint64
    type UUID     string
    type MyStructWithUnexportedFields struct { a int; b bool; c []int; }
    type GifImage struct { ... }

As an illustration, MyStructWithUnexportedFields would normally be
encoded as an empty map because it has no exported fields, while UUID
would be encoded as a string. However, with extension support, you can
encode any of these however you like.
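
The following self-contained sketch shows a per-type extension registry keyed by reflect.Type. The names ext, extRegistry, encodeExt, decodeExt and newRegistry are hypothetical and are not this package's API. The sketch only shows how a registered, symmetric pair of functions plus a numeric tag can give BitSet64 a fixed 8-byte form instead of its default integer encoding:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"reflect"
)

// BitSet64 is one of the example types from the text above.
type BitSet64 uint64

// ext is a hypothetical extension: a stream tag plus a symmetric pair of
// functions that convert a value to and from bytes.
type ext struct {
	tag    uint64
	encode func(v interface{}) []byte
	decode func(dst interface{}, src []byte)
}

// extRegistry maps a concrete type to its registered extension.
type extRegistry map[reflect.Type]ext

// encodeExt returns the tag and payload for v, or ok=false when no
// extension is registered for v's type.
func (r extRegistry) encodeExt(v interface{}) (tag uint64, payload []byte, ok bool) {
	e, found := r[reflect.TypeOf(v)]
	if !found {
		return 0, nil, false
	}
	return e.tag, e.encode(v), true
}

// decodeExt fills the value that dst points to, using the extension
// registered for the pointed-to type. dst must be a non-nil pointer.
func (r extRegistry) decodeExt(dst interface{}, src []byte) bool {
	e, found := r[reflect.TypeOf(dst).Elem()]
	if !found {
		return false
	}
	e.decode(dst, src)
	return true
}

// newRegistry registers BitSet64 under tag 1 as a fixed 8-byte big-endian value.
func newRegistry() extRegistry {
	r := extRegistry{}
	r[reflect.TypeOf(BitSet64(0))] = ext{
		tag: 1,
		encode: func(v interface{}) []byte {
			b := make([]byte, 8)
			binary.BigEndian.PutUint64(b, uint64(v.(BitSet64)))
			return b
		},
		decode: func(dst interface{}, src []byte) {
			*dst.(*BitSet64) = BitSet64(binary.BigEndian.Uint64(src))
		},
	}
	return r
}

func main() {
	r := newRegistry()
	tag, payload, _ := r.encodeExt(BitSet64(0xA5))
	var out BitSet64
	r.decodeExt(&out, payload)
	fmt.Println(tag, len(payload), out == 0xA5)
}
```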

Custom Encoding and Decoding

This package maintains symmetry in the encoding and decoding halves.
We determine how to encode or decode by walking this decision tree:

  - is type a codec.Selfer?
  - is there an extension registered for the type?
  - is format binary, and is type an encoding.BinaryMarshaler and BinaryUnmarshaler?
  - is format specifically json, and is type an encoding/json.Marshaler and Unmarshaler?
  - is format text-based, and is type an encoding.TextMarshaler and TextUnmarshaler?
  - else we use a pair of functions based on the "kind" of the type e.g. map, slice, int64, etc

This symmetry is important to reduce the chance of issues caused by the
encoding and decoding sides being out of sync, e.g. decoded via a very specific
encoding.TextUnmarshaler but encoded via the kind-specific generalized mode.

Consequently, if a type only defines one half of the symmetry
(e.g. it implements UnmarshalJSON() but not MarshalJSON()),
then that type doesn't satisfy the check and we will continue walking down the
decision tree.
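
The decision tree above can be sketched as an ordered series of checks, each of which requires both halves of an interface pair. Selfer below is a hypothetical two-method stand-in for codec.Selfer, whose real methods differ. The extension set is a plain map, and chooseStrategy, extsFor, textID, halfJSON and selfish are illustrative names. Only the encoding and encoding/json interfaces are real standard-library types:

```go
package main

import (
	"encoding"
	"encoding/json"
	"fmt"
	"reflect"
)

// Selfer is a hypothetical stand-in for codec.Selfer (a type that encodes and
// decodes itself); the real interface has different method signatures.
type Selfer interface {
	SelfEncode() []byte
	SelfDecode([]byte)
}

// ifaceType turns a typed nil pointer to an interface into that interface's type.
func ifaceType(p interface{}) reflect.Type { return reflect.TypeOf(p).Elem() }

var (
	selferT    = ifaceType((*Selfer)(nil))
	binMarT    = ifaceType((*encoding.BinaryMarshaler)(nil))
	binUnmarT  = ifaceType((*encoding.BinaryUnmarshaler)(nil))
	jsonMarT   = ifaceType((*json.Marshaler)(nil))
	jsonUnmarT = ifaceType((*json.Unmarshaler)(nil))
	textMarT   = ifaceType((*encoding.TextMarshaler)(nil))
	textUnmarT = ifaceType((*encoding.TextUnmarshaler)(nil))
)

// both reports whether a pointer to t implements enc and dec. That pointer's
// method set also contains t's value methods, so either receiver kind counts.
func both(t, enc, dec reflect.Type) bool {
	pt := reflect.PtrTo(t)
	return pt.Implements(enc) && pt.Implements(dec)
}

// chooseStrategy walks the decision tree for the non-nil value v. exts is a
// hypothetical set of types that have a registered extension.
func chooseStrategy(v interface{}, binary, isJSON bool, exts map[reflect.Type]bool) string {
	t := reflect.TypeOf(v)
	switch {
	case both(t, selferT, selferT):
		return "selfer"
	case exts[t]:
		return "extension"
	case binary && both(t, binMarT, binUnmarT):
		return "binary-marshaler"
	case isJSON && both(t, jsonMarT, jsonUnmarT):
		return "json-marshaler"
	case !binary && both(t, textMarT, textUnmarT):
		return "text-marshaler"
	default:
		return "kind:" + t.Kind().String()
	}
}

// extsFor builds an extension set from sample values.
func extsFor(vs ...interface{}) map[reflect.Type]bool {
	m := make(map[reflect.Type]bool)
	for _, v := range vs {
		m[reflect.TypeOf(v)] = true
	}
	return m
}

// textID implements both text halves.
type textID string

func (t textID) MarshalText() ([]byte, error) { return []byte(t), nil }

func (t *textID) UnmarshalText(b []byte) error {
	*t = textID(b)
	return nil
}

// halfJSON implements only UnmarshalJSON, so it fails the json check.
type halfJSON struct{ N int }

func (h *halfJSON) UnmarshalJSON(b []byte) error { return json.Unmarshal(b, &h.N) }

// selfish implements the hypothetical Selfer, which is checked first.
type selfish struct{ raw []byte }

func (s *selfish) SelfEncode() []byte  { return s.raw }
func (s *selfish) SelfDecode(b []byte) { s.raw = b }

func main() {
	fmt.Println(chooseStrategy(textID("a"), false, true, nil)) // json format
	fmt.Println(chooseStrategy(textID("a"), true, false, nil)) // binary format
	fmt.Println(chooseStrategy(halfJSON{}, false, true, nil))
	fmt.Println(chooseStrategy(selfish{}, true, false, extsFor(selfish{})))
}
```

halfJSON implements only UnmarshalJSON, so it falls through to kind-based handling.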

RPC

RPC Client and Server Codecs are implemented, so the codecs can be used
with the standard net/rpc package.
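
net/rpc accepts any rpc.ServerCodec and rpc.ClientCodec, so a codec package only needs to supply those two adapters. The sketch below is a runnable stand-in for codec.GoRpc: it plugs the standard library's net/rpc/jsonrpc adapters into net/rpc over an in-memory net.Pipe. The Arith service and the add helper are hypothetical:

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
	"net/rpc/jsonrpc"
)

// Args is the hypothetical request type for the example service.
type Args struct{ A, B int }

// Arith is a hypothetical service. net/rpc exports methods of the form
// func (t *T) Name(args T1, reply *T2) error.
type Arith struct{}

func (*Arith) Add(args Args, reply *int) error {
	*reply = args.A + args.B
	return nil
}

// add performs one Arith.Add call over an in-memory connection. Swapping the
// two jsonrpc constructors for another rpc.ServerCodec/rpc.ClientCodec pair
// changes the wire format without touching the service or the call site.
func add(a, b int) (int, error) {
	srvConn, cliConn := net.Pipe()

	server := rpc.NewServer()
	if err := server.Register(new(Arith)); err != nil {
		return 0, err
	}
	go server.ServeCodec(jsonrpc.NewServerCodec(srvConn))

	client := rpc.NewClientWithCodec(jsonrpc.NewClientCodec(cliConn))
	defer client.Close()

	var sum int
	err := client.Call("Arith.Add", Args{A: a, B: b}, &sum)
	return sum, err
}

func main() {
	sum, err := add(2, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(sum)
}
```

Closing the client closes its end of the pipe. The server's ServeCodec then sees the connection end and returns.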

Usage

The Handle is SAFE for concurrent READ, but NOT SAFE for concurrent modification.

The Encoder and Decoder are NOT safe for concurrent use.

Consequently, the usage model is basically:

  - Create and initialize the Handle before any use.
    Once created, DO NOT modify it.
  - Multiple Encoders or Decoders can now use the Handle concurrently.
    They only read information off the Handle (never write).
  - However, each Encoder or Decoder MUST NOT be used concurrently.
  - To re-use an Encoder/Decoder, call Reset(...) on it first.
    This allows you to reuse state maintained on the Encoder/Decoder.
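
As an analogy only, the usage model above can be sketched with standard-library types. The hypothetical config struct plays the role of a Handle: it is configured once and then only read. encoding/json's Encoder plays the role of a codec Encoder, and each one is owned by a single goroutine. Resetting each goroutine's buffer before the next value mirrors reusing an Encoder after Reset(...):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// config stands in for a Handle: built once, then only read.
type config struct {
	prefix, indent string
}

// encodeAll encodes values using `workers` goroutines. The shared cfg is only
// read; each goroutine owns its buffer and encoder and reuses them across
// values, the way an Encoder is reused after Reset(...).
func encodeAll(cfg *config, values []interface{}, workers int) []string {
	out := make([]string, len(values))
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			var buf bytes.Buffer
			enc := json.NewEncoder(&buf) // never shared across goroutines
			enc.SetIndent(cfg.prefix, cfg.indent)
			for i := w; i < len(values); i += workers {
				buf.Reset() // reuse per-goroutine state for the next value
				if err := enc.Encode(values[i]); err != nil {
					out[i] = "error: " + err.Error()
					continue
				}
				out[i] = buf.String()
			}
		}(w)
	}
	wg.Wait()
	return out
}

func main() {
	cfg := &config{indent: "  "} // configure once, before any use
	vals := []interface{}{map[string]int{"a": 1}, []string{"x", "y"}, 3.5}
	for _, s := range encodeAll(cfg, vals, 2) {
		fmt.Print(s)
	}
}
```

Each index is written by exactly one goroutine, so the out slice needs no locking.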

Sample usage model:

    // create and configure Handle
    var (
        bh codec.BincHandle
        mh codec.MsgpackHandle
        ch codec.CborHandle
    )

    mh.MapType = reflect.TypeOf(map[string]interface{}(nil))

    // configure extensions
    // e.g. for msgpack, define functions and enable Time support for tag 1
    // mh.SetExt(reflect.TypeOf(time.Time{}), 1, myExt)

    // create and use decoder/encoder
    var (
        r   io.Reader
        w   io.Writer
        b   []byte
        h   = &bh // or &mh to use msgpack
        v   interface{}
        err error
    )

    dec := codec.NewDecoder(r, h)
    dec = codec.NewDecoderBytes(b, h)
    err = dec.Decode(&v)

    enc := codec.NewEncoder(w, h)
    enc = codec.NewEncoderBytes(&b, h)
    err = enc.Encode(v)

    // RPC Server
    listener, err := net.Listen("tcp", "localhost:5555")
    go func() {
        for {
            conn, err := listener.Accept()
            if err != nil {
                return // listener closed
            }
            rpcCodec := codec.GoRpc.ServerCodec(conn, h)
            // OR rpcCodec := codec.MsgpackSpecRpc.ServerCodec(conn, h)
            go rpc.ServeCodec(rpcCodec) // serve each connection concurrently
        }
    }()

    // RPC Communication (client side)
    conn, err := net.Dial("tcp", "localhost:5555")
    rpcCodec := codec.GoRpc.ClientCodec(conn, h)
    // OR rpcCodec := codec.MsgpackSpecRpc.ClientCodec(conn, h)
    client := rpc.NewClientWithCodec(rpcCodec)

Running Tests

To run tests, use the following:

    go test

To run the full suite of tests, use the following:

    go test -tags alltests -run Suite

Use the tag 'safe' to run tests or build in safe mode, e.g.

    go test -tags safe -run Json
    go test -tags "alltests safe" -run Suite

Running Benchmarks

Please see http://github.com/ugorji/go-codec-bench .

Caveats

Struct fields matching the following are ignored during encoding and decoding:

  - struct tag value set to -
  - func, complex numbers, unsafe pointers
  - unexported and not embedded
  - unexported and embedded and not struct kind
  - unexported and embedded pointers (from go1.10)

Every other field in a struct will be encoded/decoded.
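
The field rules above can be approximated with reflect. In this sketch, encodedFields is a hypothetical helper and not part of this package. It assumes a field is dropped when its `codec` tag (or, if that is absent, its `json` tag) is exactly "-". It does not recurse into embedded structs or interpret tag options:

```go
package main

import (
	"fmt"
	"reflect"
)

// skipKind reports the kinds listed above as always ignored.
func skipKind(k reflect.Kind) bool {
	switch k {
	case reflect.Func, reflect.Complex64, reflect.Complex128, reflect.UnsafePointer:
		return true
	}
	return false
}

// encodedFields returns the names of the top-level fields of struct value v
// that survive the rules above. Assumption: a field is dropped when its
// `codec` tag (or, if absent, its `json` tag) is exactly "-".
func encodedFields(v interface{}) []string {
	t := reflect.TypeOf(v)
	var names []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		tag, ok := f.Tag.Lookup("codec")
		if !ok {
			tag = f.Tag.Get("json")
		}
		unexported := f.PkgPath != ""
		kind := f.Type.Kind()
		switch {
		case tag == "-", skipKind(kind):
			continue
		case unexported && !f.Anonymous:
			continue // unexported and not embedded
		case unexported && kind == reflect.Ptr:
			continue // unexported embedded pointer (go1.10+)
		case unexported && kind != reflect.Struct:
			continue // unexported embedded, not struct kind
		}
		names = append(names, f.Name)
	}
	return names
}

type inner struct{ X int }
type other struct{ Y int }
type myInt int

// Example has one field for each rule above.
type Example struct {
	Name   string
	Skip   string `codec:"-"`
	Also   string `json:"-"`
	Fn     func()
	C      complex128
	hidden int
	inner  // unexported embedded struct: kept
	myInt  // unexported embedded non-struct: ignored
	*other // unexported embedded pointer: ignored
}

func main() {
	fmt.Println(encodedFields(Example{})) // [Name inner]
}
```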

Embedded fields are encoded as if they exist in the top-level struct,
with some caveats. See Encode documentation.
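
The standard library's encoding/json flattens embedded structs in the same broad way, which makes it a convenient, runnable analogy. The sketch below does not exercise this package. Base, User and encodeUser are illustrative names, and it does not cover the caveats mentioned in the Encode documentation (such as conflicting field names):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Base struct {
	ID int
}

// User embeds Base, so Base's fields are encoded as if declared on User.
type User struct {
	Base
	Name string
}

// encodeUser marshals u with encoding/json.
func encodeUser(u User) (string, error) {
	b, err := json.Marshal(u)
	return string(b), err
}

func main() {
	s, err := encodeUser(User{Base: Base{ID: 7}, Name: "ann"})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // {"ID":7,"Name":"ann"}
}
```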

*/
package codec

// TODO:
//   - For Go 1.11, when mid-stack inlining is enabled,
//     we should use committed functions for writeXXX and readXXX calls.
//     This involves uncommenting the methods for decReaderSwitch and encWriterSwitch
//     and using those (decReaderSwitch and encWriterSwitch) in all handles
//     instead of encWriter and decReader.
//     The benefit is that, for the (En|De)coder over []byte, the encWriter/decReader
//     will be inlined, giving a performance bump for that typical case.
//     However, it will only be inlined if mid-stack inlining is enabled,
//     as we call panic to raise errors, and panic currently prevents inlining.
//
// PUNTED:
//   - To make Handle comparable, make extHandle in BasicHandle a non-embedded pointer,
//     and use overlay methods on *BasicHandle to call through to extHandle after initializing
//     the "xh *extHandle" to point to a real slice.
//
// BEFORE EACH RELEASE:
//   - Look through and fix padding for each type, to eliminate false sharing
//     - critical shared objects that are read many times
//       TypeInfos
//     - pooled objects:
//       decNaked, decNakedContainers, codecFner, typeInfoLoadArray,
//     - small objects allocated independently, that we read/use much across threads:
//       codecFn, typeInfo
//     - Objects allocated independently and used a lot
//       Decoder, Encoder,
//       xxxHandle, xxxEncDriver, xxxDecDriver (xxx = json, msgpack, cbor, binc, simple)
//     - In all above, arrange values modified together to be close to each other.
//
//     For all of these, either ensure that they occupy full cache lines,
//     or ensure that the things just past the cache line boundary are hardly read/written
//     e.g. JsonHandle.RawBytesExt - which is copied into json(En|De)cDriver at init
//
//     Occupying full cache lines means they occupy 8*N words (where N is an integer).
//     Check this out by running: ./run.sh -z
//     - look at those tagged ****, meaning they are not occupying full cache lines
//     - look at those tagged <<<<, meaning they are larger than 32 words (something to watch)
//   - Run "golint -min_confidence 0.81"