faster shiftNRuneBytes by zhangyunhao116 · Pull Request #305 · julienschmidt/httprouter

zhangyunhao116 · 2020-03-09T16:30:28Z

The new implementation is 1.1x faster than the previous version. Since the shiftNRuneBytes is a Hot Path in lookup function, a small improvement would be meaningful.

A simple benchmark for this change:

func BenchmarkShiftNRuneBytes(b *testing.B) {
	var x [4]byte
	x[0], x[1], x[2], x[3] = 1, 2, 3, 4
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		res = shiftNRuneBytes(x, b.N%4)
	}
}

benchstat:

name                  time/op
ShiftNRuneBytes-8     1.35ns ± 3%
ShiftNRuneBytesNew-8  1.23ns ± 2%

Furthermore, the new implementation instructions size reduce to 88(the previous implementation is 143).
Go assembly output:

"".shiftNRuneBytes STEXT nosplit size=143 args=0x18 locals=0x0
	0x0000 00000 (main.go:13)	TEXT	"".shiftNRuneBytes(SB), NOSPLIT|ABIInternal, $0-24
	0x0000 00000 (main.go:13)	MOVL	$0, "".~r2+24(SP)
	0x0008 00008 (main.go:14)	MOVQ	"".n+16(SP), AX
	0x000d 00013 (main.go:14)	CMPQ	AX, $1
	0x0011 00017 (main.go:17)	JGT	86
	0x0013 00019 (main.go:15)	TESTQ	AX, AX
	0x0016 00022 (main.go:15)	JEQ	77
	0x0018 00024 (main.go:14)	CMPQ	AX, $1
	0x001c 00028 (main.go:17)	JEQ	41
	0x001e 00030 (main.go:24)	MOVL	""..stmp_0(SB), AX
	0x0024 00036 (main.go:24)	MOVL	AX, "".~r2+24(SP)
	0x0028 00040 (main.go:24)	RET
	0x0029 00041 (main.go:18)	MOVBLZX	"".rb+9(SP), AX
	0x002e 00046 (main.go:18)	MOVBLZX	"".rb+10(SP), CX
	0x0033 00051 (main.go:18)	MOVBLZX	"".rb+11(SP), DX
	0x0038 00056 (main.go:18)	MOVL	$0, "".~r2+24(SP)
	0x0040 00064 (main.go:18)	MOVB	AL, "".~r2+24(SP)
	0x0044 00068 (main.go:18)	MOVB	CL, "".~r2+25(SP)
	0x0048 00072 (main.go:18)	MOVB	DL, "".~r2+26(SP)
	0x004c 00076 (main.go:18)	RET
	0x004d 00077 (main.go:16)	MOVL	"".rb+8(SP), AX
	0x0051 00081 (main.go:16)	MOVL	AX, "".~r2+24(SP)
	0x0055 00085 (main.go:16)	RET
	0x0056 00086 (main.go:19)	CMPQ	AX, $2
	0x005a 00090 (main.go:19)	JEQ	116
	0x005c 00092 (main.go:21)	CMPQ	AX, $3
	0x0060 00096 (main.go:21)	JNE	30
	0x0062 00098 (main.go:22)	MOVBLZX	"".rb+11(SP), AX
	0x0067 00103 (main.go:22)	MOVL	$0, "".~r2+24(SP)
	0x006f 00111 (main.go:22)	MOVB	AL, "".~r2+24(SP)
	0x0073 00115 (main.go:22)	RET
	0x0074 00116 (main.go:20)	MOVBLZX	"".rb+10(SP), AX
	0x0079 00121 (main.go:20)	MOVBLZX	"".rb+11(SP), CX
	0x007e 00126 (main.go:20)	MOVL	$0, "".~r2+24(SP)
	0x0086 00134 (main.go:20)	MOVB	AL, "".~r2+24(SP)
	0x008a 00138 (main.go:20)	MOVB	CL, "".~r2+25(SP)
	0x008e 00142 (main.go:20)	RET
"".shiftNRuneBytesNew STEXT nosplit size=88 args=0x18 locals=0x0
	0x0000 00000 (main.go:28)	TEXT	"".shiftNRuneBytesNew(SB), NOSPLIT|ABIInternal, $0-24
	0x0000 00000 (main.go:28)	MOVL	$0, "".res+24(SP)
	0x0008 00008 (main.go:29)	MOVQ	"".n+16(SP), AX
	0x000d 00013 (main.go:29)	TESTQ	AX, AX
	0x0010 00016 (main.go:29)	JEQ	79
	0x0012 00018 (main.go:31)	CMPQ	AX, $1
	0x0016 00022 (main.go:31)	JNE	44
	0x0018 00024 (main.go:32)	MOVWLZX	"".rb+9(SP), AX
	0x001d 00029 (main.go:32)	MOVBLZX	"".rb+11(SP), CX
	0x0022 00034 (main.go:32)	MOVW	AX, "".res+24(SP)
	0x0027 00039 (main.go:32)	MOVB	CL, "".res+26(SP)
	0x002b 00043 (main.go:38)	RET
	0x002c 00044 (main.go:33)	CMPQ	AX, $2
	0x0030 00048 (main.go:33)	JNE	62
	0x0032 00050 (main.go:34)	MOVWLZX	"".rb+10(SP), AX
	0x0037 00055 (main.go:34)	MOVW	AX, "".res+24(SP)
	0x003c 00060 (main.go:34)	JMP	43
	0x003e 00062 (main.go:35)	CMPQ	AX, $3
	0x0042 00066 (main.go:35)	JNE	43
	0x0044 00068 (main.go:36)	MOVBLZX	"".rb+11(SP), AX
	0x0049 00073 (main.go:36)	MOVB	AL, "".res+24(SP)
	0x004d 00077 (main.go:36)	JMP	43
	0x004f 00079 (main.go:30)	MOVL	"".rb+8(SP), AX
	0x0053 00083 (main.go:30)	MOVL	AX, "".res+24(SP)
	0x0057 00087 (main.go:30)	RET

julienschmidt

Does the switch have an impact in the (reliably) measurable range? I'd personally prefer it for its slightly better readability.
We're speaking about sub-ns differences anyway.

zhangyunhao116 · 2020-03-10T16:10:32Z

I also agree that the previous version is more readable. And I found that the previous benchmark was not reliable, since it was tested on my laptop.

A more precise benchmark for this change, the new implementation is 1.26x faster in linux environment. (CPU: 9700k goarch: amd64 goos: linux)
go test -bench=. -benchtime=100000000x -count=50
benchstat:

name                  time/op
ShiftNRuneBytes-8     1.24ns ± 0%
ShiftNRuneBytesNew-8  0.98ns ± 1%

Maybe it is worth to do that? it's your call :)

julienschmidt · 2020-03-10T23:42:08Z

Your change consists of two changes: using if/else instead of the switch and using assignments instead of assembling a new array. How does the following perform?

switch n {
case 0:
	return rb
case 1:
	res[0], res[1], res[2] = rb[1], rb[2], rb[3]
case 2:
	res[0], res[1] = rb[2], rb[3]
case 3:
	res[0] = rb[3]
}
return

zhangyunhao116 · 2020-03-11T04:51:50Z

According to the benchmark, looks like the performance improvement comes from both change the switch and using assignment. I guess there is a compiler optimization in this situation.

func shiftNRuneBytesNew2(rb [4]byte, n int) (res [4]byte) {
	switch n {
	case 0:
		return rb
	case 1:
		res[0], res[1], res[2] = rb[1], rb[2], rb[3]
	case 2:
		res[0], res[1] = rb[2], rb[3]
	case 3:
		res[0] = rb[3]
	}
	return
}

func shiftNRuneBytesNew3(rb [4]byte, n int) [4]byte {
	if n == 0 {
		return rb
	} else if n == 1 {
		return [4]byte{rb[1], rb[2], rb[3], 0}
	} else if n == 2 {
		return [4]byte{rb[2], rb[3]}
	} else if n == 3 {
		return [4]byte{rb[3]}
	}
	return [4]byte{}
}

benchstat

name                   time/op
ShiftNRuneBytes-8      1.24ns ± 1%
ShiftNRuneBytesNew-8   0.98ns ± 1%
ShiftNRuneBytesNew2-8  1.24ns ± 1%
ShiftNRuneBytesNew3-8  1.24ns ± 0%

faster shiftNRuneBytes

e8198ec

julienschmidt reviewed Mar 10, 2020

View reviewed changes

julienschmidt added this to the v1.4 milestone Jul 26, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

faster shiftNRuneBytes#305

faster shiftNRuneBytes#305
zhangyunhao116 wants to merge 1 commit intojulienschmidt:masterfrom
zhangyunhao116:master

zhangyunhao116 commented Mar 9, 2020 •

edited

Loading

Uh oh!

julienschmidt left a comment

Uh oh!

zhangyunhao116 commented Mar 10, 2020 •

edited

Loading

Uh oh!

julienschmidt commented Mar 10, 2020

Uh oh!

zhangyunhao116 commented Mar 11, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

zhangyunhao116 commented Mar 9, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

julienschmidt left a comment

Choose a reason for hiding this comment

Uh oh!

zhangyunhao116 commented Mar 10, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

julienschmidt commented Mar 10, 2020

Uh oh!

zhangyunhao116 commented Mar 11, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

zhangyunhao116 commented Mar 9, 2020 •

edited

Loading

zhangyunhao116 commented Mar 10, 2020 •

edited

Loading