Overview
Once the response is received, `newUTF8WithFallbackReader` determines the content type (media type and charset). If the media type is textual, the body's character encoding is converted to UTF-8; otherwise the reader is a no-op.
Currently, the check for a textual media type is inspired by go-colly's implementation, which essentially blacklists a limited set of non-textual media types:
    isTextualContent := func(mimeType string) bool {
        switch {
        case strings.HasPrefix(mimeType, "image/"),
            strings.HasPrefix(mimeType, "video/"),
            strings.HasPrefix(mimeType, "audio/"),
            strings.HasPrefix(mimeType, "font/"):
            return false
        default:
            return true
        }
    }
Scope
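Instead of filtering out "what the media type is not", determine whether it belongs to a known textual type such as `text/*`, `application/json`, `*+xml`, etc., so that only textual content is processed in the flow. This also reduces the error surface, because any media type not explicitly recognized as textual skips the conversion at the existing guard: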
    if !isTextualContent(mimeType) {
        return nil, nil
    }
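A minimal sketch of what such an allowlist check might look like. This is an illustration under assumptions, not the project's implementation: the exact set of accepted types, the `+json` suffix rule, and the use of `mime.ParseMediaType` to strip parameters are all choices made here for the example. The name `isTextualContent` is reused from the existing code only for readability.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// isTextualContent reports whether contentType names a media type on a
// small allowlist of textual types. The allowlist below is an assumption
// made for this sketch; the real list would need to be agreed on.
func isTextualContent(contentType string) bool {
	// ParseMediaType lowercases the media type and drops parameters
	// such as "; charset=utf-8". Unparseable or empty values are
	// treated as non-textual.
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}

	switch {
	case strings.HasPrefix(mediaType, "text/"),
		strings.HasSuffix(mediaType, "+xml"),
		strings.HasSuffix(mediaType, "+json"):
		return true
	}

	switch mediaType {
	case "application/json",
		"application/xml",
		"application/javascript":
		return true
	}
	return false
}

func main() {
	for _, ct := range []string{
		"text/html; charset=iso-8859-1",
		"application/json",
		"application/atom+xml",
		"application/octet-stream",
		"image/png",
	} {
		fmt.Printf("%-32s textual=%v\n", ct, isTextualContent(ct))
	}
}
```

With this shape, an unknown or malformed media type falls through to `false`, so the existing guard skips the charset conversion instead of attempting it on binary data. Note that `image/svg+xml` would be accepted through the `+xml` rule; whether that is desirable is a design decision left open here.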
Code references:
synapse/fetcher/http/charset.go, lines 76 to 86 in 9d92807 (the `isTextualContent` check)
synapse/fetcher/http/charset.go, lines 101 to 103 in 9d92807 (the early return for non-textual content)