Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update HTTP Semantic Conventions to Stable Versions #118

Merged
merged 43 commits into from
May 28, 2024

Conversation

evanlauer1
Copy link
Collaborator

@evanlauer1 evanlauer1 commented May 22, 2024

Context

The OTEL spec declared the new HTTP semantic conventions as stable, so this PR is updating several instrumentation libraries to follow the new convenions. The updated libraries are http-client, wai, postgresql-simple, persistent-mysql, and yesod. Programmers can opt into the new conventions using the environment variable OTEL_SEMCONV_STABILITY_OPT_IN. The stable conventions will replace the old conventions in the next major release of hs-opentelemetry.

Related: Issue #103

Testing

  • Created and ran test suite for semConvStabilityOptIn.hs to check if the program correctly reads the environment variable based on this specification.
  • Did not test the changed instrumentation libraries to see if they add the correct attributes. The instrumentation libraries do not have any tests at present.

evanlauer1 added 30 commits May 14, 2024 15:21
…e latest stable semantics if the environment variable OTEL_SEMCONV_STABILITY_OPT_IN is "http"
Copy link
Collaborator Author

@evanlauer1 evanlauer1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished self-review.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all auto-generated changes

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this file should be in the api/src/OpenTelemetry directory or somewhere else. This seemed like the most appropriate place.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine. I'd probably prefer expanding the module name to OpenTelemetry.SemanticConventionsStabilityOptIn for clarity.

getSemConvStabilityOptIn :: IO SemConvStabilityOptIn
getSemConvStabilityOptIn = do
menv <- lookupEnv "OTEL_SEMCONV_STABILITY_OPT_IN"
let menvs = fmap T.strip . T.splitOn "," . T.pack <$> menv
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be a better library to use than Data.Text for parsing. Also, T.strip strips whitespace from the environment variable and I'm not sure if that's necessary.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me

api/src/OpenTelemetry/SemConvStabilityOptIn.hs Outdated Show resolved Hide resolved
hie.yaml Outdated
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-generated using the command gen-hie > hie.yaml. This allows HLS to work in VSCode for this repository.

H.union
[ ("db.connection_string", fromString $ showsPrecConnectInfoMasked 0 ci "")
[ ("db.connection_string" :: Text, fromString $ showsPrecConnectInfoMasked 0 ci "")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:: Text is used because of OverloadedStrings

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-generated

@@ -28,6 +28,7 @@ dependencies:
- bytestring
- network
- iproute
- unordered-containers
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for Data.HashMap.Strict

-- , ("http.response_content_length_uncompressed", _)
-- , ("net.transport")
-- , ("net.peer.name")
-- , ("net.peer.ip")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

net.peer.ip is deleted because it is superseded by server.address which can either contain the contents of net.peer.name or net.peer.ip.

Stable ->
usefulCallsite
`H.union` [
( "user_agent.original"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This attribute must be provided to the sampler and is therefore has to be added when the span is created and not after.

@evanlauer1 evanlauer1 marked this pull request as ready for review May 22, 2024 20:18
@evanlauer1 evanlauer1 requested a review from iand675 May 22, 2024 20:18
Copy link
Collaborator

@andrewsmith andrewsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work so far, Evan! This looks like excellent progress and is very thoughtful. I'll defer to Ian for approval, but this appears to be going in a good direction to me.

data SemConvStabilityOptIn = Stable | Both | Old deriving (Show, Eq)


getSemConvStabilityOptIn :: IO SemConvStabilityOptIn
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could consider splitting this into two functions: a pure function that does the parsing and this that does the lookupEnv. Something like:

getSemConvStabilityOptIn :: IO SemConvStabilityOptIn
getSemConvStabilityOptIn = parseSemConvStabilityOptIn <$> lookupEnv "OTEL_SEMCONV_STABILITY_OPT_IN"

parseSemConvStabilityOptIn :: Maybe Text -> SemConvStabilityOptIn
...

The pure function can make testing a little bit easier, avoiding the set and unset calls. You could make an argument for skipping tests on getSemConvStabilityOptIn claiming that its implementation is trivial.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I split it into two functions, but I am thinking about keeping the tests the same. I like that the current tests make sure that the program is reading the env var correctly, and the fact that getSemConvStabilityOptIn is trivial means that it is still a good test for parseSemConvStabilityOptIn. It might be better to test the pure function though, not sure.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your tests are pretty nice, compact and legible, but one thing you could do with the pure function is reduce the number of env var tests to two (set/unset), and replace the rest with tests of the pure function that could be one-liners like parseSemConvStabiltyOptIn (Just "http") shouldBe Stable. if the setup was more onerous than setEnv "blah" I might feel more strongly about suggesting that approach, but as it stands I think it's just an aesthetic choice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the tests to use a looping structure like Max suggested. Does that satisfy you're aesthetic preference?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's good too :)

getSemConvStabilityOptIn :: IO SemConvStabilityOptIn
getSemConvStabilityOptIn = do
menv <- lookupEnv "OTEL_SEMCONV_STABILITY_OPT_IN"
let menvs = fmap T.strip . T.splitOn "," . T.pack <$> menv
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me

api/src/OpenTelemetry/SemConvStabilityOptIn.hs Outdated Show resolved Hide resolved
[New HTTP semantic conventions have been declared stable.](https://opentelemetry.io/blog/2023/http-conventions-declared-stable/#migration-plan) Opt-in by setting the environment variable OTEL_SEMCONV_STABILITY_OPT_IN to
- "http" - to use the stable conventions
- "http/dup" - to emit both the old and the stable conventions
Otherwise, the old conventions will be used. The stable conventions will replace the old conventions in the next major release of this library.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent

t <- httpTracerProvider
inSpan'' t "withResponse" (addAttributesToSpanArguments callerAttributes spanArgs) $ \_wrSpan -> do
tracer <- httpTracerProvider
inSpan'' tracer "withResponse" (addAttributesToSpanArguments callerAttributes spanArgs) $ \_wrSpan -> do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the variable rename is reasonable. Haskellers can lean on the side of being terse and I think that partly stems from mathematics backgrounds, where single letter variable naming is common.

Comment on lines 122 to 125
case semConvStabilityOptIn of
Stable -> addStableAttributes
Both -> addStableAttributes >> addOldAttributes
Old -> addOldAttributes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the structure of this... very clean.

instrumentation/http-client/test/Spec.hs Outdated Show resolved Hide resolved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprised to see changes here, particularly special-casing on the same OTEL_SEMCONV_STABILITY_OPT_IN values. On the face of it, those seem unrelated. However, upon looking at the database conventions, they reference this. It appears that the overall changes aren't HTTP-only. So it seems like you are doing the correct thing according to the spec.

Comment on lines 19 to 42
it "defaults to 'Old' when env var has incorrect value" $ do
setEnv envVarName "http/du" -- intentionally similar to "http/dup" and "http"
semConvStabilityOptIn <- getSemConvStabilityOptIn
semConvStabilityOptIn `shouldBe` Old
it "is 'Stable' when env var is 'http'" $ do
setEnv envVarName "http"
semConvStabilityOptIn <- getSemConvStabilityOptIn
semConvStabilityOptIn `shouldBe` Stable
it "is 'Both' when env var is 'http/dup'" $ do
setEnv envVarName "http/dup"
semConvStabilityOptIn <- getSemConvStabilityOptIn
semConvStabilityOptIn `shouldBe` Both
it "is 'Both' when env var is 'http/dup,http'" $ do
setEnv envVarName "http/dup,http"
semConvStabilityOptIn <- getSemConvStabilityOptIn
semConvStabilityOptIn `shouldBe` Both
it "is 'Both' when env var is 'http,http/dup'" $ do
setEnv envVarName "http,http/dup"
semConvStabilityOptIn <- getSemConvStabilityOptIn
semConvStabilityOptIn `shouldBe` Both
it "is 'Both' when env var is 'http,something-random,http/dup'" $ do
setEnv envVarName "http,http/dup"
semConvStabilityOptIn <- getSemConvStabilityOptIn
semConvStabilityOptIn `shouldBe` Both

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your way is preferable, just throwing out an option is to do these tests as a loop through [("testname", "envvarname", valueItShouldBe), ...]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I was on the edge but I think it increases readability to do it that way because there is less repeated information to look through.

Copy link
Owner

@iand675 iand675 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The direction here is really great! Left some feedback on how I'd evolve the interface a bit to future-proof it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine. I'd probably prefer expanding the module name to OpenTelemetry.SemanticConventionsStabilityOptIn for clarity.

@@ -0,0 +1,26 @@
{-# LANGUAGE OverloadedStrings #-}

module OpenTelemetry.SemConvStabilityOptIn (
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to haddock all the things in this module.


module OpenTelemetry.SemConvStabilityOptIn (
getSemConvStabilityOptIn,
SemConvStabilityOptIn (Stable, Both, Old),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do SemConvStabilityOptIn(..) here to export all of the options.

import System.Environment (lookupEnv)


data SemConvStabilityOptIn = Stable | Both | Old deriving (Show, Eq)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the spec for this environment variable says that it could expand to encompass other things besides the current set of values, I think we need to tweak this a bit so that it's a bit more forwards compatible. If something like sql/dup shows up in the future, we'd need some way to recognize it and let users consume it.

data SemanticConventionStabilityOptIn 
  = HttpStableSemantics
  | HttpOldAndStableSemantics
  -- note that I'm deleting the current `Old` value here, see the next comment for what I'm thinking.
  | UnknownSemanticConvention Text 
  deriving (Show, Eq, Generic)

instance Hashable SemanticConventionStabilityOptIn

data SemConvStabilityOptIn = Stable | Both | Old deriving (Show, Eq)


parseSemConvStabilityOptIn :: Maybe String -> SemConvStabilityOptIn
Copy link
Owner

@iand675 iand675 May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we need to account for other semantic convention options in the future, I think it'd be nice to produce an opt-in collection data type where we aim to expose functionality via functions that we export.

newtype SemanticConventionStabilityOptIns = SemanticConventionStabilityOptIns -- hide the constructor export
  { optIns :: HashSet SemanticConventionStabilityOptIn
  }

semanticConventionOptionIsSet :: SemanticConventionStabilityOptIn -> SemanticConventionStabilityOptIns -> Bool
semanticConventionOptionIsSet opt (SemanticConventionStabilityOptIn opts) = HashSet.member opt opts

useNewHttpSemanticConventions :: SemanticConventionStabilityOptIns -> Bool
useNewHttpSemanticConventions opts = 
  semanticConventionOptionIsSet HttpStableSemantics opts || 
  semanticConventionOptionIsSetHashSet.member HttpOldAndStableSemantics optIns

useOldHttpSemanticConventions :: SemanticConventionStabilityOptIns -> Bool
useOldHttpSemanticConventions SemanticConventionStabilityOptIns{..} =
 semanticConventionOptionIsSet HttpOldAndStableSemantics optIns || 
 not (semanticConventionOptionIsSet HttpStableSemantics optIns)

parseSemanticConventionStabilityOptIns :: Maybe String -> SemanticConventionStabilityOptIns
parseSemanticConventionStabilityOptIns Nothing = SemanticConventionStabilityOptIns HashSet.empty
parseSemanticConventionStabilityOptIns (Just env) = SemanticConventionStabilityOptIns $ HashSet.fromList $ fmap parseItem envs
  where
    envs = fmap T.strip . T.splitOn "," . T.pack $ env
    parseItem "http/dup" = HttpOldAndStableSemantics
    parseItem "http" = HttpStableSemantics
    parseItem other = UnknownSemanticConvention

$ requestHeadersToRecord conf

semConvStabilityOptIn <- liftIO getSemConvStabilityOptIn
case semConvStabilityOptIn of
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use the set-membership approach I suggested, then here you'd do something like:

when (useStableHttpSemanticConventions semConvStabilityOptIn) addStableAttributes
when (useOldHttpSemanticConventions semConvStabilityOptIn) addOldAttributes

(\h -> (\v -> ("http.request.header." <> T.decodeUtf8 (foldedCase h), toAttribute (T.decodeUtf8 v))) <$> lookup h (requestHeaders req))
$ requestHeadersToRecord conf

semConvStabilityOptIn <- liftIO getSemConvStabilityOptIn
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate trying to avoid breaking the interface. I don't think it's great to have to read the environment variable and parse the functionality on each web request though, so I think we need to memoize the result if we're not going to pass in the opt-ins directly.

I'd recommend adding something like this to the Semantic Convention module:

import Control.Exception
import Data.IORef
import System.IO.Unsafe

-- | Create a new memoized IO action using an 'IORef' under the surface. Note that
-- the action may be run in multiple threads simultaneously, so this may not be
-- thread safe (depending on the underlying action). For the sake of reading an environment
-- variable and parsing some stuff, we don't have to be concerned about thread-safety.
memoize :: IO a -> IO (IO a)
memoize action = do
  ref <- newIORef Nothing
  pure $ do
    mres <- readIORef ref
    res <- case mres of
      Just res -> pure res
      Nothing -> do
        res <- tryAny action
        writeIORef ref $ Just res
        pure res
    either throwIO pure res

-- This uses the global IORef trick: 
-- https://www.parsonsmatt.org/2021/04/21/global_ioref_in_template_haskell.html
memoizedGetSemanticConventionOptIns :: IO SemanticConventionStabilityOptIns
memoizedGetSemanticConventionOptIns = unsafePerformIO $ memoize getSemConvStabilityOptIn
{-# NOINLINE memoizedGetSemanticConventionOptIns #-}

Then in places where you're calling getSemConvStabilityOptIn currently, use memoizedGetSemanticConventionOptIns instead.

let attrs' =
case semConvStabilityOptIn of
Stable -> addStableAttributes attrs
Both -> addStableAttributes . addOldAttributes $ attrs
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a (very) minor preference towards using $ unless I'm doing pointfree style. It makes it a little bit easier to move things around in my experience.

That said, you could just write addStableAttributes $ addOldAttributes attrs and do away with the dot entirely.

@evanlauer1 evanlauer1 requested a review from iand675 May 24, 2024 01:03
Copy link
Collaborator Author

@evanlauer1 evanlauer1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished re-review

| StableAndOld
| Old
deriving (Show, Eq)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used this instead of set membership because I did not like that if the pattern was to use when (useStableHttpSemantics options) useStableAttributes it would be easy to forget to check both useStable and useOld because the compiler does not enforce it. This way, the compiler will say "missing cases in pattern match".

parseHttpOption envs
| "http/dup" `elem` envs = StableAndOld
| "http" `elem` envs = Stable
| otherwise = httpOption defaultOptions
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using httpOption defaultOptions means that the value if the environment variable is unset will always be the same as if it is set but missing configuration for http.

parseSemanticsOptions (Just env) = SemanticsOptions {..}
where
envs = fmap T.strip $ T.splitOn "," $ T.pack env
httpOption = parseHttpOption envs
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler will give a warning if one of the fields is unset.

unless it is necessary to retrieve the value of OTEL_SEMCONV_STABILITY_OPT_IN every time getSemanticsOptions' is called.
-}
getSemanticsOptions' :: IO SemanticsOptions
getSemanticsOptions' = parseSemanticsOptions <$> lookupEnv "OTEL_SEMCONV_STABILITY_OPT_IN"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used function' to indicate a different version of function. I'm not sure if that is the best way to indicate the difference. I want function to be the default and function' to only be used in necessary circumstances.

describe "HttpOption" $ do
it "defaults to 'Old' when env var has no value" $ do
unsetEnv envVarName
semanticsOptions <- getSemanticsOptions'
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uses getSemanticsOptions' so that the tests are not affected by memoization. Memoization is tested later.

semanticsOptions <- liftIO getSemanticsOptions
case httpOption semanticsOptions of
Stable -> addStableAttributes
StableAndOld -> addStableAttributes >> addOldAttributes
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StableAndOld seemed like a more specific name than Both.

@iand675 iand675 merged commit 07f3ed5 into main May 28, 2024
6 of 8 checks passed
@evanlauer1 evanlauer1 deleted the update-http-semantics branch May 28, 2024 21:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants