Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add logging buffering #5635

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

evgenyfedorov2
Copy link
Contributor

@evgenyfedorov2 evgenyfedorov2 commented Nov 13, 2024

Related to the #5123 proposal, this PR is focused on the logging buffering only. See also the sampling part #5574.

Microsoft Reviewers: Open in CodeFlow

/// <summary>
/// Gets or sets the size of the buffer for a request.
/// </summary>
public int PerRequestCapacity { get; set; } = 1_000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this measured in log messages? I think we should also consider if we need guards on the size of an individual message or size of all messages combined. For example if someone logged 500 messages each of size 10MB I expect many services would fail with an OOM. I don't expect anyone would intentionally log such a large message but it might happen because of an unintended bug or a malicious actor has discovered a way to get large content into a log message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is in log messages. I don't know yet how to make performant memory size limit for that, but I will keep this thread open to get back to it later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a total size of the underlying buffer? "Don't buffer more than N bytes"?

@@ -714,6 +718,48 @@ await RunAsync(
});
}

#if NET9_0_OR_GREATER
[Fact]
public async Task HttpRequestBuffering()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't have to be this test, but it would be nice to have some test or sample code that uses the feature the way we expect an end-user to use it. For example I would not expect end users to new up the Options type, create their own instances of HttpRequestBuffer or manually add objects to the DI container.

Copy link
Contributor Author

@evgenyfedorov2 evgenyfedorov2 Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this test, users (as of today) are supposed to inject IHttpRequestBufferManager into their middleware and call Flush() and/or FlushCurrentRequestLogs() on it

@evgenyfedorov2 evgenyfedorov2 force-pushed the evgenyfedorov2/log_buffering branch 2 times, most recently from e40b7b6 to 9d61d12 Compare December 12, 2024 16:30
@evgenyfedorov2 evgenyfedorov2 force-pushed the evgenyfedorov2/log_buffering branch from 9d61d12 to 2f1a335 Compare December 12, 2024 17:36
/// Interface for a HTTP request buffer manager.
/// </summary>
[Experimental(diagnosticId: DiagnosticIds.Experiments.Telemetry, UrlFormat = DiagnosticIds.UrlFormat)]
public interface IHttpRequestBufferManager : IBufferManager
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this type be internal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users will use buffering either via this interface (for HTTP or non-HTTP request logs or for both) or via IBufferManager(for non-HTTP request logs only) injected via constructor, that's why it should be public.

/// Defines a rule used to filter log messages for purposes of futher buffering.
/// </summary>
[Experimental(diagnosticId: DiagnosticIds.Experiments.Telemetry, UrlFormat = DiagnosticIds.UrlFormat)]
public class BufferFilterRule : ILoggerFilterRule
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to also support filtering per name/value pair. So that you can control buffering based on things like cluster name or region.

public TimeSpan Duration { get; set; } = TimeSpan.FromSeconds(30);

/// <summary>
/// Gets or sets the size of the buffer.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this in log records? If so, I think this should be expressed in bytes, since the total size of individual log records may vary wildly.

Comment on lines +39 to +49
if (!httpContext.Items.TryGetValue(category, out var buffer))
{
var httpRequestBuffer = new HttpRequestBuffer(bufferSink, _requestOptions, _globalOptions);
httpContext.Items[category] = httpRequestBuffer;
return httpRequestBuffer;
}

if (buffer is not ILoggingBuffer loggingBuffer)
{
throw new InvalidOperationException($"Unable to parse value of {buffer} of the {category}");
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the key in HttpContext.Items should be something more complex than just string category, in case a library is already using the same string as a log category name and as a key in HttpContext.Items, and an application that uses this library is being changed to use log buffering.

Another risk here is that HttpContext is not thread-safe. If the request causes asynchronous work on multiple threads, and those threads log events in parallel (with the same log category or different categories), then the HttpContext.Items dictionary could be corrupted. Not sure how to fix this.

Address PR comments
Add .NET 8 support
Add ExceptionJsonConverter
@evgenyfedorov2 evgenyfedorov2 force-pushed the evgenyfedorov2/log_buffering branch from 883334d to 022f00c Compare December 16, 2024 14:23
@evgenyfedorov2 evgenyfedorov2 force-pushed the evgenyfedorov2/log_buffering branch from 022f00c to a371d9c Compare December 16, 2024 14:51
@dotnet-comment-bot
Copy link
Collaborator

‼️ Found issues ‼️

Project Coverage Type Expected Actual
Microsoft.AspNetCore.Diagnostics.Middleware Line 100 96.03 🔻
Microsoft.AspNetCore.Diagnostics.Middleware Branch 100 98.51 🔻
Microsoft.Extensions.Telemetry Line 93 86.2 🔻
Microsoft.Extensions.Telemetry Branch 93 84.59 🔻
Microsoft.Extensions.Diagnostics.Testing Line 99 85.78 🔻
Microsoft.Extensions.Diagnostics.Testing Branch 99 79.57 🔻

🎉 Good job! The coverage increased 🎉
Update MinCodeCoverage in the project files.

Project Expected Actual
Microsoft.Extensions.Diagnostics.Probes 70 76
Microsoft.Extensions.Caching.Hybrid 75 86
Microsoft.Extensions.AI.OpenAI 72 77

Full code coverage report: https://dev.azure.com/dnceng-public/public/_build/results?buildId=896358&view=codecoverage-tab

// NOTE: This implementation uses thread local storage. As a result, it will fail if formatter code, enricher code, or
// NOTE: This implementation uses thread local storage. As a wasBuffered, it will fail if formatter code, enricher code, or

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That change looks unintended.

{
case ModernTagJoiner modernTagJoiner:
_buffer.Enqueue(new SerializedLogRecord(logLevel, eventId, _timeProvider.GetUtcNow(), modernTagJoiner, exception,
((Func<ModernTagJoiner, Exception?, string>)(object)formatter)(modernTagJoiner, exception)));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this casting more performant than just formatter(attributes, exception)?

internal interface ILoggingBuffer
{
/// <summary>
/// Enqueues a log record in the underlying buffer..

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double dot. But I'm not sure whether internal interfaces should even have XML documentation comments.

Comment on lines +63 to +77
public void Flush()
{
var result = _buffer.ToArray();

#if NETFRAMEWORK
lock (_netfxBufferLocker)
{
while (_buffer.TryDequeue(out _))
{
// Clear the buffer
}
}
#else
_buffer.Clear();
#endif

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loses log entries if another thread enqueues more of them between _buffer.ToArray() and _buffer.Clear(). That is more likely to happen on .NET Framework, where clearing the queue takes more time.

{
while (!cancellationToken.IsCancellationRequested)
{
await _timeProvider.Delay(_options.CurrentValue.Duration, cancellationToken).ConfigureAwait(false);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, IOptionsMonitor<TOptions>.CurrentValue throws if the option is bound to configuration and the binding fails (parsing fails or a validator rejects the value); this could even happen when an operator modifies a configuration file after the application has already started. The BackgroundService would then fail and, depending on HostOptions.BackgroundServiceExceptionBehavior, either log buffer truncation would stop working or the whole application would terminate. Should this instead keep using the previous value of the options in that case?

/// <summary>
/// Represents a rule used for filtering log messages for purposes of log sampling and buffering.
/// </summary>
internal interface ILoggerFilterRule

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ILoggerFilterRule interface doesn't look useful when it is internal and has only one implementation and has only read-only properties (rather than a method that checks whether the rule is applicable). LoggerFilterRuleSelector could instead use the BufferFilterRule class directly.

Comment on lines +72 to +77
logger.Log(
serializedRecord.LogLevel,
serializedRecord.EventId,
serializedRecord.Attributes,
exception,
(_, _) => serializedRecord.FormattedMessage ?? string.Empty);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allocates a closure and a delegate for each ILogger.Log<TState> call. I can think of two ways to minimize the allocations:

  • Instead of serializedRecord.Attributes, pass serializedRecord itself as the state to ILogger.Log<TState>. Implement IReadOnlyList<KeyValuePair<string, object?>> in SerializedLogRecord by calling the corresponding methods of public IReadOnlyList<KeyValuePair<string, string>> Attributes.
  • Alternatively, keep using the same delegate across all iterations of both foreach loops; declare the serializedRecord variable and a delegate variable before the outer loop. That way, it would allocate only one closure and one delegate per BufferSink.LogRecords call, no matter how many log records are in IEnumerable<T> serializedRecords. This is safe because ILogger.Log<TState> is not allowed to save the formatter delegate for later calling.

}
else
{
exception = Activator.CreateInstance(deserializedType, message) as Exception;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will pass the wrong argument to ArgumentNullException(string? paramName).

Comment on lines +116 to +125
exception = Activator.CreateInstance(deserializedType, message, innerExceptions.First()) as Exception;
}
}
else if (innerException != null)
{
exception = Activator.CreateInstance(deserializedType, message, innerException) as Exception;
}
else
{
exception = Activator.CreateInstance(deserializedType, message) as Exception;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the recovery plan if the exception type has no suitable constructor and Activator.CreateInstance throws MethodInvocationException? For example, System.ServiceModel.FaultException<TDetail> does not have any constructors that take just (String) or (String, Exception).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reconstruction of exception instances from JSON would not be necessary if loggers that do not implement IBufferedLogger were given the log entries immediately without buffering.

using Microsoft.Extensions.ObjectPool;

namespace Microsoft.Extensions.Diagnostics.Buffering;
internal sealed class PooledLogRecord : BufferedLogRecord, IResettable

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add DebuggerDisplayAttribute so that the list of log records in a buffer can be viewed more easily.

public override IReadOnlyList<KeyValuePair<string, object?>> Attributes => _attributes;
private IReadOnlyList<KeyValuePair<string, object?>> _attributes;

public bool TryReset()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What calls PooledLogRecord.TryReset()? BufferSink does PoolFactory.CreateListPool<PooledLogRecord>() and _logRecordPool.Return(pooledList), but PooledListPolicy<T>.Return(List<T>) clears only the list itself and does not call IResettable.Reset() on the elements of the list.

Comment on lines +23 to +26
/// <summary>
/// Gets the maximum <see cref="LogLevel"/> of messages where this rule applies to.
/// </summary>
public int? EventId { get; }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong documentation.

Comment on lines +54 to +55
public override string? MessageTemplate => _messageTemplate;
private string? _messageTemplate;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PooledLogRecord._messageTemplate is always null. Only PooledLogRecord.TryReset writes to it.

Comment on lines +48 to +55
pooledList.Add(
new PooledLogRecord(
serializedRecord.Timestamp,
serializedRecord.LogLevel,
serializedRecord.EventId,
serializedRecord.Exception,
serializedRecord.FormattedMessage,
serializedRecord.Attributes.Select(kvp => new KeyValuePair<string, object?>(kvp.Key, kvp.Value)).ToArray()));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These PooledLogRecord instances should presumably be rented from a pool but that has not been implemented; only the List<PooledLogRecord> instances are pooled, not the records themselves.

If the record were pooled, then it might be useful to pool the PooledLogRecord.Attributes arrays too. For that, there are a few options:

  • Change the type of PooledLogRecord._attributes to List<KeyValuePair<string, object>>. When the PooledLogRecord instance is returned to the pool, keep the attribute list attached; when the instance is rented again from the pool, reuse the attribute list and resize it if necessary.
    • Con: If most log entries have no attributes, but some rare log entries have hundreds of attributes, then over time most PooledLogRecord instances in the pool would get their attribute lists expanded to the largest size, wasting memory.
  • When the PooledLogRecord instance is returned to the pool, detach the attribute array and pool it separately in an ArrayPool<KeyValuePair<string, object>>.
    • Pro: Separate buckets for different array sizes.
    • Con: Because ArrayPool<T>.Rent can return a larger array than requested, PooledLogRecord.Attributes would have to return a size-limiting wrapper rather than the array itself. ArraySegment<T> implements IReadOnlyList<T> but it would have to be boxed anew each time the backing array is replaced. A custom class that implements IReadOnlyList<T> and allows reinitializing with a different array should work but I'm not sure about the performance.

Comment on lines +29 to +33
#if NETFRAMEWORK
serializedAttributes.Add(new KeyValuePair<string, string>(new string(attributes[i].Key.ToCharArray()), attributes[i].Value?.ToString() ?? string.Empty));
#else
serializedAttributes.Add(new KeyValuePair<string, string>(new string(attributes[i].Key), attributes[i].Value?.ToString() ?? string.Empty));
#endif

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this copy the attribute key strings -- is that related to collectible assembly load contexts?

Comment on lines +57 to +58
public override IReadOnlyList<KeyValuePair<string, object?>> Attributes => _attributes;
private IReadOnlyList<KeyValuePair<string, object?>> _attributes;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is BufferedLogRecord.Attributes expected to include the "{OriginalFormat}" entry whose value should be in BufferedLogRecord.MessageTemplate too?

if (!category.AsSpan().StartsWith(prefix, StringComparison.OrdinalIgnoreCase) ||
!category.AsSpan().EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
{
return false;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A length check seems to be missing. Consider rule.Category = "abb*bbc", which causes prefix = "abb" and suffix = "bbc". Then if category = "abbc", it should not match because it is too short, but this code accepts it because it starts with "abb" and ends with "bbc".

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed dotnet/runtime#110890 for a similar bug in Microsoft.Extensions.Logging.LoggerRuleSelector.

if (wildcardIndex != -1 &&
categoryName.IndexOf(WildcardChar, wildcardIndex + 1) != -1)
{
throw new InvalidOperationException("Only one wildcard character is allowed in category name.");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this error were thrown from IValidateOptions<GlobalBufferOptions> and IValidateOptions<HttpRequestBufferOptions>, the error might be easier to handle there, especially during configuration reload.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants