Log4j syslog appender issue

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
Hi,
I am using Opensearch 2.7.0 in k8s environment. I want to send my logs to syslog.

Describe the issue:

During the startup of opensearch pods, all logs are sent to syslog successfully. But due to some reason if syslog server restarts, these pods are not able to communicate with syslog. Continuously getting Broken pipe error.

[T#1] ERROR Unable to write to stream TLS:xx.xx.xx.xx:514 for appender syslog org.apache.logging.log4j.core.appender.AppenderLoggingException: Error writing to TLS:xx.xx.xx.xx:514 for connection /xx.xx.xx.xx:514
        at org.apache.logging.log4j.core.net.TcpSocketManager.write(TcpSocketManager.java:244)
        at org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:190)
        at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.writeByteArrayToManager(AbstractOutputStreamAppender.java:206)
        at org.apache.logging.log4j.core.appender.SocketAppender.directEncodeEvent(SocketAppender.java:459)
        at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:190)
        at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:181)
        at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:161)
        at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:134)
        at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:125)
        at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:89)
        at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:542)
        at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:500)
        at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:483)
        at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:417)
        at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:82)
        at org.apache.logging.log4j.core.Logger.log(Logger.java:161)
        at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2205)
        at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2159)
        at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2142)
        at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2017)
        at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1983)
        at org.apache.logging.log4j.spi.AbstractLogger.info(AbstractLogger.java:1320)
        at org.opensearch.jobscheduler.sweeper.JobSweeper.lambda$initBackgroundSweep$10(JobSweeper.java:294)
        at org.opensearch.threadpool.Scheduler$ReschedulingRunnable.doRun(Scheduler.java:239)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:806)
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.net.SocketException: Broken pipe (Write failed)
        at java.base/java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.base/java.net.SocketOutputStream.socketWrite(Unknown Source)
        at java.base/java.net.SocketOutputStream.write(Unknown Source)
        at java.base/sun.security.ssl.SSLSocketOutputRecord.deliver(Unknown Source)
        at java.base/sun.security.ssl.SSLSocketImpl$AppOutputStream.write(Unknown Source)
        at org.apache.logging.log4j.core.net.TcpSocketManager.writeAndFlush(TcpSocketManager.java:253)
        at org.apache.logging.log4j.core.net.TcpSocketManager.write(TcpSocketManager.java:219)
        ... 31 more

Configuration

  1. Syslog configuration
# rsyslog configuration file

# For more information see /usr/share/doc/rsyslog-*/rsyslog_conf.html
# If you experience problems, see http://www.rsyslog.com/doc/troubleshoot.html

#### MODULES ####

# The imjournal module bellow is now used as a message source instead of imuxsock.
$ModLoad imuxsock # provides support for local system logging (e.g. via logger command)
$ModLoad imjournal # provides access to the systemd journal
#$ModLoad imklog # reads kernel messages (the same are read from journald)
#$ModLoad immark  # provides --MARK-- message capability

# Provides UDP syslog reception
#$ModLoad imudp
#$UDPServerRun 514

# Provides TCP syslog reception
#$ModLoad imtcp
#$InputTCPServerRun 514

# load TCP listener
module(
load="imtcp"
StreamDriver.Name="gtls"
StreamDriver.Mode="1"
StreamDriver.Authmode="anon"
)

# start up listener at port 514
input(
type="imtcp"
port="514"
)


#### GLOBAL DIRECTIVES ####
global(
DefaultNetstreamDriver="gtls"
DefaultNetstreamDriverCAFile="/etc/rsyslog.d/ca.crt"
DefaultNetstreamDriverCertFile="/etc/rsyslog.d/server.crt"
DefaultNetstreamDriverKeyFile="/etc/rsyslog.d/server.key"
)

# Where to place auxiliary files
$WorkDirectory /var/lib/rsyslog

# Use default timestamp format
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat

# File syncing capability is disabled by default. This feature is usually not required,
# not useful and an extreme performance hit
#$ActionFileEnableSync on

# Include all config files in /etc/rsyslog.d/
$IncludeConfig /etc/rsyslog.d/*.conf

# Turn off message reception via local log socket;
# local messages are retrieved through imjournal now.
$OmitLocalLogging on

# File to store the position in the journal
$IMJournalStateFile imjournal.state


#### RULES ####

# Log all kernel messages to the console.
# Logging much else clutters up the screen.
#kern.*                                                 /dev/console

# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none                /var/log/messages

# The authpriv file has restricted access.
authpriv.*                                              /var/log/secure

# Log all the mail messages in one place.
mail.*                                                  -/var/log/maillog


# Log cron stuff
cron.*                                                  /var/log/cron

# Everybody gets emergency messages
*.emerg                                                 :omusrmsg:*

# Save news errors of level crit and higher in a special file.
uucp,news.crit                                          /var/log/spooler

# Save boot messages also to boot.log
local7.*                                                /var/log/boot.log


# ### begin forwarding rule ###
# The statement between the begin ... end define a SINGLE forwarding
# rule. They belong together, do NOT split them. If you create multiple
# forwarding rules, duplicate the whole block!
# Remote Logging (we use TCP for reliable delivery)
#
# An on-disk queue is created for this action. If the remote host is
# down, messages are spooled to disk and sent when it is up again.
#$ActionQueueFileName fwdRule1 # unique name prefix for spool files
#$ActionQueueMaxDiskSpace 1g   # 1gb space limit (use as much as possible)
#$ActionQueueSaveOnShutdown on # save messages to disk on shutdown
#$ActionQueueType LinkedList   # run asynchronously
#$ActionResumeRetryCount -1    # infinite retries if host is down
# remote host is: name/ip:port, e.g. 192.168.0.1:514, port optional
#*.* @@remote-host:514
# ### end of the forwarding rule ###

  1. Below is my log4j2.properties file.
status = error
appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] [%node_name]%marker %m%n
appender.syslog.type = Syslog
appender.syslog.name = syslog
appender.syslog.host = xx.xx.xx.xx
appender.syslog.port = 514
appender.syslog.facility = null
appender.syslog.protocol = TCP
appender.syslog.SSL.type = SSL
appender.syslog.SSL.protocol = TLS
appender.syslog.SSL.TrustStore.type = TrustStore
appender.syslog.SSL.TrustStore.location = /etc/opensearch/config/certs/syslog-certs/syslogTrustoreJks
appender.syslog.SSL.TrustStore.password = changeit
appender.syslog.layout.type = PatternLayout
appender.syslog.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] [%node_name]%marker %m%n
rootLogger.appenderRef.syslog.ref = syslog
rootLogger.level = info
rootLogger.appenderRef.console.ref = console

Note: Using gnutls command I am able to send logs to syslog after syslog restart. Only the pod logs are not being sent to syslog after syslog restart.

Relevant Logs or Screenshots:

Attached in above section.

Please let me know how to come out from this error and send logs to syslog even after syslog restart.

If I’m reading the docs right, you’ll want to set reconnectionDelayMillis for your syslog appender to some value, so that it retries establishing the connection.

You might still see this error temporarily - after all, some writes will fail as your rsyslog restarts. IMO, that’s one of the problems with a logging library instead of a log shipper.

Maybe a cleaner option is to use rsyslog as a sidecar container, then make OpenSearch log to it? Then rsyslog can take care of shipping the data to the next hop. And if that hop fails, rsyslog can take care of buffering, retrying, etc (here’s a config snippet).

Hi,
I have tried setting reconnectionDelayMillis but it didn’t help. Pod logs are stuck in Broken Pipe exception.

Thanks, unfortunate :frowning:

I would go the “local syslog” route, with an rsyslog sidecar. IMO it’s the cleaner option anyway, though I understand it’s more complicated to set up.

In my case the uppest error message was triggered as well, seemingly due to a timing issue. I solved with a
After=network-online.target
in the unit in … systemd/…/opensearch.conf