Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync dependencies #347

Open
lippserd opened this issue Aug 12, 2021 · 21 comments
Open

Sync dependencies #347

lippserd opened this issue Aug 12, 2021 · 21 comments
Assignees
Labels
enhancement New feature or request

Comments

@lippserd
Copy link
Member

At the moment dependencies are not synchronized in Redis and the database and there is no idea for the storage scheme yet.

Since dependencies represent trees (or even graphs?), They can be difficult to store in the database.

At the moment I can think of the following ways to store the dependencies in the database:

  • Associate parent
  • Nested set
  • Materialized path
  • Adjacency list / matrix
  • Closure table

I'm sure there's more.

All schemes must be compared in terms of complexity and performance when writing and reading.

I think it also makes sense to store the parent dependencies directly for the objects. This would allow us to display these in the detail areas in Web without much effort.

@lippserd lippserd added the enhancement New feature or request label Aug 12, 2021
@lippserd lippserd added this to the v1.0.0 milestone Aug 12, 2021
@julianbrost julianbrost removed this from the 1.0.0 milestone Nov 18, 2021
@julianbrost
Copy link
Contributor

Another idea: don't sync dependencies to the relational database at all but instead only keep them in Redis. Whenever Web wants to display something about dependencies, the required graph algorithms are implemented as Lua scripts and executed on the Redis server.

Only downside with this: you couldn't use the dependency information in filters then (trying to support this would probably require some nasty work merging result sets from Redis and SQL), but things like showing all (reverse) dependencies of a checkable or rendering nice dependency graphs starting from a checkable should work nicely.

There could also be some combined solution where we think about what filtering on dependencies should be supported, write information specifically for these queries to the relational database and use Lua for everything else.

@nilmerg
Copy link
Member

nilmerg commented Aug 23, 2024

My proposal, which fits nicely with what is planned in Web, uses primarily recursive common table expressions to query connected nodes in either direction.

Required database versions:

  • MySQL 8
  • MariaDB 10.2.2
  • PostgreSQL 8.4.

The related tables consist of the following structures:

  • Adjacency Matrix (dependency_edge)

Glossary

Term Description
dependency An Icinga 2 object of type Dependency
child A host or service referenced by child_host_name or child_service_name in a dependency
node An object part of a dependency hierarchy (host, service or redundancy group)
edge A link between two different nodes
reachable Represents the overall state of the dependencies where the node is referenced as child
member A host or service referenced as parent in a dependency and as such part of a redundancy group

Schema Additions

Tables

host

column type null key fk comment
affected_children uint - - Number of dependent children, including grand children

host_state

column type null key fk comment
affects_children bool_enum - - y if reachable and any dependency, where this host is the parent, causes the child to be not reachable. Otherwise n

service

column type null key fk comment
affected_children uint - - Number of dependent children, including grand children

service_state

column type null key fk comment
affects_children bool_enum - - y if (reachable or not a child) and any dependency, where this service is the parent, causes the child to be not reachable. Otherwise n

redundancy_group
A redundancy group's config

column type null key fk comment
id binary(20) primary - -
display_name text - - The value of the redundancy_group dependency property

redundancy_group_state
The state of a redundancy group

column type null key fk comment
id binary(20) primary - -
redundancy_group_id binary(20) - redundancy_group.id -
failed bool_enum - - n if a member causes its child(ren) to be reachable. Otherwise y
is_reachable bool_enum - - y if failed=n or none of the members are not reachable. Otherwise n
last_state_change bigint - - -

dependency_node
A node referred to or referred by another node (i.e one that's part of a dependency relationship)

column type null key fk
id binary(20) primary -
host_id binary(20) unique1 host.id
service_id binary(20) unique1 service.id
redundancy_group_id binary(20) unique2 redundancy_group.id

dependency_edge
All direct dependency relationships (from = child, to = parent)

column type null key fk comment
from_node_id binary(20) unique1 dependency_node.id The id of the child node
to_node_id binary(20) unique1 dependency_node.id The id of the parent node
display_name text - - The value of the name dependency property
dependency_edge_state_id binary(20) - dependency_edge_state.id -

dependency_edge_state

column type null key fk comment
id binary(20) primary - -
failed bool_enum - - The state of the dependency

Relations

erDiagram
    dependency_node ||--|| host : is
    dependency_node ||--|| service : is
    dependency_node ||--|| redundancy_group : is
    dependency_edge }|--|{ dependency_node : "refers to / referred by"
    dependency_edge }|--|| dependency_edge_state : "has one / belongs to many"
    redundancy_group ||--|| redundancy_group_state : has
Loading

Example Queries

Root Problems

WITH RECURSIVE UnreachableNodes AS (
    SELECT id                     as id,
           0                      as level,
           dependency_node.host_id,
           cast('' as binary(20)) as service_id,
           cast('' as binary(20)) as redundancy_group_id,
           0                      as is_group_member
    FROM dependency_node
    WHERE dependency_node.host_id = '<host-id>'
      and dependency_node.service_id is null
    
    UNION ALL
    
    SELECT e.to_node_id,
           r.level + 1,
           tn.host_id,
           tn.service_id,
           tn.redundancy_group_id,
           r.redundancy_group_id IS NOT NULL AND r.level > 0 as is_group_member
    FROM dependency_edge e
             INNER JOIN UnreachableNodes r ON e.from_node_id = r.id
             INNER JOIN dependency_node tn ON e.to_node_id = tn.id
             LEFT JOIN dependency_edge_state es ON e.dependency_edge_state_id = es.id
             LEFT JOIN redundancy_group_state rgs
                       on rgs.redundancy_group_id = tn.redundancy_group_id
    WHERE es.failed = 'y'
       or rgs.is_reachable = 'n')
SELECT rn.id,
       rn.level,
       rn.is_group_member,
       h.name  as host_name,
       s.name  as service_name,
       rg.name as redundancy_group_name
FROM UnreachableNodes rn
         LEFT JOIN host h on rn.host_id = h.id
         LEFT JOIN host_state hs on rn.host_id = hs.host_id
         LEFT JOIN service s on rn.service_id = s.id
         LEFT JOIN service_state ss on rn.service_id = ss.service_id
         LEFT JOIN redundancy_group rg on rn.redundancy_group_id = rg.id
         LEFT JOIN redundancy_group_state rgs on rn.redundancy_group_id = rgs.redundancy_group_id
WHERE rn.level > 0
  and rn.is_group_member = 0
  and (
    ss.affects_children = 'y'
        or (rn.service_id IS NULL and hs.affects_children = 'y')
        or (rgs.failed = 'y' and rgs.is_reachable = 'y')
    );

Direct Parents

SELECT e.to_node_id,
       h.name  as host_name,
       s.name  as service_name,
       e.display_name as dependency_name,
       rg.name as redundancy_group_name
FROM dependency_edge e
         INNER JOIN dependency_node tn ON e.to_node_id = tn.id
         INNER JOIN dependency_node fn ON e.from_node_id = fn.id
         LEFT JOIN host h on tn.host_id = h.id
         LEFT JOIN service s on tn.service_id = s.id
         LEFT JOIN redundancy_group rg on tn.redundancy_group_id = rg.id
WHERE fn.host_id = '<host-id>'
  and fn.service_id is null;

Direct Children

SELECT e.from_node_id,
       h.name  as host_name,
       s.name  as service_name,
       e.display_name as dependency_name,
       rg.name as redundancy_group_name
FROM dependency_edge e
         INNER JOIN dependency_node tn ON e.to_node_id = tn.id
         INNER JOIN dependency_node fn ON e.from_node_id = fn.id
         LEFT JOIN host h on fn.host_id = h.id
         LEFT JOIN service s on fn.service_id = s.id
         LEFT JOIN redundancy_group rg on fn.redundancy_group_id = rg.id
WHERE tn.host_id = '<host-id>'
  and tn.service_id is null;

@julianbrost
Copy link
Contributor

dependency_state The state of the dependency's parent
state int ❌ - -

That's redundant with the host/service state information and I believe it also doesn't provide the information you hope for. I think you might want a failed boolean instead, given that the Icinga 2 Dependency object has attributes like states and ignore_soft_states that affect when a dependency counts as failed, i.e. marking the child object unreachable.

Similarly for redundancy groups. After all, they are just a set of dependencies that counts as failed if all individual dependencies are failed.

dependency_node_parent All parents a node is connected with, beyond all edges

This table can become huge. Imagine a dependency structure like the following (boxes are hosts, arrows are dependencies), scale up the top and bottom level and the size of this table will grow quadratically in the size of the Icinga 2 config.

graph TD;
    A1-->B;
    A2-->B;
	A3-->B;
	A4-->B;
    B-->C1;
    B-->C2;
	B-->C3;
	B-->C4;
Loading

So if this table is necessary, I see three options:

  1. Introduce some new limitation to the Icinga 2 config, something like any object must have at most N total (direct and indirect) parents.
  2. Have a similar check, but don't enforce it for the whole Icinga 2 config, but instead, skip writing rows into this table, i.e. it may be incomplete.
  3. Ignore the problem and if someone builds an unfortunate dependency structure, the whole system will fall over.

All direct dependency relationships (from = child, to = parent)

Why not simply stick to "child" and "parent"?

The related tables consist of the following structures:

  • Adjacency Matrix
  • Closure Table
  • Transitive Closure Table

How do these map to the tables given?

Also, can you please give an example how you'd expect these additional virtual nodes to be generated? At least with redundancy groups, that's not obvious.

@nilmerg
Copy link
Member

nilmerg commented Aug 23, 2024

I think you might want a failed boolean instead

Yeah, I thought of this, but figured this would be noted sooner or later, as it happend now ;)

This table can become huge

Before this becomes a problem (for the database or our queries), doesn't Icinga already suffer from this anyway? I'd go with 3 for now.

Why not simply stick to "child" and "parent"?

Hah, I knew this question comes up. Because the table is called _edge, not _relation, and left or right were not neutral enough to me. 😅

How do these map to the tables given?

Eh. There is no simple closure table. Added the names to the other two.

can you please give an example how you'd expect these additional virtual nodes to be generated?

Done.

@julianbrost
Copy link
Contributor

This table can become huge

Before this becomes a problem (for the database or our queries), doesn't Icinga already suffer from this anyway? I'd go with 3 for now.

To be honest, I don't know what would happen exactly, it's quite possible that things will get slower with lots of dependencies. But so far, it does not expand all indirect dependencies, so it's not obvious that this problem already exists.

Why not simply stick to "child" and "parent"?

Hah, I knew this question comes up. Because the table is called _edge, not _relation, and left or right were not neutral enough to me. 😅

Just to be sure: we're still talking about a directed graph here (otherwise, I don't know how this should work)? And the edges are aligned in the direction between a child and a parent, the edge just doesn't necessarily directly connect to something that's a host/service, i.e. the parent/child of a dependency in Icinga 2.

can you please give an example how you'd expect these additional virtual nodes to be generated?

Done.

That's not what I meant. More like which of these options do you have in mind? Or something totally different?

graph LR;
    ChildHost-->RedundancyGroup;
	RedundancyGroup-->ParentHostA;
	RedundancyGroup-->ParentHostB;
	ChildHost-->Dependency;
	Dependency-->ParentHostC;
Loading
graph LR;
    ChildHost-->RedundancyGroup;
	RedundancyGroup-->DependencyA;
	DependencyA-->ParentHostA;
	RedundancyGroup-->DependencyB;
	DependencyB-->ParentHostB;
	ChildHost-->Dependency;
	Dependency-->ParentHostC;
Loading

@nilmerg
Copy link
Member

nilmerg commented Aug 23, 2024

That's not what I meant. More like which of these options do you have in mind? Or something totally different?

Ah. Option 2 then.

@yhabteab
Copy link
Member

dependency_node_parent All parents a node is connected with, beyond all edges

This table can become huge.

As Julian have already pointed out, the dependenc_node_parent table is also superfluous to me, as I believe that the information for the Affected Children1 could also be assembled without this table (from the edges and nodes) by traversing the graph edges.

redundancy_group_state
The state of a redundancy group

I'm not sure at the moment why this is going to be useful, but technically redundancy groups only define a way to construct alternate dependencies and don't provide any actual states by themselves. So, why not just link the redundancy_group table to the dependency table, like it's done for the normal non-redundant dependencies? Alternatively, don't create a direct link at all, as the dependency can be reached via the dependency_node table.

(NOT EXISTS(SELECT 1 FROM dependency_edge WHERE from_node_id = e.to_node_id) and
      (hs.is_problem = 'y' or ss.is_problem = 'y'))) # the top most node is always reachable

Looking at this where clause, I'm wondering why the nodes are being expanded to themselves in the first place! Isn't that an indication of a cyclic graph (dependency), which Icinga 2 clearly prohibits, since Icinga 2 >= 2.14?

Footnotes

  1. It currently only seems to be used in your Affected Children example query, albeit at the moment I'm not sure how this information is supposed to represent affected children as it simply joins the tables and sums up the result, and apart from that, just representing the node edges increases the database storage exponentially, and duplicating that information makes no sense to me.

@nilmerg
Copy link
Member

nilmerg commented Aug 26, 2024

the dependency_node_parent table is also superfluous to me

It is not required to re-construct any relationships, yes. But it stores information crucial for a very specific use-case more efficiently. At least compared to work required to fetch the same information by traversing the edges each time. This table is only there to easily aggregate the number of affected children, as in the example. Without this table, this information cannot be shown in the UI in lists. (e.g. host and service list)

but technically redundancy groups only define a way to construct alternate dependencies and don't provide any actual states by themselves.

It's this alternation, which is abstracted away by this table. I don't want to have to express this in a query. Icinga knows about this, and can evaluate/update it in case it changes.

Looking at this where clause

This doesn't check for cyclic dependencies. It's used to know in advance that the recursion will stop with this row.

@nilmerg
Copy link
Member

nilmerg commented Aug 28, 2024

I've updated the proposal now with what I've discussed with @julianbrost yesterday. This is mainly:

  • Dependencies are no longer nodes in the graph. As such they are not referenced by dependency_node anymore. Instead, they are linked with dependency_edge
  • dependency now has additional columns in order to better reflect this as a new configuration table
  • The CTEs subquery has been removed, as edges now know their dependency's state which must be failed in case of a top node being able to include as root problem
  • Redundancy groups are no longer tracked per child relationship, but instead per children (plural) relationship. This reduces the amount of group nodes significantly and improves the visual representation. I can provide a more detailed description in person, or @julianbrost should also know about this
  • Regarding root problems and affected children, there are some alternative queries on the table now. They are relevant for the decision whether to require table dependency_node_parent or not

@nilmerg
Copy link
Member

nilmerg commented Aug 29, 2024

Regarding root problems and affected children, there are some alternative queries on the table now. They are relevant for the decision whether to require table dependency_node_parent or not

dependency_node_parent is gone now. In the end, it was only beneficial for a very specific use-case. (i.e the list of affected children.)

The total number of affected children, which Web would like to show in a host's or service's list item, is better be calculated by Icinga (DB) in advance and stored in table host and service respectively. Since this would only serve to indicate a very important node, this can only be an approximation (e.g. 1000+) if such an optimization is necessary.

The list of affected children will either be not shown, be a simple link to the zoomed in map or only be visible in case the parent has a problem while the list then only includes unreachable nodes. This is because, traversing the hierarchy top down (with a CTE) is considered to be very expensive, compared to a bottom up approach which is done when querying root problems.

nilmerg added a commit to Icinga/icingadb-web that referenced this issue Sep 5, 2024
Models for the new schema additions described here:
Icinga/icingadb#347 (comment)
@sukhwinder33445
Copy link

sukhwinder33445 commented Oct 22, 2024

The redundancy_group_state should contain an additional column, responsible_node_id. This column contains the dependency_node_id (host/service) which is responsible for the change in failed column.

Responsible Node:

  • The last UP|OK state member is also DOWN|CRITICAL, which leads to (redundancy_group_state.failed=y).

  • Any DOWN|CRITICAL state member is UP|OK now, which leads to (redundancy_group_state.failed=n).

Use case:

In the redundancy group details view, we want to display the plugin output of the member that affected the redundancy_group_state.failed column OR has the highest severity.

Problem:

In case, a member (other than the responsible one) becomes DOWN|CRITICAL, this will not affect the state of the group, as some other members are still UP|OK. We will still have the plugin output of the last resposible_node (which may now be OK) even though there is another member with higher severity.

If the group status is also OK|UP, the responsible_node_id must be updated with the member who has the highest severity.
@oxzi Is this easy to handle for icinga or do we have to come up with another solution?

@sukhwinder33445
Copy link

Update

We have dropped the idea of displaying the plugin output for redundancy groups. We therefore no longer need the responsible_node_id coulmn.

@nilmerg
Copy link
Member

nilmerg commented Dec 2, 2024

I've updated the proposal again and made sure the shown schema matches what I've pushed in #795. In addition, I updated the root problem query and removed the affected children query.

@nilmerg
Copy link
Member

nilmerg commented Dec 10, 2024

As we've discussed yesterday, it seems that dependency and dependency_state aren't as useful as I wanted them to be.

At the moment the only information stored in them and used by Web, are dependency.display_name and dependency_state.failed.

Problems arise, as @yhabteab noticed, once we take a look where dependency is linked to: dependency_edge

erDiagram
    dependency_edge ||--|| dependency : "declared by one / declares one"
    dependency ||--|| dependency_state : "has one"
Loading

Initially this looked fine:

  • An edge is the result of a dependency object in Icinga 2, which has one child and one parent
  • A dependency object also has state, it decides whether the child is reachable or not after all

But once the dependency object in question also defines a redundancy group, things get interesting:

erDiagram
    dependency_edge ||--|| dependency : "declared by one / declares one"
    dependency ||--|| dependency_state : "has one"
    dependency }|--|| redundancy_group : "belongs to"
Loading

Normally, redundancy groups are established in case multiple parents are cross linked with multiple children. (i.e. each child has the same parents as other children)
Which is why the same redundancy_group row might be referenced multiple times by dependency.redundancy_group_id.

graph TD;
    A-->C;
    A-->D;
    B-->C;
    B-->D;
    subgraph AB [ redundancy group ]
    A~~~B
   end
Loading

But redundancy groups are also linked by dependency_edge: (Implicitly through dependency_node)

erDiagram
    dependency_edge ||--|| dependency : "has one"
    dependency ||--|| dependency_state : "has one"
    dependency }|--|| redundancy_group : "belongs to (many)"
    dependency_node ||--|| redundancy_group : "has one"
    dependency_edge }|--|{ dependency_node : "has many"
Loading

To be expected, since redundancy groups are actual nodes in our representation of dependency hierarchies in Web.

But to avoid showing too complex structures and ease understanding, redundancy groups are supposed to be unique with their associated members. (The parents of the original dependency objects)

graph TD;
    A-->G[redundancy group];
    G-->D;
    B-->G;
    G-->C;
Loading

With this uniqueness in mind, the edges (which connect a redundancy group with their members) are halved.

But now, we end up with only two edges, which still need to be linked to a given dependency object. But there are four dependency objects involved, which will be used?

--

To solve this, @julianbrost suggested to not store dependency states in the database, but edge states. This means that dependency and dependency_state are no longer required.

As mentioned earlier, only dependency.display_name and dependency_state.failed are required by Web.

dependency.display_name might be stored as part of an edge instead, in dependency_edge.display_name.

dependency_state.failed should be moved to a new table dependency_edge_state:

erDiagram
    dependency_edge }|--|| dependency_edge_state : "has one / belongs to many"
Loading

The reason for this should hopefully be obvious, since we also save state of hosts and services in separate tables.

But the relationship of dependency_edge_state to dependency_edge is a one-to-many in the last graph, which I explain in the next comment.

@nilmerg
Copy link
Member

nilmerg commented Dec 11, 2024

Now onto why dependency_edge_state to dependency_edge should be a one-to-many relation.

Remember, redundancy groups are supposed to be separate nodes in their hierarchies. Connected edges pointing to their children are a result of the deduplication.

graph TD;
    A-->C;
    A-->D;
    B-->C;
    B-->D;
    subgraph AB [ redundancy group ]
    A~~~B
   end
Loading
graph TD;
    A-->G[redundancy group];
    G-->D;
    B-->G;
    G-->C;
Loading

Such edges, where the parent is a redundancy group, have no distinct state. Instead, they all share the same, so why shouldn't this be mirrored in the database?

Because of different dependency configurations you might point out. That's true.

But exactly for this reason, dependency_edge_state is going to be introduced. Suppose that the dependency objects, which establish the redundancy group, are a result of an apply rule.
This rule uses a single configuration, only the children relations are different in the end. In such a case, there's no different configuration to speak of.

Differences only come up, once such dependency objects are not a result of an apply rule. But if so, are we really talking of the same redundancy group? To me, this is a design flaw in Icinga's redundancy group implementation.

--

But I have another reason, for sharing state between multiple edges.

graph TD;
    c((A cluster of children));
    p[Parent]-->c;
Loading

The dependency map will group children above a given number (e.g. >10) and connect them with a single line to the parent. This line resembles a single edge.

To be able to do this, correctly, the map needs to distinguish relations based their definition. (i.e. dependency configuration)

Say, there are two distinct definitions part of 20 edges which all have the same parent. The map should render the following:

graph TD;
    c1((A cluster of 10 children));
    c2((A cluster of 10 children));
    p[Parent]-->c1;
    p-->c2;
Loading

The lines connecting them, resemble the two definitions in question. Phrased differently, two distinct dependency states.

So, in the database are 20 edges stored. But they reference only two states.

@julianbrost
Copy link
Contributor

Suppose that the dependency objects, which establish the redundancy group, are a result of an apply rule. This rule uses a single configuration, only the children relations are different in the end.

That's not enforced at all. Nothing stops you from setting config options to different values depending on the specific child object the dependency object is instantiated for. So apply rules are completely irrelevant here (like you can use them to create dependency objects but it doesn't matter if you did).

@nilmerg
Copy link
Member

nilmerg commented Dec 11, 2024

Okay, but that only makes things worse. To me. Parents in a redundancy group should be equal, or not? Why else would they be in said group?

@julianbrost
Copy link
Contributor

Yes, in order order to merge/deduplicate redundancy groups between two children, both redundancy groups have to contain the same set of parents with the same configuration (ignore_soft_states, period, states (and probably redundancy_group to be able to give the resulting redundancy group a somewhat meaningful name, but purely from a state perspective, differently named redundancy groups can behave identically)).

All I'm saying is that apply rules are unsuitable to obtain that grouping.

@nilmerg
Copy link
Member

nilmerg commented Dec 11, 2024

Okay, I just used apply rules as example, since I imagine most users of Icinga use them (backed by config we got shared) and then don't use different settings depending on where they apply. Anyway, they're only an abstraction layer, the result are multiple dependency objects with the, probably, same configuration. My reason for sharing states between edges.

@nilmerg
Copy link
Member

nilmerg commented Dec 11, 2024

I've updated the schema definition and example queries above accordingly now.

Main changes:

removed:

dependency
dependency_state
redundancy_group.name

added:

dependency_edge_state
dependency_edge.display_name
dependency_edge.dependency_edge_state_id

@raviks789
Copy link
Contributor

raviks789 commented Dec 18, 2024

dependency_edge_state

column type null key fk comment
id binary(20) primary - -
failed bool_enum - - The state of the dependency

According to my understanding (from my observation and discussion with @nilmerg), the failed column in the dependency_edge_state table should not be set to y, if the parent is a service and is unreachable because of its implicit dependency with the host and the child is also a service and is unreachable because of the host. See the example configuration below (dotted line represents implicit dependency).

graph TD;
    host-A .-> service-B;
    host-A .-> service-C;
    service-B --> service-C;
Loading

Also, IMO if the root is unreachable because of its implicit dependency, then it should not affect the state of the dependencies it is part of.

@nilmerg correct me if I am wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants