Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CascadeDelete in "appliction/domain layer" instead of Database #23188

Closed
John0King opened this issue Nov 4, 2020 · 15 comments
Closed

CascadeDelete in "appliction/domain layer" instead of Database #23188

John0King opened this issue Nov 4, 2020 · 15 comments
Labels
closed-no-further-action The issue is closed and no further action is planned. customer-reported

Comments

@John0King
Copy link

John0King commented Nov 4, 2020

I previously discuss in #21512 , and maybe I'm not clear my thought and that issue is closed. but I think it's important。

Current Cascade Options

  • Cascade data been deleted by database setting(cascade setting) and to delete it in ChangeTracker for in memeory data
  • ClientSetNull delete it in ChangeTracker for in memeory data only
  • SetNull set to null by database setting(cascade setting) and to set null in ChangeTracker for in memeory data
  • Restrict throw for non-nullable foreign key

and database relation has limited, that limited the child operation of 3 level, you can not cascade more. and database is doing hard job on concurrent query, and by using foreign key relation is slower than normal table (with cascade is much slow)

see Alibaba 's Database Design Rule and alibaba's taobao.com and tianmao.com is "Industry benchmark/standard" for huge request.

6.【强制】不得使用外键与级联,一切外键概念必须在应用层解决。 说明:以学生和成绩的关系为例,学生表中的 student_id 是主键,那么成绩表中的 student_id 则为外键。如果更新学生表中的 student_id,同时触发成绩表中的 student_id 更新,即为 级联更新。外键与级联更新适用于单机低并发,不适合分布式、高并发集群;级联更新是强阻 塞,存在数据库更新风暴的风险;外键影响数据库的插入速度。

translate :

[mandatory] foreign key and cascade are not allowed. All foreign key concepts must be solved in the application layer. Note: take the relationship between students and grades as an example,the student_ id in the student table is the primary key, then the student_ id in the grade table is the foreign key. If you update the student_id in the student table, and the student_id in the grade table will be triggered at the same time eg. update is cascade update. Foreign key and cascade update are suitable for single machine with low concurrency, and are not suitable for distributed and high concurrency clusters; cascaded update is a strong blocking, and there is a risk of database update storm; foreign key affects the speed of database insertion.

and base on my know, today(maybe the past) the big Hold back of the .net web site performace is database (java has sharding-jdbc for sepreate table/database that can manage huge data and improve performace of data). and see cnblog's experience, this may not relate this problem, but it show's that the challenge of .net website is always the database , I have the same experience for my company website. with all this, I think we should do something if we have the change to can make database has less work to do and make database more faster.

so, let's think how we can do in applcation layer.

  1. do not generate database foreign key relation is absolutely can be done. (consider builder.Entity<T>().WithoutForeignKey()) and it's not important for now.
  2. mock the foreign key relation in EF core 's model. eg (.HasForeignKey())
  3. track entity's delete and update and cascade delete by EF CORE to generate Delete Sql to delete it to database

and fix the bug behavior for below

var query = db.Articles.Inlcude(a=>a.Author);
var count = query.Count();  // only Count Blog 
var result = query.ToList(); // Articles INNER JOIN  Author

//if the data is normal:  the count result is same as  result.Count
//but if the Author is some how deleted without the foreign key constraint
// then the count will be bigger that result.Count

maybe I was remember wrong, in EF core 1.x , there is an article said that there is a JOIN optimizer that reduce the join if the result do not contains the second table field. but @smitpatel said there is not. but anyway the same query generate different SQL statement (the main query part) should not be a correct behavior. and consider that the author is been soft deleted with global filter, without JOIN (SELECT * FROM Author WHERE Authro.IsDeleted = false) as A ON Article.AuthorId = A.Id the query is still incorrect.

@roji
Copy link
Member

roji commented Nov 4, 2020

Related (possible dup): #13146

@ajcvickers
Copy link
Contributor

@John0King Some comments:

  • EF Core does not require that the database has any FK constraints. Feel free to remove them from the migration if you don't want them there. If you're not using Migrations, then simply don't create the constraints in whatever tool you use to manage the database.
  • EF Core FK handling works the same way regardless of whether the database has any FK constraints or not. There is no need to have EF Core do anything special just because there are no FK constraints in the database.
  • EF Core cascade deletes always happen on the client if entities are loaded. There is no requirement that the database have cascade deletes configured.

I think the only thing missing from EF Core in what you describe is the ability to delete entities without loading them into the context. This is tracked by #795.

@John0King
Copy link
Author

John0King commented Nov 6, 2020

@ajcvickers

Feel free to remove them from the migration

yes, I can do , but the number of table grows by time, and if I want to delete the FK then I had to delete them every time when I add a migration (including I previously deleted)

There is no need to have EF Core do anything special just because there are no FK constraints in the database.

I already show you a demo above, that without a FK constraints , it's possible that the data is incorrect like the soft delete example

public class Article
{
    public int AuthorId { get;set;}
    public Author  Author{ get;set;}  // application layer  FK
}

public class Author
{
    public int Id {get;set;}
    public string Name{get;set;}
}

// add data
var author = new Author{  }  // id=1
var artcle = new Article{  AuthorId = 1 }
db.Add(author)
db.Add(artcle)
db.Savechanges();
db.Remove(author)  // we remove a author , but won't remove the article
// don't be surprise there, if your search stackoverflow for `should I create a FK`  then , most answer will tell you without a FK constraints  , then there will be lots of  garbage data in database
db.Savechanges()  


// paged query

var query = db.Articles.Inlcude(x=>x.Author);
// var result = query.ToPagedResult(pageIndex,pageSize);  // for example   1, 10

var count = query.Count(); // 1
var result = query.ToList(); // 0

result on page:

Empty, no article yet

first prev 1 next latst page 1/1 total item: 1

@roji
Copy link
Member

roji commented Nov 6, 2020

A full solution here could also mean EF Core performing a bulk delete (or set null) on the database when a principal is deleted, to mimic the database behavor for cascading deletes.

@John0King
Copy link
Author

@roji yes, we do need bulk delete, and the fix for db.Articles.Include(x=>x.Author).Count() is still required for global filters and incorrect data in database earlier (or managed by other app) and then map to EF model

@ajcvickers
Copy link
Contributor

@roji Historically, that has always been part of the bulk update issue--there's discussion from Diego on #795. We could track it separately, but I think the feeling has been that if we do bulk updates, then using them for cascades is an important part of that.

@John0King In your example you create an optional relationship with a nullable FK property, in shadow state. EF does not by default configure optional relationships to cascade delete. If you made the relationship required or explicitly configured it to have cascade delete, then in your example both entities would be deleted since at the time you call SaveChanges they are both tracked.

@John0King
Copy link
Author

@ajcvickers thanks for your explain. but I think you still miss my point. let me explain.

first, I define int AuthorId and Author Author in my entity, that will create a non-nullable FK propertpy , and this is not about cascade delete, when I query using :

db.Articles.Incluce(x=>x.Author).ToList()
SELECT a.* FROM Articles as a INNER JOIN Author as u ON a.AuthorId = u.Id

but when I query using :

db.Articles.Incluce(x=>x.Author).Count()
SELECT Count(a.*) FROM Articles as a

you can see , that the Count() lost the INNER JOIN.


another example is using soft Delete;

public partial class Author
{
    public bool  IsDeleted {get;set;}
}

builder.Entity<Author>(b=> b.HasQueryFileters(a=>a.IsDeleted == false))

and then imaging the Author is been soft deleted

db.Articles.Incluce(x=>x.Author).ToList()
SELECT a.* FROM Articles as a INNER JOIN ( SELECT * FROM Author WHERE IsDeleted = 0) as u ON a.AuthorId = u.Id

but when I query using :

db.Articles.Incluce(x=>x.Author).Count()
SELECT Count(a.*) FROM Articles as a

and in this example the error is occur even when the database has the FK constraint.

@ajcvickers
Copy link
Contributor

@John0King

you can see , that the Count() lost the INNER JOIN.

Why is the inner join needed to get the count? In fact, would this not be the wrong result? Include will never effect the count of entities returned since, for example, there are the same number of Articles regardless which of them have Authors.

You say that you create a "non-nullable FK property," but it sounds like there are null values in the column. If a column contains null values, then it must be configured as nullable in the EF model. Otherwise you will at best get exceptions, and at worst get bad data.

@John0King
Copy link
Author

@ajcvickers I think you still in "strict data + should" mode to think the data relationship.
let me make a example again ,

Blog and Post example
goals:

  1. get a Post list
  2. only a Post that it's Blog not been soft/hard deleted will be list
  3. you will never show a Post that it's Blog been soft/hard deleted

In strict model (eg. Database constraint enabled)

define entity

public class Blog
{
    public int Id { get;set; }
    public string Name { get;set; }
    public bool IsDeleted{ get; set; } // default is false
}


    public class Post
    {
        public int Id { get; set; }
        public string PostName { get; set; }

        // with goal 3 , so I define that the BlogId is requred  FK eg. Noe-nullable-FK
        public int BlogId { get; set; }

        public Blog Blog{ get;set;}
    }

configure DataContext

public class AppDb :DbContext
{

        public DbSet<Blog> Blogs{get;set;}

        public DbSet<Post> Posts{ get;set;}

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            base.OnModelCreating(modelBuilder);
            modelBuilder.Entity<Blog>(builder=>
            {
                builder.HasQueryFilter(x=>x.IsDeleted == false);  // we set this filter so we well won't see any soft deleted 
            });
        }
}

and run using

 static void Main(string[] args)
        {

            SeedData();
            var db = new AppDb();
            var query = db.Posts.Include(x => x.Blog);

            var count = query.Count();
            var result = query.ToList();

            Console.WriteLine($"there are counted  [{count}] posts");
            Console.WriteLine($"there are actually [{result.Count}] posts ");
            foreach (var p in result)
            {
                Console.WriteLine($"\tname:[{p.PostName }] blogId: [{p.BlogId}]  blog is deleted: { p.Blog?.IsDeleted ?? true }  ");
            }
        }

and the result

there are counted  [3] posts
there are actually [1] posts 
        name:[b2 - p3] blogId: [2]  blog is deleted: False

In lax mode (eg. without Database constraint )

there can be no soflt deleted IsDeleted property, because BlogId is just a int, it can anything like 0,-1, or 999 it is valid anyway

so the result is the same, because we do everything in our "application layer" instead of data layer

@John0King
Copy link
Author

@ajcvickers I made a example repo and you can try to run that https://github.com/John0King/EfCoreFKDemo

@ajcvickers
Copy link
Contributor

@smitpatel Why does the query filter on Blog impact the number of Posts returned here?

@ajcvickers
Copy link
Contributor

@John0King So you should see a warning in the logs about adding a query filter to Blogs but not Posts. Otherwise your model doesn't prevent a non-deleted Post from existing for a deleted Blog. I think the fundamental problem here is that using a query filter is not a full solution to soft delete (see #19695). You can use query filters to get some of the behavior of soft deletes, but you have to be careful not to create invalid states like this in your model. (In other words, EF has no idea what the IsDeleted flag means semantically. It doesn't know that the relationship semantics (optional/required) have been changed because of a query filter.)

@smitpatel
Copy link
Contributor

Given the model configuration without soft delete scenario.
Author is principal, article is dependent and FK AuthorId is non-nullable hence article cannot exist without an associated Author.
If you perform "articles inner join authors" on FK then the number of rows in the result does not change since every article must have a matching author. Hence when doing count we remove the join since there is no need to join and SQL is faster without that join.

When soft delete is enabled, as Arthur pointed out, your data is inconsistent with the model
While you say that your FK BlogId is non-nullable, at the same time if the Blog with that Id has been marked as soft deleted then what is the principal of such post? In database it does point to a "blog" but if you apply soft delete filters then your post has no associated blog. i.e. your FK is not truly non-nullable. If cannot have non-null values but it does have orphans which is not what non-nullable FKs can have. We show warning for such a invalid configurations.
Now there are multiple ways to deal with the issue

  • As Arthur pointed to End-to-end soft delete #19695
  • Make FK constraint valid with/without query filter. If you configure soft delete on Post table also and further update the records in the way that if a blog marked as soft-deleted then all the posts related to it are also marked as deleted then you will receive a consistent graph of objects in the database. Blog-Post has a required constraints and no orphans with/without the soft delete filter.

The issue with filters and joins has been extensively discussed in #11691

@John0King
Copy link
Author

John0King commented Nov 12, 2020

@smitpatel @ajcvickers Thank for your explain, but I don't agree with you.

first , the database relationship with application is not one to one .
second , this issue is about application layer and without FK Constraint
3rd, EF is not database design tool, it's a application layer ORM, it job is not to ensure every data is correct in database, but to ensure every data for application is correct.
4th, It's a simple INNER JOIN or LEFT/RIGHT JOIN problem , don't make it too complicate.

Hence when doing count we remove the join since there is no need to join and SQL is faster without that join

you can see for this demo , it's incorrect behavior now. Why we write a additional .Incluce() if we want it to be fast without JOIN ?


with my first point. a database can be use for a .Net applcation and a php application, and then the db is not having FK (not soft delete), and the data for Post.BlogId can be 0 or any deleted id like 1,2,3 and the php app to ensure the data just write a INNER JOIN , and what does we need to do with the .net app using EF Core ?
the only way is to manually write .Join(x=>x.Blog,(p,b)=> new { p, b }) even we know that how the data relation is.

Data:

table - - -
Blog Id: 1 Id: 2 Id :3
table(No FK) - - -
Post Id:1, BlogId: 1 Id:2, BlogId: 2 Id:3, BlogId: 3

Result :

sense the data can also managed by php app , we can not assume the php will delete every relational data( a big application may have hundreds of those relationship) and when this issue(23188) been solved then in .net using EF Core we can not assume the data too, because some relationship may add later (lost relationship in first round of development).
so it's not a good idea to ensure EF query correctly by correct the data.

so my opinion is do not do "JOIN optimization" for .Include() just like you won't do JOIN optimization on .Join()


ps: for a example of Area, Countory, Province, City, Town, Street , should I using FK?
by using FK, I cannot using Cascade Delete, because the database not support cascade more than 3. ( so the .WithoutFKInDb() in migraion is necessary)
by not using FK, then I may missing to delete the Street.

so doing the FK in applcation layer maybe the best option.

@smitpatel
Copy link
Contributor

smitpatel commented Nov 12, 2020

Neither of the conclusions are actually correct.
Issue is neither EF Core, nor database, the major issue is incorrect data.

Putting all of EF Core, database constraints and query filters away.
There are 3 fundamental queries here.

  • Posts.Count()
    Literal meaning count number of rows in the Posts table.
  • Posts.ToList()
    Literal meaning - Materialize a post instance for each row in Posts table.
  • Posts.Include(e => e.Blog).ToList()
    Literal meaning - Exactly as 2nd query but also populate the Blog navigation for each post instance.

Now important thing here is that navigation if loaded from database then the principal key on blog instance pointed by that navigation matches the foreign key property value on post instance. That is very basic definition of navigations. If you want to call this an assumption of EF Core then there is no other way for us to identify what each Blog navigation points to since navigations do not exists in database. You are free to have a Blog property on Posts and set it yourself to whatever and ignore it in EF model.

Now let's introduce query filter on top of above.
You have defined query filter for Blogs table. What query filter does is, applies filter every time data is query from database. Hence in a way it is a condition which makes data visible or not to client. For filter on blog, basically, Regardless of how many rows are in Blogs table, only consider rows which matches this filter. Essentially a view of whole table. Since there is only filter over Blogs, it will have some rows invisible to EF Core. No such filter exists on Posts, so all rows are Posts table considered to be visible.
Now if we re-evaluate 3 queries from above.

  • Posts.Count()
    Count number of rows in Posts table. Posts don't have any restricted view so all rows are included and count will be exactly number of rows.
  • Posts.ToList()
    Same as above, all posts are materialized into an instance.
  • Posts.Include(e => e.Blog).ToList()
    Now this should be same as 2nd query with their Blog instance populated. If all the posts present in Posts table have their associated blogs in visible view, then everything will be wired up correctly. But if the blog is not visible then you violated the fundamental concept of navigation. You are saying that there is no associated blog for a post but post.BlogId still has a non-null value. Your navigation and FK values do not match.
    Taking it even further, you have created a non-deterministic behavior here. What happens when you soft-delete a blog, For blog itself the IsDeleted would be set to true and it would be removed from visible view. But what happens to posts which are associated with blog? Are they deleted from database? Are also marked as soft-deleted?

In the essence, you have defined a filtered view of your database which is entirely incorrect for a .NET client where navigations exists. I would assume same issue would be in any other type of programming language, but I am not expert and not going to comment on it.

so my opinion is do not do "JOIN optimization" for .Include() just like you won't do JOIN optimization on .Join()

There is no "JOIN optimization" for "Include". Even if we do LEFT JOIN we cannot materialize results because FK value and navigation values are mismatching.

Bottom line

Define the behavior of your data in your app which actually makes sense. What happens to posts if associated blog is soft-deleted

  • Remove navigations and relationships from EF Core model and set Blog property on post manually as you please
  • Delete posts outright in database - You will not get incorrect data regardless of join
  • Mark posts as soft-deleted - Add soft-delete for your post entity also and make sure IsDeleted flag for Blog/Posts are consistent when changed
  • Remove posts for soft-deleted blog from visible view - Define a query filter on posts like
    modelBuilder.Entity<Post>().HasQueryFilter(e => e.Blog.IsDeleted == false)

Query is neither requiring FK constraint in database nor enforcing correctness of data in database. If you don't give us data which can be materialized then you will get error. I need to create client side objects in the end.

@ajcvickers ajcvickers added closed-no-further-action The issue is closed and no further action is planned. and removed type-enhancement labels Nov 20, 2020
@ajcvickers ajcvickers reopened this Oct 16, 2022
@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-no-further-action The issue is closed and no further action is planned. customer-reported
Projects
None yet
Development

No branches or pull requests

4 participants