
Start thinking about API Design #9

Open
nathanielrindlaub opened this issue Mar 27, 2020 · 1 comment

nathanielrindlaub (Member) commented Mar 27, 2020

It's probably a good idea to start thinking about the types of queries we'll want to be able to perform on the data, as that will inform schema and API design (see #2). A few examples off the top of my head (a combined filter input is sketched after the list):

Ability to filter by:

  • Date range
  • Camera
  • Location (still need to figure out a good system for storing location info)
  • Object detected (filter out blanks)
  • Labels (both predicted and manual)
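
To make these filter combinations concrete, here's a minimal sketch of what a single combined filter input could look like, written as a TypeScript interface. Every name here is a hypothetical placeholder, not a settled schema:

```typescript
// Hypothetical filter input for an image query. All fields are optional so
// filters compose freely instead of needing one endpoint per query.
interface ImageFilterInput {
  dateStart?: string;   // ISO 8601, inclusive
  dateEnd?: string;     // ISO 8601, exclusive
  cameraIds?: string[]; // restrict to specific cameras
  location?: {          // placeholder until we settle on a location scheme
    lat: number;
    lon: number;
    radiusMeters: number;
  };
  hasObjects?: boolean; // true = filter out blanks
  labels?: string[];    // match on label values
  labelSource?: 'predicted' | 'manual' | 'any';
}
```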
nathanielrindlaub self-assigned this Mar 27, 2020

nathanielrindlaub (Member, Author) commented Mar 27, 2020

More thoughts... kind of getting into business logic and UX considerations. User types:

reviewer - ability to read recent images that have:

  • been updated with a new ML prediction but not yet verified by a human. If we want to break the reviewing process into stages of yes/no questions, I'd imagine a first pass might be a query that gets all non-verified images that have objects detected in them, so the reviewer can quickly confirm that they do have objects and that the bounding boxes are accurate. That response can be paginated; there's no need for all the non-verified images at once. It must also support quick writes to the DB as the reviewer updates each image's verification status. A second pass might look like: query all images with verified objects and ML-suggested labels for a specific species that the reviewer hasn't verified yet, so they can rip through and say "yes skunk, no skunk," etc., repeating for all classes. Then circle back for a slower review of all images that have verified detected objects but no ML label (or 'unknown'), and finish with a final review of all empty (no detected object) images. (Pass queries are sketched after this list.)
  • I think arrow keys for yes/no will really speed the review up, but maybe add a third category (e.g., up arrow) for "not quite right: a bounding box may need adjusting or deleting, but I'll deal with that later." That would dump the image into a group the reviewer can iterate through afterwards at a slower pace.
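
A rough sketch of how those passes could translate into paginated queries, reusing the hypothetical filter fields from the sketch above. The verification flags (verified, objectsVerified, labelVerified) are likewise assumptions:

```typescript
// Hypothetical review-pass queries; all field names are illustrative.

// Pass 1: unverified images with detected objects. The reviewer quickly
// confirms the objects and bounding boxes; results come back paginated.
const passOne = {
  filter: { hasObjects: true, verified: false },
  limit: 50,
};

// Pass 2: images with verified objects and an unreviewed ML label for a
// single species, so the reviewer can rip through "yes skunk / no skunk".
const passTwo = (species: string) => ({
  filter: {
    objectsVerified: true,
    labels: [species],
    labelSource: 'predicted' as const,
    labelVerified: false,
  },
  limit: 50,
});

// Pass 3 (slower): verified objects but no ML label, or label === 'unknown'.
// Pass 4 (final): empties, i.e. hasObjects === false.
```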

analyst - probably wants large quantities of image metadata all at once, but retrieving the images themselves is less important.

  • Limit analysts to only verified, labeled images (safer, and it provides an incentive to actually do the manual review).
  • Analysts are probably much more interested in quantifying an animal detection "event", which could be a sequence/series of photos or a single photo. We need to develop a representation of a detection event in our schema. Also, is grouping images into an event the responsibility of the reviewer, or is that something we could determine in some automated fashion, like "if there are multiple images with the same animal in them taken within 1 minute of each other, it's a safe bet it's a single-animal event"? (A naive version of that grouping heuristic is sketched below.)
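
If we did want to automate it, a naive version of the grouping heuristic is easy to sketch. This assumes each image record carries a camera id and a capture timestamp; the type names and the 60-second gap are illustrative:

```typescript
interface ImageRecord {
  id: string;
  cameraId: string;
  capturedAt: Date;
}

// Fold images from the same camera taken within `gapSeconds` of the
// previous image into one detection event.
function groupIntoEvents(images: ImageRecord[], gapSeconds = 60): ImageRecord[][] {
  // Sort by camera, then time, so images in a burst are adjacent.
  const sorted = [...images].sort((a, b) =>
    a.cameraId === b.cameraId
      ? a.capturedAt.getTime() - b.capturedAt.getTime()
      : a.cameraId.localeCompare(b.cameraId)
  );

  const events: ImageRecord[][] = [];
  for (const img of sorted) {
    const current = events[events.length - 1];
    const prev = current?.[current.length - 1];
    if (
      prev &&
      prev.cameraId === img.cameraId &&
      img.capturedAt.getTime() - prev.capturedAt.getTime() <= gapSeconds * 1000
    ) {
      current.push(img); // same burst: extend the current event
    } else {
      events.push([img]); // gap or new camera: start a new event
    }
  }
  return events;
}
```

One open question with any fixed-gap rule: a long burst chains together as long as each adjacent pair is under the threshold, so a single event could span well over a minute end to end.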

ML practitioner -

  • interested in extracting training data. They'd want to be able to extract a list of the paths to all verified labeled images of a set of classes, ideally split up by deployment location.
  • interested in comparing model performance against real human review, i.e., retrieving metrics on the real-life accuracy of each model. Perhaps these are data we should track and write to the model documents in real time, e.g., each time a model makes a prediction, increment a prediction count; each time an analyst verifies a prediction, increment a correct count; and so on (sketched below).
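
A sketch of what those counters might look like. The document shape and the MongoDB-style $inc updates in the comments are assumptions, not a decided storage layer:

```typescript
// Hypothetical per-model accuracy counters, updated in real time.
interface ModelMetrics {
  predictionCount: number; // ++ every time the model emits a prediction
  correctCount: number;    // ++ every time a human confirms a prediction
}

// Assumed write pattern (MongoDB-style atomic increments):
//   on prediction:   models.updateOne({ _id: modelId }, { $inc: { predictionCount: 1 } })
//   on confirmation: models.updateOne({ _id: modelId }, { $inc: { correctCount: 1 } })

// Real-life accuracy derived from the counters.
function accuracy(m: ModelMetrics): number {
  return m.predictionCount === 0 ? 0 : m.correctCount / m.predictionCount;
}
```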
