
Start thinking about API Design #9

Open
nathanielrindlaub opened this issue Mar 27, 2020 · 1 comment

nathanielrindlaub (Member) commented Mar 27, 2020

It's probably a good idea to start thinking about the types of queries we'll want to be able to perform on the data, as that will inform schema and API design (see #2). A few examples off the top of my head (a combined filter input is sketched after the list):

Ability to filter by:

  • Date range
  • Camera
  • Location (still need to figure out a good system for storing location info)
  • Object detected (filter out blanks)
  • Labels (both predicted and manual)
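
To make these filter combinations concrete, here's a minimal sketch of what a single combined filter input could look like, written as a TypeScript interface. Every name here is a hypothetical placeholder, not a settled schema:

```typescript
// Hypothetical filter input for an image query. All fields are optional so
// filters compose freely instead of needing one endpoint per query.
interface ImageFilterInput {
  dateStart?: string;   // ISO 8601, inclusive
  dateEnd?: string;     // ISO 8601, exclusive
  cameraIds?: string[]; // restrict to specific cameras
  location?: {          // placeholder until we settle on a location scheme
    lat: number;
    lon: number;
    radiusMeters: number;
  };
  hasObjects?: boolean; // true = filter out blanks
  labels?: string[];    // match on label values
  labelSource?: 'predicted' | 'manual' | 'any';
}
```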
nathanielrindlaub self-assigned this Mar 27, 2020

nathanielrindlaub (Member, Author) commented Mar 27, 2020

More thoughts... kind of getting into business logic and UX considerations. User types:

reviewer - ability to read recent images that have:

  • been updated with a new ML prediction but not yet verified by a human. If we want to break the reviewing process into stages of yes/no questions, I'd imagine a first pass might be a query that gets all non-verified images that have objects detected in them, so the reviewer can quickly confirm that they do have objects and that the bounding boxes are accurate. That response can be paginated; there's no need for all the non-verified images at once. It must also support quick writes to the DB as the reviewer updates each image's verification status. A second pass might look like: query all images with verified objects and ML-suggested labels for a specific species that the reviewer hasn't verified yet, so they can rip through and say "yes skunk, no skunk," etc., repeating for all classes. Then circle back for a slower review of all images that have verified detected objects but no ML label (or 'unknown'), and finish with a final review of all empty (no detected object) images. (Pass queries are sketched after this list.)
  • I think arrow keys for yes/no will really speed the review up, but maybe add a third category (e.g., up arrow) for "not quite right: a bounding box may need adjusting or deleting, but I'll deal with that later." That would dump the image into a group the reviewer can iterate through afterwards at a slower pace.
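
A rough sketch of how those passes could translate into paginated queries, reusing the hypothetical filter fields from the sketch above. The verification flags (verified, objectsVerified, labelVerified) are likewise assumptions:

```typescript
// Hypothetical review-pass queries; all field names are illustrative.

// Pass 1: unverified images with detected objects. The reviewer quickly
// confirms the objects and bounding boxes; results come back paginated.
const passOne = {
  filter: { hasObjects: true, verified: false },
  limit: 50,
};

// Pass 2: images with verified objects and an unreviewed ML label for a
// single species, so the reviewer can rip through "yes skunk / no skunk".
const passTwo = (species: string) => ({
  filter: {
    objectsVerified: true,
    labels: [species],
    labelSource: 'predicted' as const,
    labelVerified: false,
  },
  limit: 50,
});

// Pass 3 (slower): verified objects but no ML label, or label === 'unknown'.
// Pass 4 (final): empties, i.e. hasObjects === false.
```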

analyst - probably wants large quantities of image metadata all at once, but retrieving the images themselves is less important.

  • Limit analysts to only verified, labeled images (safer, and it provides an incentive to actually do the manual review).
  • Analysts are probably much more interested in quantifying an animal detection "event", which could be a sequence/series of photos or a single photo. We need to develop a representation of a detection event in our schema. Also, is grouping images into an event the responsibility of the reviewer, or is that something we could determine in some automated fashion, like "if there are multiple images with the same animal in them taken within 1 minute of each other, it's a safe bet it's a single-animal event"? (A naive version of that grouping heuristic is sketched below.)
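
If we did want to automate it, a naive version of the grouping heuristic is easy to sketch. This assumes each image record carries a camera id and a capture timestamp; the type names and the 60-second gap are illustrative:

```typescript
interface ImageRecord {
  id: string;
  cameraId: string;
  capturedAt: Date;
}

// Fold images from the same camera taken within `gapSeconds` of the
// previous image into one detection event.
function groupIntoEvents(images: ImageRecord[], gapSeconds = 60): ImageRecord[][] {
  // Sort by camera, then time, so images in a burst are adjacent.
  const sorted = [...images].sort((a, b) =>
    a.cameraId === b.cameraId
      ? a.capturedAt.getTime() - b.capturedAt.getTime()
      : a.cameraId.localeCompare(b.cameraId)
  );

  const events: ImageRecord[][] = [];
  for (const img of sorted) {
    const current = events[events.length - 1];
    const prev = current?.[current.length - 1];
    if (
      prev &&
      prev.cameraId === img.cameraId &&
      img.capturedAt.getTime() - prev.capturedAt.getTime() <= gapSeconds * 1000
    ) {
      current.push(img); // same burst: extend the current event
    } else {
      events.push([img]); // gap or new camera: start a new event
    }
  }
  return events;
}
```

One open question with any fixed-gap rule: a long burst chains together as long as each adjacent pair is under the threshold, so a single event could span well over a minute end to end.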

ML practitioner -

  • interested in extracting training data. They'd want to be able to extract a list of the paths to all verified labeled images of a set of classes, ideally split up by deployment location.
  • interested in comparing model performance against real human review, i.e., retrieving metrics on the real-life accuracy of each model. Perhaps these are data we should track and write to the model documents in real time, e.g., each time a model makes a prediction, increment a prediction count; each time an analyst verifies a prediction, increment a correct count; and so on (sketched below).
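
A sketch of what those counters might look like. The document shape and the MongoDB-style $inc updates in the comments are assumptions, not a decided storage layer:

```typescript
// Hypothetical per-model accuracy counters, updated in real time.
interface ModelMetrics {
  predictionCount: number; // ++ every time the model emits a prediction
  correctCount: number;    // ++ every time a human confirms a prediction
}

// Assumed write pattern (MongoDB-style atomic increments):
//   on prediction:   models.updateOne({ _id: modelId }, { $inc: { predictionCount: 1 } })
//   on confirmation: models.updateOne({ _id: modelId }, { $inc: { correctCount: 1 } })

// Real-life accuracy derived from the counters.
function accuracy(m: ModelMetrics): number {
  return m.predictionCount === 0 ? 0 : m.correctCount / m.predictionCount;
}
```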
