prep for renaming master branch to main
nikhilsimha committed Feb 21, 2024
1 parent ac5095b commit be3e262
Showing 23 changed files with 88 additions and 88 deletions.
36 changes: 18 additions & 18 deletions CONTRIBUTE.md
Expand Up @@ -183,37 +183,37 @@ Below is a list of resources that can be useful for development and debugging.
## Docs

(Docsite)[https://chronon.ai]
(doc directory)[https://github.com/airbnb/chronon/tree/master/docs/source]
(doc directory)[https://github.com/airbnb/chronon/tree/main/docs/source]
(Code of conduct)[TODO]

## Links:

(pip project)[https://pypi.org/project/chronon-ai/]
(maven central)[https://mvnrepository.com/artifact/ai.chronon/]: (publishing)[https://github.com/airbnb/chronon/blob/master/devnotes.md#publishing-all-the-artifacts-of-chronon]
(Docsite: publishing)[https://github.com/airbnb/chronon/blob/master/devnotes.md#chronon-artifacts-publish-process]
(maven central)[https://mvnrepository.com/artifact/ai.chronon/]: (publishing)[https://github.com/airbnb/chronon/blob/main/devnotes.md#publishing-all-the-artifacts-of-chronon]
(Docsite: publishing)[https://github.com/airbnb/chronon/blob/main/devnotes.md#chronon-artifacts-publish-process]


## Code Pointers

Api - (Thrift)[https://github.com/airbnb/chronon/blob/master/api/thrift/api.thrift#L180], (Python)[https://github.com/airbnb/chronon/blob/master/api/py/ai/chronon/group_by.py]
(CLI driver entry point for job launching.)[https://github.com/airbnb/chronon/blob/master/spark/src/main/scala/ai/chronon/spark/Driver.scala]
Api - (Thrift)[https://github.com/airbnb/chronon/blob/main/api/thrift/api.thrift#L180], (Python)[https://github.com/airbnb/chronon/blob/main/api/py/ai/chronon/group_by.py]
(CLI driver entry point for job launching.)[https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/Driver.scala]

**Offline flows that produce hive tables or file output**
(GroupBy)[https://github.com/airbnb/chronon/blob/master/spark/src/main/scala/ai/chronon/spark/GroupBy.scala]
(Staging Query)[https://github.com/airbnb/chronon/blob/master/spark/src/main/scala/ai/chronon/spark/StagingQuery.scala]
(Join backfills)[https://github.com/airbnb/chronon/blob/master/spark/src/main/scala/ai/chronon/spark/Join.scala]
(Metadata Export)[https://github.com/airbnb/chronon/blob/master/spark/src/main/scala/ai/chronon/spark/MetadataExporter.scala]
(GroupBy)[https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/GroupBy.scala]
(Staging Query)[https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/StagingQuery.scala]
(Join backfills)[https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/Join.scala]
(Metadata Export)[https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/MetadataExporter.scala]
Online flows that update and read data & metadata from the kvStore
(GroupBy window tail upload )[https://github.com/airbnb/chronon/blob/master/spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala]
(Streaming window head upload)[https://github.com/airbnb/chronon/blob/master/spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala]
(Fetching)[https://github.com/airbnb/chronon/blob/master/online/src/main/scala/ai/chronon/online/Fetcher.scala]
(GroupBy window tail upload )[https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala]
(Streaming window head upload)[https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala]
(Fetching)[https://github.com/airbnb/chronon/blob/main/online/src/main/scala/ai/chronon/online/Fetcher.scala]
Aggregations
(time based aggregations)[https://github.com/airbnb/chronon/blob/master/aggregator/src/main/scala/ai/chronon/aggregator/base/TimedAggregators.scala]
(time independent aggregations)[https://github.com/airbnb/chronon/blob/master/aggregator/src/main/scala/ai/chronon/aggregator/base/SimpleAggregators.scala]
(integration point with rest of chronon)[https://github.com/airbnb/chronon/blob/master/aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala#L223]
(Windowing)[https://github.com/airbnb/chronon/tree/master/aggregator/src/main/scala/ai/chronon/aggregator/windowing]
(time based aggregations)[https://github.com/airbnb/chronon/blob/main/aggregator/src/main/scala/ai/chronon/aggregator/base/TimedAggregators.scala]
(time independent aggregations)[https://github.com/airbnb/chronon/blob/main/aggregator/src/main/scala/ai/chronon/aggregator/base/SimpleAggregators.scala]
(integration point with rest of chronon)[https://github.com/airbnb/chronon/blob/main/aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala#L223]
(Windowing)[https://github.com/airbnb/chronon/tree/main/aggregator/src/main/scala/ai/chronon/aggregator/windowing]

**Testing**
(Testing - sbt commands)[https://github.com/airbnb/chronon/blob/master/devnotes.md#testing]
(Testing - sbt commands)[https://github.com/airbnb/chronon/blob/main/devnotes.md#testing]
(Automated testing - circle-ci pipelines)[https://app.circleci.com/pipelines/github/airbnb/chronon]
(Dev Setup)[https://github.com/airbnb/chronon/blob/master/devnotes.md#prerequisites]
(Dev Setup)[https://github.com/airbnb/chronon/blob/main/devnotes.md#prerequisites]
14 changes: 7 additions & 7 deletions README.md
Expand Up @@ -59,7 +59,7 @@ Does not include:

## Setup

To get started with the Chronon, all you need to do is download the [docker-compose.yml](https://github.com/airbnb/chronon/blob/master/docker-compose.yml) file and run it locally:
To get started with the Chronon, all you need to do is download the [docker-compose.yml](https://github.com/airbnb/chronon/blob/main/docker-compose.yml) file and run it locally:

```bash
curl -o docker-compose.yml https://chronon.ai/docker-compose.yml
Expand All @@ -74,7 +74,7 @@ In this example, let's assume that we're a large online retailer, and we've dete

## Raw data sources

Fabricated raw data is included in the [data](https://github.com/airbnb/chronon/blob/master/api/py/test/sample/data) directory. It includes four tables:
Fabricated raw data is included in the [data](https://github.com/airbnb/chronon/blob/main/api/py/test/sample/data) directory. It includes four tables:

1. Users - includes basic information about users such as account created date; modeled as a batch data source that updates daily
2. Purchases - a log of all purchases by users; modeled as a log table with a streaming (i.e. Kafka) event-bus counterpart
Expand Down Expand Up @@ -141,11 +141,11 @@ v1 = GroupBy(
)
```

See the whole code file here: [purchases GroupBy](https://github.com/airbnb/chronon/blob/master/api/py/test/sample/group_bys/quickstart/purchases.py). This is also in your docker image. We'll be running computation for it and the other GroupBys in [Step 3 - Backfilling Data](#step-3---backfilling-data).
See the whole code file here: [purchases GroupBy](https://github.com/airbnb/chronon/blob/main/api/py/test/sample/group_bys/quickstart/purchases.py). This is also in your docker image. We'll be running computation for it and the other GroupBys in [Step 3 - Backfilling Data](#step-3---backfilling-data).

**Feature set 2: Returns data features**

We perform a similar set of aggregations on returns data in the [returns GroupBy](https://github.com/airbnb/chronon/blob/master/api/py/test/sample/group_bys/quickstart/returns.py). The code is not included here because it looks similar to the above example.
We perform a similar set of aggregations on returns data in the [returns GroupBy](https://github.com/airbnb/chronon/blob/main/api/py/test/sample/group_bys/quickstart/returns.py). The code is not included here because it looks similar to the above example.

**Feature set 3: User data features**

Expand All @@ -167,7 +167,7 @@ v1 = GroupBy(
)
```

Taken from the [users GroupBy](https://github.com/airbnb/chronon/blob/master/api/py/test/sample/group_bys/quickstart/users.py).
Taken from the [users GroupBy](https://github.com/airbnb/chronon/blob/main/api/py/test/sample/group_bys/quickstart/users.py).


### Step 2 - Join the features together
Expand Down Expand Up @@ -200,7 +200,7 @@ v1 = Join(
)
```

Taken from the [training_set Join](https://github.com/airbnb/chronon/blob/master/api/py/test/sample/joins/quickstart/training_set.py).
Taken from the [training_set Join](https://github.com/airbnb/chronon/blob/main/api/py/test/sample/joins/quickstart/training_set.py).

The `left` side of the join is what defines the timestamps and primary keys for the backfill (notice that it is built on top of the `checkout` event, as dictated by our use case).

Expand Down Expand Up @@ -370,7 +370,7 @@ Using chronon for your feature engineering work simplifies and improves your ML
4. Chronon exposes easy endpoints for feature fetching.
5. Consistency is guaranteed and measurable.

For a more detailed view into the benefits of using Chronon, see [Benefits of Chronon documentation](https://github.com/airbnb/chronon/tree/master?tab=readme-ov-file#benefits-of-chronon-over-other-approaches).
For a more detailed view into the benefits of using Chronon, see [Benefits of Chronon documentation](https://github.com/airbnb/chronon/tree/main?tab=readme-ov-file#benefits-of-chronon-over-other-approaches).


# Benefits of Chronon over other approaches
Expand Down
Expand Up @@ -411,7 +411,7 @@ class FrequentItems[T: FrequentItemsFriendly](val mapSize: Int, val errorType: E
// See: Back to the future: an even more nearly optimal cardinality estimation algorithm, 2017
// https://arxiv.org/abs/1708.06839
// refer to the chart here to tune your sketch size with lgK
// https://github.com/apache/incubator-datasketches-java/blob/master/src/main/java/org/apache/datasketches/cpc/CpcSketch.java#L180
// https://github.com/apache/incubator-datasketches-java/blob/main/src/main/java/org/apache/datasketches/cpc/CpcSketch.java#L180
// default is about 1200 bytes
class ApproxDistinctCount[Input: CpcFriendly](lgK: Int = 8) extends SimpleAggregator[Input, CpcSketch, Long] {
override def outputType: DataType = LongType
Expand Down
Expand Up @@ -22,7 +22,7 @@ import java.util

case class BankersEntry[IR](var value: IR, ts: Long)

// ported from: https://github.com/IBM/sliding-window-aggregators/blob/master/rust/src/two_stacks_lite/mod.rs with some
// ported from: https://github.com/IBM/sliding-window-aggregators/blob/main/rust/src/two_stacks_lite/mod.rs with some
// modification to work with simple aggregator
class TwoStackLiteAggregationBuffer[Input, IR >: Null, Output >: Null](aggregator: SimpleAggregator[Input, IR, Output],
maxSize: Int) {
Expand Down
2 changes: 1 addition & 1 deletion airflow/helpers.py
Expand Up @@ -66,7 +66,7 @@ def safe_part(p):
return re.sub("[^A-Za-z0-9_]", "__", safe_name)


# https://github.com/airbnb/chronon/blob/master/api/src/main/scala/ai/chronon/api/Extensions.scala
# https://github.com/airbnb/chronon/blob/main/api/src/main/scala/ai/chronon/api/Extensions.scala
def sanitize(name):
return re.sub("[^a-zA-Z0-9_]", "_", name)

Expand Down
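As a quick illustration of the `sanitize` helper shown in the hunk above, the sketch below reproduces it standalone and runs it on a made-up name. The input string is a hypothetical example, not taken from the repository.

```python
import re


def sanitize(name):
    # Replace every character outside [a-zA-Z0-9_] with a single underscore.
    return re.sub("[^a-zA-Z0-9_]", "_", name)


# Hypothetical input: the slash and the dot are each replaced by "_".
print(sanitize("sample_team/sample_join.v1"))  # sample_team_sample_join_v1
```

Because each disallowed character maps to one underscore, distinct names such as `a.b` and `a/b` collapse to the same sanitized value.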
2 changes: 1 addition & 1 deletion api/py/ai/chronon/group_by.py
Expand Up @@ -61,7 +61,7 @@ class Operation:
APPROX_UNIQUE_COUNT = ttypes.Operation.APPROX_UNIQUE_COUNT
# refer to the chart here to tune your sketch size with lgK
# default is 8
# https://github.com/apache/incubator-datasketches-java/blob/master/src/main/java/org/apache/datasketches/cpc/CpcSketch.java#L180
# https://github.com/apache/incubator-datasketches-java/blob/main/src/main/java/org/apache/datasketches/cpc/CpcSketch.java#L180
APPROX_UNIQUE_COUNT_LGK = collector(ttypes.Operation.APPROX_UNIQUE_COUNT)
UNIQUE_COUNT = ttypes.Operation.UNIQUE_COUNT
COUNT = ttypes.Operation.COUNT
Expand Down
2 changes: 1 addition & 1 deletion api/py/setup.py
Expand Up @@ -27,7 +27,7 @@


__version__ = "local"
__branch__ = "master"
__branch__ = "main"
def get_version():
version_str = os.environ.get("CHRONON_VERSION_STR", __version__)
branch_str = os.environ.get("CHRONON_BRANCH_STR", __branch__)
Expand Down
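The `setup.py` hunk above reads the version and branch from environment variables and falls back to the module-level defaults, so this commit only changes the fallback branch from `master` to `main`. A minimal sketch of that lookup follows; `resolve_build_labels` and its `env` parameter are hypothetical names introduced here for illustration, and what `get_version` does with the two strings afterwards is not visible in the hunk.

```python
import os

__version__ = "local"
__branch__ = "main"


def resolve_build_labels(env=None):
    # Hypothetical helper: mirrors the two os.environ.get lookups in the hunk.
    # Passing a plain dict as `env` makes the fallback easy to exercise.
    env = os.environ if env is None else env
    version_str = env.get("CHRONON_VERSION_STR", __version__)
    branch_str = env.get("CHRONON_BRANCH_STR", __branch__)
    return version_str, branch_str


# With neither variable set, the defaults apply.
print(resolve_build_labels({}))
# An explicit CHRONON_BRANCH_STR overrides the default branch.
print(resolve_build_labels({"CHRONON_BRANCH_STR": "my-feature"}))
```

Builds that already set `CHRONON_BRANCH_STR` explicitly are unaffected by the default change.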
4 changes: 2 additions & 2 deletions build.sbt
Expand Up @@ -94,8 +94,8 @@ git.gitTagToVersionNumber := { tag: String =>
// Git plugin will automatically add SNAPSHOT for dirty workspaces so remove it to avoid duplication.
val versionStr = if (git.gitUncommittedChanges.value) version.value.replace("-SNAPSHOT", "") else version.value
val branchTag = git.gitCurrentBranch.value.replace("/", "-")
if (branchTag == "master") {
// For master branches, we tag the packages as <package-name>-<build-version>
if (branchTag == "main" || branchTag == "master") {
// For main branches, we tag the packages as <package-name>-<build-version>
Some(s"${versionStr}")
} else {
// For user branches, we tag the packages as <package-name>-<user-branch>-<build-version>
Expand Down
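The branch check in the `build.sbt` hunk above can be sketched in Python to show its intended effect during the rename, when both `main` and `master` should yield an untagged version. The function name `version_tag` is hypothetical, and the user-branch format is inferred from the comment in the hunk (`<user-branch>-<build-version>`), since the body of the `else` branch is not shown.

```python
def version_tag(branch, version):
    # Slashes in branch names are replaced so the tag is safe in artifact names.
    branch_tag = branch.replace("/", "-")
    if branch_tag in ("main", "master"):
        # Default branches: the version is used as-is.
        return version
    # Assumption: user branches prefix the version with the branch tag.
    return f"{branch_tag}-{version}"


print(version_tag("main", "0.7.0"))          # 0.7.0
print(version_tag("master", "0.7.0"))        # 0.7.0
print(version_tag("user/feature", "0.7.0"))  # user-feature-0.7.0
```

Accepting both names keeps artifact naming stable for checkouts that have not yet switched to `main`.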
4 changes: 2 additions & 2 deletions build.sh
Expand Up @@ -7,8 +7,8 @@
set -euxo pipefail

BRANCH="$(git rev-parse --abbrev-ref HEAD)"
if [[ "$BRANCH" != "master" ]]; then
echo "$(tput bold) You are not on master!"
if [[ "$BRANCH" != "main" ]]; then
echo "$(tput bold) You are not on main branch!"
echo "$(tput sgr0) Are you sure you want to release? (y to continue)"
read response
if [[ "$response" != "y" ]]; then
Expand Down
8 changes: 4 additions & 4 deletions devnotes.md
Expand Up @@ -104,7 +104,7 @@ sbt python_api

Note: This will create the artifacts with the version specific naming specified under `version.sbt`
```text
Builds on master will result in:
Builds on main branch will result in:
<artifact-name>-<version>.jar
[JARs] chronon_2.11-0.7.0-SNAPSHOT.jar
[Python] chronon-ai-0.7.0-SNAPSHOT.tar.gz
Expand Down Expand Up @@ -227,15 +227,15 @@ This command will take into the account of `version.sbt` and handles a series of
2. Select "refresh" and "release"
3. Wait for 30 mins to sync to [maven](https://repo1.maven.org/maven2/) or [sonatype UI](https://search.maven.org/search?q=g:ai.chronon)
4. Push the local release commits (DO NOT SQUASH), and the new tag created from step 1 to Github.
1. chronon repo disallow push to master directly, so instead push commits to a branch `git push origin master:your-name--release-xxx`
1. The chronon repo disallows pushing to the main branch directly, so instead push commits to a branch `git push origin main:your-name--release-xxx`
2. your PR should contain exactly two commits, 1 setting the release version, 1 setting the new snapshot version.
3. make sure to use **Rebase pull request** instead of the regular Merge or Squash options when merging the PR.
5. Push release tag to master branch
5. Push release tag to main branch
1. tag new version to release commit `Setting version to 0.0.xx`. If not already tagged, can be added by
```
git tag -fa v0.0.xx <commit-sha>
```
2. push tag to master
2. push tag
```
git push origin <tag-name>
```
Expand Down
2 changes: 1 addition & 1 deletion docs/source/Code_Guidelines.md
Expand Up @@ -69,4 +69,4 @@ in terms of power. Also Spark APIs are mainly in Scala2.
Every new behavior should be unit-tested. We have implemented a fuzzing framework
that can produce data randomly as scala objects or
spark tables - [see](../../spark/src/test/scala/ai/chronon/spark/test/DataFrameGen.scala). Use it for testing.
Python code is also covered by tests - [see](https://github.com/airbnb/chronon/tree/master/api/py/test).
Python code is also covered by tests - [see](https://github.com/airbnb/chronon/tree/main/api/py/test).
4 changes: 2 additions & 2 deletions docs/source/authoring_features/ChainingFeatures.md
Expand Up @@ -79,9 +79,9 @@ enriched_listings = Join(

```
### Configuration Example
[Chaining GroupBy](https://github.com/airbnb/chronon/blob/master/api/py/test/sample/group_bys/sample_team/sample_chaining_group_by.py)
[Chaining GroupBy](https://github.com/airbnb/chronon/blob/main/api/py/test/sample/group_bys/sample_team/sample_chaining_group_by.py)

[Chaining Join](https://github.com/airbnb/chronon/blob/master/api/py/test/sample/joins/sample_team/sample_chaining_join.py)
[Chaining Join](https://github.com/airbnb/chronon/blob/main/api/py/test/sample/joins/sample_team/sample_chaining_join.py)

## Clarifications
- The goal of chaining is to use output of a Join as input to downstream computations like GroupBy or a Join. As of today we support the case 1 and case 2 in future plan
Expand Down
12 changes: 6 additions & 6 deletions docs/source/authoring_features/GroupBy.md
Expand Up @@ -27,7 +27,7 @@ This can be achieved by using the output of one `GroupBy` as the input to the ne

## Supported aggregations

All supported aggregations are defined [here](https://github.com/airbnb/chronon/blob/master/api/thrift/api.thrift#L51).
All supported aggregations are defined [here](https://github.com/airbnb/chronon/blob/main/api/thrift/api.thrift#L51).
Chronon supports powerful aggregation patterns and the section below goes into detail of the properties and behaviors
of aggregations.

Expand Down Expand Up @@ -181,7 +181,7 @@ If you look at the parameters column in the above table - you will see `k`.

For approx_unique_count and approx_percentile - k stands for the size of the `sketch` - the larger this is, the more
accurate and expensive to compute the results will be. Mapping between k and size for approx_unique_count is
[here](https://github.com/apache/incubator-datasketches-java/blob/master/src/main/java/org/apache/datasketches/cpc/CpcSketch.java#L180)
[here](https://github.com/apache/incubator-datasketches-java/blob/main/src/main/java/org/apache/datasketches/cpc/CpcSketch.java#L180)
for approx_percentile is the first table in [here](https://datasketches.apache.org/docs/KLL/KLLAccuracyAndSize.html).
`percentiles` for `approx_percentile` is an array of doubles between 0 and 1, where you want percentiles at. (Ex: "[0.25, 0.5, 0.75]")

Expand All @@ -193,7 +193,7 @@ The following examples are broken down by source type. We strongly suggest makin

## Realtime Event GroupBy examples

This example is based on the [returns](https://github.com/airbnb/chronon/blob/master/api/py/test/sample/group_bys/quickstart/returns.py) GroupBy from the quickstart guide that performs various aggregations over the `refund_amt` column over various windows.
This example is based on the [returns](https://github.com/airbnb/chronon/blob/main/api/py/test/sample/group_bys/quickstart/returns.py) GroupBy from the quickstart guide that performs various aggregations over the `refund_amt` column over various windows.

```python
source = Source(
Expand Down Expand Up @@ -236,7 +236,7 @@ v1 = GroupBy(

## Bucketed GroupBy Example

In this example we take the [Purchases GroupBy](https://github.com/airbnb/chronon/blob/master/api/py/test/sample/group_bys/quickstart/purchases.py) from the Quickstart tutorial and modify it to include buckets based on a hypothetical `"credit_card_type"` column.
In this example we take the [Purchases GroupBy](https://github.com/airbnb/chronon/blob/main/api/py/test/sample/group_bys/quickstart/purchases.py) from the Quickstart tutorial and modify it to include buckets based on a hypothetical `"credit_card_type"` column.

```python
source = Source(
Expand Down Expand Up @@ -283,7 +283,7 @@ v1 = GroupBy(

## Simple Batch Event GroupBy examples

Example GroupBy with windowed aggregations. Taken from [purchases.py](https://github.com/airbnb/chronon/blob/master/api/py/test/sample/group_bys/quickstart/purchases.py).
Example GroupBy with windowed aggregations. Taken from [purchases.py](https://github.com/airbnb/chronon/blob/main/api/py/test/sample/group_bys/quickstart/purchases.py).

Important things to note about this case relative to the streaming GroupBy:
* The default accuracy here is `SNAPSHOT` meaning that updates to the online KV store only happen in batch, and also backfills will be midnight accurate rather than intra day accurate.
Expand Down Expand Up @@ -329,7 +329,7 @@ v1 = GroupBy(

### Batch Entity GroupBy examples

This is taken from the [Users GroupBy](https://github.com/airbnb/chronon/blob/master/api/py/test/sample/group_bys/quickstart/users.py) from the quickstart tutorial.
This is taken from the [Users GroupBy](https://github.com/airbnb/chronon/blob/main/api/py/test/sample/group_bys/quickstart/users.py) from the quickstart tutorial.


```python
Expand Down