Skip to content

SecPriv/cross-platform-matching

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Cross-Platform App Matching

This repository contains the code from the 2024 MSR Paper "Comparing Apples to Androids: Discovery, Retrieval, and Matching of iOS and Android Apps for Cross-Platform Analyses" by Magdalena Steinböck, Jakob Bleier, Mikka Rainer, Tobias Urban, Christine Utz, and Martina Lindorfer.

We're currently documenting our analysis pipeline and setup. Soon we will publish the code to: - scrape store information - download apps - extract features for matching

Additionally, we make available: - the full list of matches produced for our paper - the full list of reference pairs we collected

If you use this work in whole or in part for academic purposes please cite:

Steinböck, M., Bleier, J., Rainer, M., Urban, T., Utz, C., & Lindorfer, M. (2024). Comparing Apples to Androids: Discovery, Retrieval, and Matching of iOS and Android Apps for Cross-Platform Analyses. Proceedings of the 21st International Conference on Mining Software Repositories (MSR). https://doi.org/10.1145/3643991.3644896

Table of Contents

Datasets

For our 2024 MSR paper we also provide two datasets:

  1. First the reference pairs we scraped from the Google migration API. You can find them in data/reference-pairs.csv.zip, which contains a zipped csv with header that describes the columns.
  2. The list of best matches we computed. They are in data/computed-matches.json and represent, for each iOS bundle id, the highest scoring Android app after cross-compiling the top 10k apps from each store.
  3. The list of best matches that we could verify using the Google migration API. They are in data/computed-matches-verified.json.

Usage

0. Clone the repo

If not done already, you need to clone this repo. This is done via the git terminal command:

git clone [email protected]:SecPriv/cross-platform-matching.git

If you don't have git installed, please install it. You can visit https://git-scm.com/ or search for instructions online.

After you have clone the repo, go into the the directoy:

cd cross-platform-matching

The folder cross-platform-matching is considered the project root.

1. System Requirements

You need to have the following installed on your system:

If you don't know how to install them, please search for instructions on your own, as we can't provide guidance for every OS + architecture.

OS wise, the code has been executed on Linux x86 and Apple M1 ARM. 32GB of RAM or more are recommended, especially during matching of thousands of apps.

2. Python Package Dependencies

We provide dependency information for both poetry and pip.

pip (easy)

We highly recommend first creating a virtual env before installing any dependencies. Otherwise all packages will be installed globally and may cause conflicts with other projects that you might need to run.

Create a virtual env

To create a virtual env, simply run this from the project root:

python -m venv venv

This will create a venv folder where all the dependencies will be installed to. However, in order to configure python properly, you must activate it first.

OS Command
Unix (Linux/MacOS) source ./venv/bin/activate
Windows Powershell .\venv\Scripts\Activate.ps1
Windows CMD .\venv\Scripts\activate.bat

[!WARNING]
You need to active the Python virtual env every time you spawn a new shell! Otherwise python will only use and update the globally installed packages!

Most IDEs have good support for python virtual env, however. Please research on your own, how to build a workflow that suits your needs.

We provide a requirements.txt in the project root. You can install all listed dependencies by running:

# Don't forget to activate the virtual env first!
python -m pip install -r requirements.txt

poetry (reproducible)

Poetry works similarly to venv, but provides better stability of dependencies. If you have not already installed it, follow the official documentation.

When poetry is installed, you can simply run:

poetry install

3. Start Database

Option 1: Run locally

We are using Docker to run the database. To simplify the setup, we are providing a docker-compose.yml file.

To start the database, run the following command from a terminal in the project root.

# use the -d flag to run database in the background
docker compose up -d

Note

Older versions of Docker may not provide the docker compose sub-command. If the command fails, try the old docker-compose command instead.

When the command finished, the database should be available shortly after on port 27017(the default port of MongoDB). The default credentials are localadmin for both username and password.

Option 2: Connect to a remote instance (e.g. sharing an instance between runners)

If you need to connect to a remote MongoDB instance, you must set the MONGO_URL env variable to something like this:

MONGO_URL="mongodb://<username>:<password>@<domain or IP>:<port>/"

To change the name of the DB to use, set the MONGO_DB env variable.

MONGO_DB="name-of-your-db"

If setting an env variable is not possible or feasible for you, you can also update the db_connector.py file.

Caution

Never ever commit credentials to a git repository!
Git is not a secure storage for confidential information.

MongoDB GUI

For viewing and querying data, we recommend MongoDB Compass. It's a free GUI tool provided by MongoDB.

4. Executing the Python Scripts

All python scripts are executed from within the ./code folder. Otherwise imports cannot be resolved.

cd code

The pipeline is split into multiple steps that have different requirements:

1. Scraping metadata from stores

2. Filtering / Selecting apps to download

3. Downloading apps

for iOS
for Android

4. Preprocessing features

5. Computing scores

Usage:

python -m app_matcher.threaded_matcher --help

There are three required parameters:

  • --ios-collection: Name of the collection (or view) where the iOS analysis results reside.
  • --android-collection: Name of the collection (or view) where the Android analysis results reside.
  • --matches-collection: Name of the collection, where the results are written to.

Warning

The --matches-collection is not cleared before running. So if there is an error during execution, you must manually clear the collection or choose a different name. Otherwise you will have duplicate entires in the --matches-collection!

Development

License

The Cross-Platform App Matching code is distributed under the terms of the MIT license.

Troubleshooting

Cannot start the database due to a port conflict

This can happen, if the default port of MongoDB (27017) is already in use. You can change the port of MongoDB in the docker-compose.yml

# ...
    ports:
        - "127.0.0.1:<change this port>:27017"
#....

Note, however, that you also need to update the connection string.

This can either be done, by setting the MONGO_URL env variable to something like this:

mongodb://localadmin:localadmin@localhost:<your changed port>/

or by updating the default value in the db_connector.py file.

[!CAUTION] Never ever commit credentials to a git repository!
Git is not a secure storage for confidential information.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages