Git Binding
Best Candidate
Reviewed Projects
- GitHub - dmitryn/GitStats: Statistics generator for git repositories. Fork of
- Requires Python 2
- GitHub - IonicaBizau/git-stats: 🍀 Local git statistics including GitHub-like contributions calendars.
- Just does git commit stats
- GitHub - dreamyguy/gitlogg: 💾 🧮 🤯 Parse the 'git log' of multiple repos to 'JSON'
- Does not work on Ethereum EIPs repo
- nielskrijger/gitstat: Generates git history json datafile for
- Works on EIPs repo
- Seems to get stuck in a loop on Dendron and ddaemon-monorepo
- git log json format · GitHub
- Testing on EIPs -- TODO
- Testing on Dendron -- works
- Testing on ddaemon-monorepo -- TODO
- evilsocket/gitstats: Git Repository Analyzer.
- tarmstrong/git2json: Convert git logs to JSON for easy analysis
- file - Git log output to XML, JSON, or YAML? - Stack Overflow
Steps to index git repos
- [✅] Get a JSON file with a list of git URLs that can be cloned
- [✅] Run a script and clone all those repos
- [✅] Have a script go inside every git repo and export all the commits as JSON, save to another folder
- [❌] Process the git commit metadata and save it to yet another folder
- Parse Email
- Parse email Domain
- Parse name
- Parse second name
- Parse URLs from message
- Check URLs in message
- Message word count (check
- Message character count
- Calculate Character Count / Word Count
- Sum insertions and deletions
- [❌] Process the git committers data and merge it with another folder
- [❌] ndjson conversion script
- [❌] Dump every repo to their own unique index
- [❌] When all repos are dumped write a script to calculate the metadata such as committers and git repos
- wiki.ddaemon.monorepo.bindings.git.schema
does not take a filename as a CLI argument
should detect whether a CLI argument was passed
- Figuring out a way to assess the quality of the GitHub activity would be nice. Some projects just have a bunch of shitty pull requests that never get approved…
- We should also be scoring the quality of the commits as well
- And score on a multidimensional level (i.e. commit quality, influence, volume of quality commits, etc.)
- Once we have something like that in place, we can begin to track whenever the top (say 200) devs create a new project
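The unfinished "process the git commit metadata" and "ndjson conversion" steps in the checklist above could look roughly like the sketch below. It is only an illustration: the record shape (`author_name`, `author_email`, `message`, and a `files` list with `insertions`/`deletions`) is an assumption about what the export script writes, "second name" is taken to mean the second whitespace-separated token of the author name, and the function names are hypothetical. Checking whether the URLs actually resolve is not covered.

```python
import json
import re

# Rough URL matcher for commit messages; an assumption, not a full URL grammar.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def enrich_commit(commit):
    """Return a copy of a commit record with derived metadata fields added."""
    email = commit.get("author_email", "")
    # Everything after the last "@" is treated as the email domain.
    domain = email.rsplit("@", 1)[1].lower() if "@" in email else ""
    name_parts = commit.get("author_name", "").split()
    message = commit.get("message", "")
    words = message.split()
    files = commit.get("files", [])

    enriched = dict(commit)
    enriched.update({
        "email_domain": domain,
        "first_name": name_parts[0] if name_parts else "",
        "second_name": name_parts[1] if len(name_parts) > 1 else "",
        "message_urls": URL_RE.findall(message),
        "message_word_count": len(words),
        "message_char_count": len(message),
        # Character count / word count, guarded against empty messages.
        "chars_per_word": len(message) / len(words) if words else 0.0,
        "total_insertions": sum(f.get("insertions", 0) for f in files),
        "total_deletions": sum(f.get("deletions", 0) for f in files),
    })
    return enriched


def to_ndjson(commits):
    """Serialize commits as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(c, ensure_ascii=False) for c in commits)
```

Keeping the enrichment a pure function of one record means it can run per repo folder and be re-run when new fields are added, without touching the raw export.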
Basic Queries
- Commits per author
```bash
git shortlog -s -n
```
- Commits per day/week/year
- Lines of code over time
- Graphs
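Once commits are exported as JSON, the "commits per day/week/year" query above can be answered without calling git again. A minimal sketch, assuming each commit date is an ISO 8601 string with a numeric UTC offset (the form `git log` emits for `%aI`); `commits_per_period` is a hypothetical helper name. Buckets use the author's local date, not UTC.

```python
from collections import Counter
from datetime import datetime


def commits_per_period(iso_dates, period="day"):
    """Count commits per day, ISO week, or year from ISO 8601 date strings."""
    formats = {"day": "%Y-%m-%d", "year": "%Y"}
    counts = Counter()
    for raw in iso_dates:
        when = datetime.fromisoformat(raw)
        if period == "week":
            # ISO calendar: (ISO year, ISO week number, ISO weekday).
            iso = when.isocalendar()
            key = f"{iso[0]}-W{iso[1]:02d}"
        else:
            key = when.strftime(formats[period])
        counts[key] += 1
    # Sorted by key so the result can be plotted in chronological order.
    return dict(sorted(counts.items()))
```

The sorted dictionary can be handed to any plotting library for the graphs listed above.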
Lines of code per author
```bash
git ls-files | while read f; do
  git blame -w -M -C -C --line-porcelain "$f" | grep -I '^author '
done | sort -f | uniq -ic | sort -n --reverse
```
graph - Generating statistics from Git repository - Stack Overflow
Questions for the data
- Source Code
- Check Method Names
- Queries for git repos such as
- ethereum/EIPs: The Ethereum Improvement Proposal repository
- Who has their name on the most EIPs
- Graph their names over time
- Graph repo activity over time
- Get semantic versioning of the repo
- Number of branches
- List unique names from a repo
- Same name but different email across repo or repos?
- PGP Signatures
- Heatmap of when the PGP signatures run out
- Expired PGP signatures still being used
- PGP signatures being updated
- Domains of emails
- Number of different email domains within single repo
- Number of different email domains across team
- Most popular email domain across all repos
- Who committed the most code across all repos
- Sort commit comments by length
- Who committed the most characters/lines of commits
- Average word length of a commit / of a user's commits in a repo
- Heatmap of times a repo is committed to
- We can estimate the time zone of a team or the members on the team if they have a regular schedule
- Measure how much of a user's code is still in the production branch
- How many lines need to be written by a user in order to get to production
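A few of the questions above (same name with different emails, the commit-time heatmap, and the time zone estimate) reduce to simple aggregations over the exported commits. The sketch below assumes each record has `author_name`, `author_email`, and an ISO 8601 `author_date` with a UTC offset; those field names and the function names are hypothetical. The UTC offset only hints at a time zone, since several zones share an offset.

```python
from collections import Counter, defaultdict
from datetime import datetime


def emails_by_name(commits):
    """Map each author name to its emails, keeping names with more than one."""
    seen = defaultdict(set)
    for c in commits:
        seen[c["author_name"]].add(c["author_email"].lower())
    return {name: sorted(emails) for name, emails in seen.items() if len(emails) > 1}


def commit_heatmap(commits):
    """Count commits per (weekday, hour) in the author's local time.

    Weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    grid = Counter()
    for c in commits:
        when = datetime.fromisoformat(c["author_date"])
        grid[(when.weekday(), when.hour)] += 1
    return grid


def utc_offset_counts(commits):
    """Count commits per UTC offset (e.g. "+0100") as a rough time zone hint."""
    offsets = Counter()
    for c in commits:
        when = datetime.fromisoformat(c["author_date"])
        offsets[when.strftime("%z")] += 1
    return offsets
```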
Questions about questions for the data
- What are key words or phrases we should be looking for within the commits or even code itself
- What are patterns of behavior from professional developers we can measure
- What are the giveaways of a junior developer
Repos that produce errors when cloning
Notable Tools I researched along the way
- Git log to json (git changelog nodejs) · GitHub
- git-to-json - npm
- Read your project's latest git commit info and write it to JSON
- git-json-tree · PyPI
- git-log-to-json - npm
- This is definitely the best tool so far
- Bash Script (What I implemented)
- Convert Git logs to JSON. The first script is all you need, the other two files contain only optional bonus features 😀 THIS GIST NOW HAS A FULL GIT REPO:
- Has nodejs bindings
- Git log in JSON format
- Almost works, but leaves double quotes inside JSON strings unescaped
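One way to sidestep the unescaped-quote problem is to avoid building JSON inside the git format string at all. The sketch below is not what any of the tools above do: it asks git to separate fields with the ASCII unit separator (`%x1f`) and commits with the record separator (`%x1e`), then lets `json.dumps` handle the escaping. The field list and function names are illustrative assumptions.

```python
import json
import subprocess

FIELDS = ["hash", "author_name", "author_email", "author_date", "subject"]
# %x1f (unit separator) between fields, %x1e (record separator) after each commit.
PRETTY = "--pretty=format:%H%x1f%an%x1f%ae%x1f%aI%x1f%s%x1e"


def parse_git_log(raw):
    """Split separator-delimited `git log` output into a list of dicts."""
    commits = []
    for record in raw.split("\x1e"):
        # git puts a newline between entries; drop it along with empty tails.
        record = record.strip("\n")
        if not record:
            continue
        commits.append(dict(zip(FIELDS, record.split("\x1f"))))
    return commits


def git_log_json(repo_path):
    """Run `git log` in repo_path and return its commits as a JSON string."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", PRETTY],
        capture_output=True, text=True, check=True,
    )
    # json.dumps escapes quotes, backslashes, and control characters.
    return json.dumps(parse_git_log(result.stdout), ensure_ascii=False)
```

Because the separators are control characters that practically never appear in names or commit subjects, quotes in a message pass through untouched and are escaped only at serialization time.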