Methodology

This page documents how Fomofiles collects, analyzes, and publishes behavioral data. It exists to make the system auditable, replicable, and accountable.

Ethics Review & Oversight

This project operates as independent survey research with built-in ethics review mechanisms. Because it collects anonymous, non-sensitive behavioral data from adults (ages 18–25), it qualifies for exemption under standard research ethics guidelines for minimal-risk studies.

Ethics Review Process:

  • All survey instruments reviewed before deployment
  • Consent architecture validated for clarity and completeness
  • Data handling procedures audited quarterly
  • Ethical decisions documented in Ethics Case Archive

Ethics Inquiries: ethics@fomofiles.in

For institutional collaborations requiring formal IRB approval, we provide complete documentation packages including methodology, consent procedures, and data governance protocols.

Three-Layer Responsibility Separation

To prevent authority concentration and maintain structural integrity, Fomofiles operates through three independent layers:

Layer A: Data Custody

Holds raw survey responses. Cannot interpret or publish. Enforces anonymization and access controls. Provides aggregate data only to Layer B.

Layer B: Interpretation

Analyzes aggregate data. Cannot access individual responses. Documents methodology and limitations. Submits insights to Layer C with full context.

Layer C: Public Voice

Publishes insights. Cannot modify analysis. Enforces citation guidelines and monitors for misuse. Maintains revision history.

No single person or role spans multiple layers. This structural firewall prevents data manipulation, interpretation bias, and authority creep.

Methodological Overview

Fomofiles uses repeated cross-sectional surveys to track behavioral patterns among Indian Gen Z (ages 18–25). Data is collected monthly, analyzed in aggregate, and published with full methodological transparency.

Core Principles

  • Anonymity by design, not policy
  • Aggregate analysis only (no individual profiling)
  • Temporal consistency (same questions over time)
  • Public methodology (fully auditable)
  • Limitation disclosure (what we cannot know)

Data Sources

Primary Source: Monthly Behavioral Survey

Target Population: Indian residents, ages 18–25

Sample Size: Target 500+ responses per wave

Frequency: Monthly waves (open for 2 weeks each)

Distribution: Organic reach, no paid promotion

Participation: Voluntary, anonymous, uncompensated

We do not use scraped data, purchased panels, or third-party datasets. All insights derive from direct survey responses.

Survey Design

Question Stability

Core questions remain unchanged across waves to enable longitudinal comparison. New questions may be added but never replace existing ones mid-series.

Neutral Wording

Questions avoid leading language, emotional triggers, or brand references. Phrasing tested for clarity across education levels and English proficiency.

Response Options

Multiple choice for categorical data, numeric scales (1–10) for subjective measures, optional open-ended field for qualitative context. No forced responses.

Stop Conditions & Survey Pause Protocol

Survey collection may be paused or revised if questions yield problematic responses that indicate harm, confusion, or ethical concerns.

Conditions That Trigger Survey Pause

  • Participant Distress: Questions causing emotional harm or distress (detected through open-ended feedback or contact inquiries)
  • Systematic Confusion: >20% of responses indicate misunderstanding of question intent
  • Unintended Disclosure: Questions inadvertently eliciting personally identifiable information
  • Ethical Red Flags: Responses revealing coercion, illegal activity, or safety concerns
  • Data Quality Collapse: Response patterns suggesting bot activity, coordinated manipulation, or systematic gaming
  • Technical Failures: Form errors causing data loss or corruption
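
As a rough illustration of how the numeric trigger above (systematic confusion above 20%) could be checked, the sketch below computes the share of responses a reviewer has flagged as confused and compares it with the threshold. The file name, the "flagged_confused" column, and the reviewer-flagging workflow are assumptions for illustration only.

```python
# Minimal sketch of the ">20% systematic confusion" check.
# Assumes a hypothetical CSV export of one wave with a reviewer-coded
# boolean column "flagged_confused"; both names are illustrative.
import pandas as pd

CONFUSION_THRESHOLD = 0.20  # pause trigger from the conditions above

wave = pd.read_csv("wave_2025_02.csv")            # hypothetical export
confusion_rate = wave["flagged_confused"].mean()  # share of flagged responses

if confusion_rate > CONFUSION_THRESHOLD:
    print(f"Pause candidate: {confusion_rate:.1%} of responses flagged as confused")
else:
    print(f"OK: confusion rate {confusion_rate:.1%} is below the pause threshold")
```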

Pause Protocol (5-Step Process)

  1. Immediate Halt: Any team member issues stop notice → survey link disabled within 1 hour
  2. Rapid Assessment: Team reviews flagged responses within 24 hours to confirm issue severity
  3. Root Cause Analysis: Identify whether issue stems from question wording, technical error, or external factors
  4. Revision or Retirement: Question is either reworded (with documentation in changelog) or permanently retired
  5. Documentation: Decision logged in Ethics Case Archive with rationale and affected data handling

Handling Data from Paused Surveys

When a survey is paused mid-wave:

  • Responses collected before pause are retained but flagged as "partial wave"
  • Problematic question data may be excluded from analysis (documented in methodology notes)
  • If entire wave is compromised, data is archived but not published
  • Participants are not re-contacted (anonymous design prevents identification)
  • Revised survey begins as new wave with clear version numbering
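
A minimal sketch of how this handling might look in practice, assuming hypothetical column names ("submitted_at", with "q7" standing in for the problematic question) and an assumed pause timestamp; the actual workflow is manual and recorded in methodology notes.

```python
# Sketch of partial-wave handling after a mid-wave pause.
# Column names, the file name, and the pause time are assumptions.
import pandas as pd

PAUSE_TIME = pd.Timestamp("2025-02-14T10:00:00")  # assumed pause timestamp

wave = pd.read_csv("wave_2025_02.csv", parse_dates=["submitted_at"])

# Retain responses collected before the pause, flagged as a partial wave.
partial = wave[wave["submitted_at"] < PAUSE_TIME].copy()
partial["wave_status"] = "partial"

# Exclude the problematic question from analysis (documented in methodology notes).
analysis_ready = partial.drop(columns=["q7"])
```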

Historical Record: All survey pauses, revisions, and retirements are documented in Revision History with timestamps and justifications.

Data Management Plan

Storage Infrastructure

  • Location: Encrypted cloud storage (AWS S3, Mumbai region)
  • Encryption: AES-256 at rest, TLS 1.3 in transit
  • Backup: Daily automated backups with 30-day retention
  • Redundancy: Multi-region replication for disaster recovery
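
As a hedged sketch of how a daily backup upload could look under this setup, the snippet below writes an export to S3 in the Mumbai region with server-side encryption. The bucket name, key layout, and export file name are assumptions; credentials are expected to come from the standard boto3 configuration, and transfers use HTTPS in transit.

```python
# Sketch of a daily encrypted backup upload to S3 (Mumbai region).
# Bucket name, key prefix, and file name are hypothetical.
from datetime import date

import boto3

s3 = boto3.client("s3", region_name="ap-south-1")  # ap-south-1 = Mumbai

with open("responses_export.csv", "rb") as f:
    s3.put_object(
        Bucket="fomofiles-backups",                       # assumed bucket
        Key=f"daily/{date.today().isoformat()}/responses_export.csv",
        Body=f,
        ServerSideEncryption="AES256",                    # AES-256 at rest
    )
```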

Access Controls

  • Raw Data Access: Layer A Data Custodian only (single role, audited)
  • Aggregate Data Access: Layer B Analysts (read-only, logged)
  • Authentication: Multi-factor authentication required for all access
  • Audit Logging: All data access logged with timestamp, user, and action
  • Review Frequency: Access logs reviewed monthly, anomalies investigated immediately
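
One lightweight way to meet the audit-logging requirement above is an append-only JSON Lines log with one entry per access. The sketch below is illustrative; the file path, field names, and example values are assumptions.

```python
# Sketch of an append-only access log entry (JSON Lines).
# File path and field names are illustrative.
import json
from datetime import datetime, timezone

def log_access(user: str, action: str, path: str = "access_log.jsonl") -> None:
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

log_access("layer_b_analyst", "read:aggregate_wave_2025_02")
```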

Retention Policy

  • Raw Responses: Retained indefinitely for longitudinal analysis and replication
  • Aggregate Data: Published datasets archived permanently with version control
  • Analysis Logs: Retained for 10 years minimum for audit and methodology verification
  • Deletion Conditions: Individual responses cannot be deleted (anonymous by design, cannot be identified)
  • Long-term Preservation: Annual exports to institutional repository for permanent archival

Data Sharing Policy

  • Public Sharing: Aggregate insights published openly on this site
  • Academic Requests: Anonymized aggregate datasets available to researchers upon request
  • Request Process: Submit research proposal, methodology, and intended use to research@fomofiles.in
  • Embargo Period: 6 months from collection before external sharing (internal analysis priority)
  • Prohibited Sharing: Raw individual responses never shared under any circumstances

This Data Management Plan is reviewed annually and updated as infrastructure or regulations change. All changes documented in Revision History.

Tools & Analysis

Statistical Methods

Descriptive statistics (means, medians, distributions), cross-tabulation by demographics, time-series comparison across waves. No predictive modeling or individual-level inference.
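
A minimal sketch of this kind of aggregate-only analysis in pandas: descriptive statistics for a 1–10 scale and a cross-tabulation by a demographic variable. The file name and column names ("city_tier", "spend_rating") are hypothetical.

```python
# Sketch of aggregate-only analysis: descriptive stats and a cross-tab.
# File and column names are hypothetical.
import pandas as pd

wave = pd.read_csv("wave_2025_02_aggregate_ready.csv")

# Descriptive statistics for a 1-10 subjective scale (mean, median, spread).
print(wave["spend_rating"].describe())

# Cross-tabulation by demographic group, shown as row proportions.
print(pd.crosstab(wave["city_tier"], wave["spend_rating"], normalize="index"))
```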

Qualitative Coding

Open-ended responses coded thematically. Multiple coders review for consistency. Participant language preserved in quotes. Analyst interpretation clearly separated.
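
A small sketch of one possible consistency check for this kind of multi-coder review: simple percent agreement between two coders, with disagreements listed for discussion. The file and column names are assumptions.

```python
# Sketch of a dual-coding consistency check (simple percent agreement).
# Expects hypothetical columns: response_id, coder_a, coder_b.
import pandas as pd

codes = pd.read_csv("open_ended_codes.csv")

agreement = (codes["coder_a"] == codes["coder_b"]).mean()
print(f"Percent agreement: {agreement:.1%}")

# Disagreements are discussed until consensus or documented as interpretive tension.
disagreements = codes[codes["coder_a"] != codes["coder_b"]]
print(disagreements[["response_id", "coder_a", "coder_b"]])
```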

Visualization Standards

Charts use muted academic colors, clear labels, and sample size notes. No decorative elements. Same variables maintain consistent colors across all charts.

Technology Stack & Workflow Risks

Current tooling is deliberately simple and manual. This section documents known limitations and mitigation strategies.

Current Stack (Phase 1)

  • Data Collection: Google Forms
    Limitations: Basic security, no advanced logic, single-account dependency
  • Data Storage: Google Sheets / Excel
    Limitations: ~1.79% cell error rate, no version control, fragile formulas, collaboration risks
  • Visualization: Datawrapper
    Limitations: Performance caps (~300 data points), requires internet, service dependency
  • Design/Publishing: Canva
    Limitations: Template consistency enforcement needed, accessibility checks required

Known Failure Points

  • Single Point of Failure: Google account compromise exposes all data
  • Human Error Cascade: Spreadsheet mistakes propagate undetected through formulas
  • No Audit Trail: Changes overwrite prior data with no rollback capability
  • Manual Sync Breaks: Form changes can break chart mappings without warning
  • Scale Limits: Very long surveys or large datasets will slow/break Forms and Sheets
  • Offline Exposure: Spreadsheets can be copied/emailed, creating security gaps

Mitigation Strategies (Current)

  • Access Control: Minimum necessary team members with Google account access
  • Manual Verification: Weekly spot-checks of spreadsheet formulas against known values
  • Backup Routine: Daily manual exports to encrypted local storage
  • Change Protocol: All form/sheet modifications logged in team changelog before implementation
  • Template Enforcement: Standardized Canva templates with accessibility checklist
  • Performance Monitoring: Track response times and data volumes to anticipate scale limits
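
A rough sketch of how the weekly formula spot-check in the list above could be partially scripted: recompute a summary value from raw responses and compare it with the value the spreadsheet reports. The workbook layout, sheet names, and column names are assumptions, not the project's actual files.

```python
# Sketch of a weekly spot-check: recompute a mean and compare it with the
# value a spreadsheet formula reports. Sheet and column names are hypothetical.
import pandas as pd

WORKBOOK = "Survey_Feb2025_v1.xlsx"

raw = pd.read_excel(WORKBOOK, sheet_name="Responses")
summary = pd.read_excel(WORKBOOK, sheet_name="Summary")

recomputed = raw["spend_rating"].mean()
reported = summary.loc[0, "mean_spend_rating"]

if abs(recomputed - reported) > 1e-6:
    print(f"Formula drift: recomputed {recomputed:.3f} vs reported {reported:.3f}")
else:
    print("Spot-check passed")
```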

Tool Disclosure Policy

Any use of automated or AI-assisted tools must be disclosed:

  • AI Transcription: If used for open-ended responses, disclosed in methodology notes with accuracy verification process
  • Automated Coding: Any algorithmic categorization disclosed with manual validation sample size
  • Chart Generation: Datawrapper settings and data transformations documented

When to Upgrade Stack

Triggers for moving beyond current tools:

  • Survey responses exceed 1,000 per wave (Forms performance degradation)
  • More than 3 team members need simultaneous data access (collaboration risk)
  • Spreadsheet errors detected in >2% of published insights (quality threshold)
  • Manual workflow requires >20 hours per week (unsustainable labor)
  • Security incident or near-miss occurs (immediate upgrade required)

Philosophy: Simple tools are acceptable for slow-growth research if risks are known, documented, and actively mitigated. Complexity is added only when simplicity becomes a liability.

Survivability & Collaboration

For the project to endure and scale gradually, team processes and data practices must be robust. This section documents collaboration protocols, versioning strategies, and continuity safeguards.

Team Collaboration Protocols

Google Sheets and Datawrapper support live editing, enabling peer review and workload sharing. However, collaborative spreadsheets demand strict coordination to prevent data corruption.

  • Pre-Work Briefing:
    Every collaborator receives clear protocol document before accessing shared files. No one starts editing without instructions.
  • Edit Coordination:
    Team members announce edits in shared chat before making changes. Prevents simultaneous edits that overwrite each other's work.
  • Column/Row Protection:
    Deleting or moving columns requires second-person approval. Structural changes can damage entire sheet and hinder collaborators.
  • Formula Discipline:
    Keep formulas simple. Avoid hard-coded values. Document complex calculations in separate reference sheet.
  • Regular Check-ins:
    Weekly team review of data edits, chart updates, and workflow issues. Quick stand-up or async chat when data are actively edited.

Data Cleaning & Quality Control

Manual cleaning and coding (e.g., categorizing open-ended responses) are error-prone. Strict logging and dual verification mitigate these risks.

  • Change Logging:
    Every data transformation logged in separate "Cleaning Log" sheet with: date, editor name, action taken, rationale.
  • Dual Verification:
    Two people independently check cleaned data against raw responses. Discrepancies discussed until consensus or documented as interpretive tension.
  • Formula Audits:
    Each added formula or filter increases error risk (~1.79% cell error rate in spreadsheets). Monthly spot-checks of formulas against known values.
  • Raw Data Preservation:
    Original survey responses never edited. All cleaning done in duplicate sheets. Raw data remains untouched for audit and replication.

Versioning & Backup Strategy

Current toolset lacks built-in version control. Without audit trails, tracing or reversing mistakes is difficult. Explicit versioning and backup policies are mandatory.

  • File Naming Convention:
    Append dates to critical files: Survey_Feb2025_v1.xlsx
    Version increments with each major change.
  • Google Drive Version History:
    Use Drive's built-in version history for critical docs. Review history monthly to identify unauthorized or accidental changes.
  • Periodic Snapshots:
    Monthly exports of all Sheets to encrypted local storage. Snapshots retained for 2 years minimum for rollback capability.
  • Change Log Maintenance:
    Lightweight changelog document: who did what and when. Helps trace errors and maintain trust in data.
  • Continuity Planning:
    If Google Forms/Sheets owner leaves or loses access, continuity suffers. All critical files stored in shared team drive (not personal accounts). Minimum two administrators with access.
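
A small sketch of the naming and snapshot conventions above: given a file stem, find the next version number by scanning an assumed local exports folder. All folder and file names are illustrative.

```python
# Sketch of the Survey_Feb2025_v1.xlsx naming convention: find the next
# version number for a given stem. Folder and file names are hypothetical.
import re
from pathlib import Path

def next_version_name(stem: str, folder: str = "exports", ext: str = ".xlsx") -> str:
    """Return e.g. 'Survey_Feb2025_v2.xlsx' if v1 already exists in the folder."""
    pattern = re.compile(rf"_v(\d+){re.escape(ext)}$")
    versions = [
        int(m.group(1))
        for p in Path(folder).glob(f"{stem}_v*{ext}")
        if (m := pattern.search(p.name))
    ]
    return f"{stem}_v{max(versions, default=0) + 1}{ext}"

print(next_version_name("Survey_Feb2025"))  # Survey_Feb2025_v1.xlsx on first run
```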

Standard Operating Procedures (SOPs)

Written SOPs for each major task ensure consistency and reduce onboarding friction. Even bullet-point docs on a shared drive can prevent coordination chaos.

  • SOP 1: Survey Setup
    Form creation checklist, question wording review, consent architecture validation, test submission verification.
  • SOP 2: Data Cleaning
    Raw data export, duplicate sheet creation, cleaning log initialization, dual verification process, final dataset approval.
  • SOP 3: Chart Production
    Data aggregation rules, Datawrapper import, chart-to-source verification, accessibility checks (alt-text, color contrast), publication approval.
  • SOP 4: Publication Workflow
    Peer review completion, limitation disclosure check, citation format verification, changelog update, public release.
  • SOP 5: Column Deletion Protocol
    Any deleted survey column must be reviewed by second person. Document rationale in changelog before deletion.

Security & Access Controls

  • Two-Factor Authentication:
    Required on all Google accounts used for research. Non-negotiable.
  • Limited Editing Rights:
    Form editing rights restricted to core team members only. View-only access for external collaborators.
  • Datawrapper Teams:
    Use team accounts (not personal) so charts and data aren't lost if someone's account changes or they leave the project.
  • Secure File Sharing:
    Spreadsheets never emailed. All sharing via secure Drive links with expiration dates and access logging.

Scaling Considerations

Current setup is sustainable for a small, slow-paced initiative. As volumes rise, manual steps will bottleneck the team.

  • Current Capacity:
    System handles ~500 responses/month with 2-3 active researchers. Manual workflow requires ~15 hours/week.
  • Onboarding Overhead:
    Each new team member requires 4-6 hours training on tools and conventions. SOPs reduce this to 2-3 hours.
  • When to Upgrade:
    Consider lightweight database or survey platform when: responses exceed 1,000/month, more than 3 simultaneous editors needed, manual workflow exceeds 20 hours/week, or error rate exceeds 2%.
  • Philosophy:
    Strict role definitions (who enters data, who reviews charts) and regular check-ins maintain quality. Complexity added only when simplicity becomes a liability.

Survivability Principle: The system can function long-term only with disciplined teamwork. Regular communication, documented procedures, and conservative safeguards ensure reliability as the project grows.

Lean Automation (Optional)

Without overhauling the manual-first stack, carefully tested light automations can reduce error risk and labor burden. These are optional enhancements, not requirements. All automation must be documented, tested, and reversible.

Automation Philosophy

Automation is added only when:

  • Manual process becomes error-prone or unsustainable
  • Automation reduces risk rather than introducing new complexity
  • Team fully understands what the automation does
  • Manual fallback remains available
  • Automation is documented and version-controlled

Core Principle: Add process and documentation, not flashy tech. Preserve the project's ethos of silence and careful growth.

Candidate Automations (Phase 2+)

These are potential automations to consider when manual processes become bottlenecks. None are implemented in Phase 1.

  • Google Apps Script: Input Validation
    Automatically check survey responses for out-of-range values (e.g., age >25, negative spending). Flag anomalies for manual review rather than auto-rejecting.
    Risk: Script errors could block valid responses. Requires thorough testing and monitoring.
  • Datawrapper API: Chart Updates
    Use Datawrapper's API to programmatically update charts when source data changes, reducing manual re-import steps.
    Risk: API changes could break workflow. Requires version pinning and fallback to manual updates.
  • Scheduled Backups
    Automate daily exports of Google Sheets to encrypted local storage using Google Drive API or Apps Script.
    Risk: Backup script failure could go unnoticed. Requires monitoring and manual verification.
  • Formula Auditing Script
    Periodically scan spreadsheets for broken formulas, circular references, or hard-coded values. Generate audit report for manual review.
    Risk: False positives could create noise. Requires tuning and human judgment.
  • Response Deduplication
    Flag potential duplicate submissions based on timestamp clustering and response similarity. Manual review required before exclusion.
    Risk: Could incorrectly flag legitimate responses. Never auto-delete; always require human confirmation.
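
A hedged sketch of the deduplication idea above: flag groups of identical answer sets whose submissions cluster within a short window, for manual review only. The file name, column names, and 60-second window are assumptions; nothing in this sketch deletes data.

```python
# Sketch of duplicate flagging (never automatic deletion).
# Assumes a hypothetical wave export with "submitted_at" plus answer columns q1..qN.
import pandas as pd

wave = pd.read_csv("wave_2025_02.csv", parse_dates=["submitted_at"])
answer_cols = [c for c in wave.columns if c.startswith("q")]

# For each identical answer set, measure how tightly its submissions cluster in time.
groups = wave.groupby(answer_cols)["submitted_at"]
span_seconds = groups.transform(lambda s: (s.max() - s.min()).total_seconds())
group_size = groups.transform("size")

# Flag repeated answer sets submitted within 60 seconds of one another.
wave["possible_duplicate"] = (group_size > 1) & (span_seconds < 60)

# Flagged rows go to a human reviewer; nothing is excluded automatically.
print(wave.loc[wave["possible_duplicate"], ["submitted_at"] + answer_cols])
```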

Automation Testing & Documentation Requirements

Before deploying any automation:

  1. Test on Sample Data:
    Run automation on historical data with known correct outputs. Verify results match manual process exactly.
  2. Document Logic:
    Write plain-language explanation of what automation does, what assumptions it makes, and what could go wrong.
  3. Version Control:
    Store automation scripts in version-controlled repository (e.g., GitHub) with change log.
  4. Monitoring Plan:
    Define how team will detect automation failures (e.g., weekly spot-checks, error alerts).
  5. Rollback Procedure:
    Document how to revert to manual process if automation fails. Manual fallback must always remain viable.
  6. Disclosure:
    Update methodology page to disclose automation use, including what's automated and how it's validated.

What NOT to Automate

Some processes must remain manual to preserve research integrity:

  • Qualitative Coding: Human judgment required for thematic analysis
  • Interpretation: Analyst insight cannot be algorithmic
  • Ethics Decisions: Stop-work authority and ethics review must be human
  • Publication Approval: Final peer review and limitation disclosure require judgment
  • Data Deletion: Any data removal must be manually reviewed and documented

Current Automation Status (Phase 1)

No automation currently deployed. All processes are manual with documented procedures. This section exists to guide future decisions if/when manual workflows become unsustainable.

Any future automation will be announced in Revision History with full documentation of what changed and why.

Guiding Principle: Automation serves the research, not the other way around. Light, tested, documented enhancements are acceptable. Complex systems that obscure methodology or introduce new failure modes are not.

Analysis Decision Trail

Every published insight includes documentation of how raw responses became conclusions. This audit trail ensures replicability and prevents cherry-picking.

Required Documentation for Each Insight

  • Source Questions: Which survey questions were used (exact wording)
  • Coding Scheme: How responses were categorized or scored
  • Aggregation Method: Mean, median, threshold, or frequency count
  • Exclusion Criteria: Incomplete responses, outliers, or data quality filters applied
  • Sample Size: Number of responses included in analysis
  • Interpretation Framework: Which lens or perspective was applied
  • Alternative Readings: Other possible interpretations considered

Analysis logs are maintained internally and available upon request for academic verification. See Research page for published papers with full methodology appendices.

Peer Review & Interpretation Oversight

All analysis undergoes multi-researcher review to catch bias, errors, and unsupported conclusions before publication.

Mandatory Peer Checks

  • Dual Coding (Qualitative Data):
    Open-ended responses coded independently by two researchers. Discrepancies discussed until consensus or documented as interpretive tension.
  • Chart-to-Source Verification:
    Before publication, second analyst verifies all charts against raw aggregate data. Checks for: correct aggregation, accurate labels, appropriate scale, sample size disclosure.
  • Interpretation Challenge:
    Draft insights reviewed by researcher not involved in initial analysis. Reviewer must identify: unsupported claims, alternative explanations, missing limitations.
  • Limitation Audit:
    Every insight must include documented limitations. Peer reviewer confirms limitations are honest and complete.

Review Frequency

Peer checks occur at multiple stages:

  • Pre-Analysis: Coding schemes reviewed before application
  • Mid-Analysis: Preliminary patterns discussed in team review
  • Pre-Publication: Final insights undergo full peer verification
  • Post-Publication: Quarterly audits of published insights against source data

Disagreement Resolution

When peer reviewers disagree on interpretation:

  • Both interpretations documented in analysis log
  • If unresolvable, insight published with explicit acknowledgment of interpretive tension
  • Alternative readings included in "What We Might Be Missing" section
  • Disagreement itself becomes data point about analytical uncertainty

No Single-Analyst Publication: Insights analyzed and verified by one person only are never published. Minimum two-researcher review is mandatory.

Bias & Limitations

All research has constraints. We document ours openly to prevent overinterpretation and guide appropriate use of insights.

Urban Skew

Sample overrepresents Tier 1 and Tier 2 cities. Rural Gen Z perspectives underrepresented. Insights may not generalize to non-urban populations.

Language Bias

Survey conducted in English. Excludes non-English speakers. May miss cultural nuances expressed in regional languages.

Platform Bias

Participants self-select by visiting this site. May attract more digitally engaged, research-interested individuals than general population.

Self-Reporting Limitations

All data is self-reported. Subject to recall bias, social desirability bias, and honest misreporting. Cannot verify accuracy of spending or mood claims.

For complete list of known gaps, see Known Unknowns.

Data Governance

Anonymization

No personal identifiers collected (no names, emails, IP addresses, device IDs). Responses cannot be linked to individuals. Anonymity is structural, not procedural.

Publication Standards

Only aggregate data published. Minimum cell size of 30 responses before publishing any demographic breakdown. Individual quotes anonymized and contextualized.
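
An illustrative sketch of the minimum-cell-size rule: a demographic breakdown is published only if every cell contains at least 30 responses. The grouping columns and the example measure ("spend_rating") are hypothetical.

```python
# Sketch of the minimum-cell-size check before publishing a breakdown.
# Grouping columns and the measure are hypothetical.
import pandas as pd

MIN_CELL_SIZE = 30
GROUPS = ["city_tier", "gender"]

wave = pd.read_csv("wave_2025_02_aggregate_ready.csv")
cell_counts = wave.groupby(GROUPS).size()

if (cell_counts >= MIN_CELL_SIZE).all():
    print(wave.groupby(GROUPS)["spend_rating"].mean())   # safe to publish
else:
    print("Breakdown withheld: at least one cell below 30 responses")
    print(cell_counts[cell_counts < MIN_CELL_SIZE])
```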

Internal Governance

Stop-Work Authority

Any team member may immediately halt activities that violate ethics, methodology, or data integrity. This authority supersedes all other priorities.

Activities That May Be Stopped:

  • Data collection activities
  • Analysis procedures
  • Publication processes
  • External collaborations

Conditions for Stop:

  • Ethics violations identified
  • Methodology compromise detected
  • Participant harm risk discovered
  • Data integrity threats observed

Process:

  1. Issue stop notice (verbal or written)
  2. Activity ceases immediately
  3. Team reviews within 24 hours
  4. Document decision in Ethics Case Archive
  5. Resume only after resolution and documentation

Data Silence Policy

Team may choose not to publish findings if publication risks harm, misinterpretation, or violates participant trust. Silence is documented with reasoning in internal logs.

Slow-Growth Compatibility Check

Before accepting partnerships, funding, or platform changes, team evaluates compatibility with core principles. Growth opportunities that compromise methodology or independence are declined.

Ethics Statement

Fomofiles operates under the principle that behavioral research must serve public understanding, not commercial exploitation. We commit to:

  • Never selling data or participant access
  • Never using insights for targeting or manipulation
  • Publishing methodology openly for scrutiny
  • Documenting ethical decisions transparently
  • Prioritizing participant dignity over engagement metrics

For documented ethical decisions and dilemmas, see Data Ethics Case Archive.

Informed Consent Framework

Before participating, all respondents review comprehensive consent information covering all required elements for ethical research.

Consent Architecture Components

Study Purpose: Understanding Gen Z behavioral patterns in India

Procedures: 10-question survey, 2-3 minutes, anonymous responses

Risks: Minimal risk (no personal identifiers collected, no sensitive topics)

Benefits: Contributes to public research record, informs policy and education

Confidentiality: Anonymous by design, no tracking, aggregate publication only

Voluntary Nature: Can refuse participation, skip questions, or close survey anytime

Contact Information: ethics@fomofiles.in for questions or concerns

Rights Statement: Explicit participant rights listed below

Participant Rights

You have the right to:

  • Refuse participation without penalty or explanation
  • Skip any question you prefer not to answer
  • Close the survey at any time without submitting
  • Contact the research team with questions or concerns
  • Request information about how data is used (aggregate only)
  • Access published insights and methodology documentation

What You Cannot Do

Due to anonymous design:

  • Cannot view your previous responses (no account system)
  • Cannot delete your response after submission (cannot be identified)
  • Cannot receive individual results or feedback (aggregate analysis only)

Consent is obtained through scroll-to-proceed architecture: participants must scroll through all consent information before accessing survey questions. Proceeding to survey constitutes informed consent.

Citation Guidelines

Researchers, journalists, and educators may cite Fomofiles insights with proper attribution and context. For complete guidelines on proper use and prohibited misuse, see Citation & Misuse Guidelines.

Revision History

This methodology document is versioned and changes are tracked publicly. For complete revision history including rationale for changes, see Changelog.

Questions About Methodology?

For methodology questions: research@fomofiles.in

For ethics inquiries: ethics@fomofiles.in

For data access requests: research@fomofiles.in (include research proposal, intended use, and institutional affiliation)