Our partners

Helping enterprises leverage AI for a data-driven edge

We’ll transform your data into a competitive advantage. Since 2016, BigHub has been helping enterprises develop AI strategies, handle data engineering, and launch custom AI solutions.

Trusted by 100+ businesses
Used by the world's leading companies
Tailor-made AI applications

Looking for a custom AI solution? We provide end-to-end services with expertise in applied AI, including Gen AI features like knowledge bases and assistants. Our experts also specialize in machine learning for demand forecasting, cross-selling, upselling, and more.

AI strategy and consulting

BigHub helps your company unlock the potential of AI by identifying the ideal applications, assessing opportunities, and creating a tailored strategy that delivers real business impact. We also take care of all compliance-related concerns associated with implementing the AI Act, handling everything on your behalf.

Data engineering services

What if you could have cost-effective, scalable solutions that grow with your business? We specialize in enterprise data platforms, cloud infrastructure optimization, and strengthening data engineering capabilities.

BigHub’s focus

AI's impact on your business is what matters to us

What makes collaborating with BigHub unique.

Return on investment

Business first

We deliver solutions that drive visible, measurable impact on your business — helping you increase revenue, accelerate growth, and minimize errors.
Modern data practices

Data & AI as Software

We approach data and AI as software – leveraging best practices such as APIs, DataOps, MLOps, and LLMOps. Additionally, we navigate complex corporate environments with ease.
Enduring commitment

Long-term partner

After implementing AI, we offer long-term support and help you evolve the AI strategy defined at the outset.
Typical use cases

Specializing in AI solutions across industries

Explore the specific challenges we resolve for clients across diverse sectors.

Logistics

Fraud detection in logistics

Leveraging AI, we help you detect suspicious shipments in real time.
Energy

Detecting electricity theft

Our AI models analyze consumption patterns and identify suspicious usage.
Retail

Enhancing demand prediction in retail

AI can analyze your historical sales data, seasonality, and trends.
Insurance

Automation of manual insurance processes

With AI solutions, we automate insurance processes such as underwriting and claims handling.
Healthcare

Improving the precision of patient diagnostics in healthcare

We ensure that AI alerts doctors to high-risk patients in a timely manner.
Testimonials

What clients value about BigHub

Read feedback from our trusted business partners.

Case studies

Discover how BigHub transforms businesses with AI in a wide range of fields

Actions speak louder than words. Explore tangible examples of our solutions.

Partners and certifications

Leveraging these partners, technologies, and certifications, we empower businesses to transform data into a competitive advantage.

Certified:
ISO/IEC 27001:2022
ISO/IEC 20000-1:2018
ISO 9001:2015

Get your first consultation free

Want to discuss the details with us? Fill out the short form below. We’ll get in touch shortly to schedule your free, no-obligation consultation.

Trusted by 100+ businesses
Blog

News from the world of BigHub and AI

We’ve packed valuable insights into articles — don’t miss out.

AI

How to build intelligent search: From full-text to optimized hybrid search

When we began building an advanced search system, we quickly discovered that traditional full-text search has serious limits. Users type abbreviations, make typos, or use synonyms that classic search won't recognize. We also needed the system to search not only entity names but also their descriptions and related information. And more: people often search by context, sometimes across languages.

This article explains how we built a hybrid search system that combines full-text search (BM25) with vector embeddings, and how we used hyperparameter search to tune scoring for the best possible user results.
The problem: Limits of traditional search

Classic full-text search based on algorithms like BM25 has several fundamental constraints:

1. Typos and variants

  • Users frequently submit queries with typos or alternate spellings.
  • Traditional search expects exact or near-exact text matches.

2. Title-only searching

  • Full-text search often targets specific fields (e.g., product or entity name).
  • If relevant information lives in a description or related entities, the system may miss it.

3. Missing semantic understanding

  • The system doesn’t understand synonyms or related concepts.
  • A query for “car” won’t find “automobile” or “vehicle,” even though they are the same concept.
  • Cross-lingual search is nearly impossible—a Czech query won’t retrieve English results.

4. Contextual search

  • Users often search by context, not exact names.
  • For example, “products by manufacturer X” should return all relevant products, even if the manufacturer name isn’t explicitly in the query.

The solution: Hybrid search with embeddings

The remedy is to combine two approaches: traditional full-text search (BM25) and vector embeddings for semantic search.

Vector embeddings for semantic understanding

Vector embeddings map text into a multi-dimensional space where semantically similar meanings sit close together. This enables:

  • Meaning-based retrieval: A query like “notebook” can match “laptop,” “portable computer,” or related concepts.
  • Cross-lingual search: A Czech query can find English results if they share meaning.
  • Contextual search: The system captures relationships between entities and concepts.
  • Whole-content search: Embeddings can represent the entire document, not just the title.
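
To make the geometry concrete, here is a minimal sketch using toy vectors and plain Python. In practice the vectors would come from an embedding model such as text-embedding-3-large, with thousands of dimensions; the 4-dimensional vectors and document names below are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" -- real models produce
# vectors with thousands of dimensions.
emb = {
    "laptop":   [0.9, 0.1, 0.0, 0.2],
    "notebook": [0.8, 0.2, 0.1, 0.2],
    "banana":   [0.0, 0.9, 0.1, 0.0],
}

# Rank all documents by similarity to the "notebook" query vector.
query = emb["notebook"]
ranked = sorted(emb, key=lambda k: cosine_similarity(query, emb[k]),
                reverse=True)
```

Here "notebook" and "laptop" land close together while "banana" ranks last, which is exactly the meaning-based retrieval described above.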
Why embeddings alone are not enough

Embeddings are powerful, but not sufficient on their own:

  • Typos: Small character changes can produce very different embeddings.
  • Exact matches: Sometimes we need precise string matching, where full-text excels.
  • Performance: Vector search can be slower than optimized full-text indexes.
A hybrid approach: BM25 + HNSW

The ideal solution blends both:

  • BM25 (Best Matching 25): A classic full-text ranking algorithm that excels at exact keyword matches and term weighting.
  • HNSW (Hierarchical Navigable Small World): An efficient nearest-neighbor algorithm for fast vector search.

Combining them yields the best of both worlds: the precision of full-text for exact matches and the semantic understanding of embeddings for contextual queries.
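
One common way to merge the two ranked lists is Reciprocal Rank Fusion (RRF), which is also the fusion method Azure AI Search uses for hybrid queries. A minimal sketch, with made-up document IDs:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked result lists with Reciprocal Rank Fusion.

    Each list in `rankings` is ordered best-first; k=60 is the
    commonly used smoothing constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top results from each retriever.
bm25_hits   = ["doc_exact", "doc_title", "doc_old"]
vector_hits = ["doc_semantic", "doc_exact", "doc_title"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Because "doc_exact" ranks high in both lists, it wins the fused ranking; documents found by only one retriever still appear, just lower.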

The challenge: Getting the ranking right

Finding relevant candidates is only step one. Equally important is ranking them well. Users typically click the first few results; poor ordering undermines usefulness.

Why simple “Sort by” is not enough

Sorting by a single criterion (e.g., date) fails because multiple factors matter simultaneously:

  • Relevance: How well the result matches the query (from both full-text and vector signals).
  • Business value: Items with higher margin may deserve a boost.
  • Freshness: Newer items are often more relevant.
  • Popularity: Frequently chosen items may be more interesting to users.
Scoring functions: Combining multiple signals

Instead of a simple sort, you need a composite scoring system that blends:

  1. Full-text score: How well BM25 matches the query.
  2. Vector distance: Semantic similarity from embeddings.
  3. Scoring functions, such as:
    • Magnitude functions for margin/popularity (higher value → higher score).
    • Freshness functions for time (newer → higher score).
    • Other business metrics as needed.

The final score is a weighted combination of these signals. The hard part is that the right weights are not obvious—you must find them experimentally.
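
As an illustration, such a weighted combination might look like the sketch below. The field names, default weights, and linear one-year freshness decay are assumptions made for the example, not a prescribed formula:

```python
from datetime import datetime, timezone

def composite_score(doc, query_scores, w_text=1.0, w_vec=1.0,
                    w_margin=0.5, w_fresh=0.5, now=None):
    """Weighted blend of relevance and business signals.

    Assumes doc["margin"] is normalized to 0-1 and freshness decays
    linearly to zero over one year -- both modelling choices are
    illustrative only.
    """
    now = now or datetime.now(timezone.utc)
    age_days = (now - doc["updated"]).days
    freshness = max(0.0, 1.0 - age_days / 365.0)
    return (w_text * query_scores["bm25"]      # full-text score
            + w_vec * query_scores["vector"]   # semantic similarity
            + w_margin * doc["margin"]         # business value
            + w_fresh * freshness)             # recency
```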

Hyperparameter search: Finding optimal weights

Tuning weights for full-text, vector embeddings, and scoring functions is critical to result quality. We use hyperparameter search to do this systematically.

Building a test dataset

A good test set is the foundation of successful hyperparameter search. We assemble a corpus of queries where we know the ideal outcomes:

  • Reference results: For each test query, a list of expected results in the right order.
  • Annotations: Each result labeled relevant/non-relevant, optionally with priority.
  • Representative coverage: Include diverse query types (exact matches, synonyms, typos, contextual queries).
Metrics for quality evaluation

To objectively judge quality, we compare actual results to references using standard metrics:

1. Recall (completeness)

  • Do results include everything they should?
  • Are all relevant items present?

2. Ranking quality (ordering)

  • Are results in the correct order?
  • Are the most relevant results at the top?

Common metrics include NDCG (Normalized Discounted Cumulative Gain), which captures both completeness and ordering. Other useful metrics are Precision@K (how many relevant items in the top K positions) and MRR (Mean Reciprocal Rank), which measures the position of the first relevant result.
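
All three metrics are straightforward to implement. A minimal sketch, using binary relevance for Precision@K and MRR and graded gains for NDCG:

```python
import math

def precision_at_k(results, relevant, k):
    """Share of the top-k results that are relevant."""
    return sum(1 for r in results[:k] if r in relevant) / k

def mrr(results, relevant):
    """Reciprocal rank of the first relevant result (0 if none)."""
    for rank, r in enumerate(results, start=1):
        if r in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(results, gains, k):
    """Discounted gain of the actual order relative to the ideal order."""
    def dcg(items):
        return sum(gains.get(item, 0) / math.log2(rank + 1)
                   for rank, item in enumerate(items, start=1))
    ideal = sorted(gains, key=gains.get, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(results[:k]) / ideal_dcg if ideal_dcg else 0.0
```

For a query where the reference ordering is known, these functions can be run over every test query and averaged into a single quality number per parameter combination.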

Iterative optimization

Hyperparameter search proceeds iteratively:

  1. Set initial weights: Start with sensible defaults.
  2. Test combinations: Systematically vary:
    • Field weights for full-text (e.g., product title vs. description).
    • Weights for vector fields (embeddings from different document parts).
    • Boosts for scoring functions (margin, recency, popularity).
    • Aggregation functions (how to combine scoring functions).
  3. Evaluate: Run the test dataset for each combination and compute metrics.
  4. Select the best: Choose the parameter set with the strongest metrics.
  5. Refine: Narrow around the best region and repeat as needed.

This can be time-consuming, but it’s essential for optimal results. Automation lets you test hundreds or thousands of combinations to find the best.
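
The simplest way to automate the loop above is an exhaustive grid search. The sketch below assumes an `evaluate` callback that runs the test dataset with one weight combination and returns a quality metric such as mean NDCG (higher is better); the toy metric in the example is invented so the result is easy to verify:

```python
import itertools

def grid_search(weight_grid, evaluate):
    """Score every combination in `weight_grid`; return the best one.

    `evaluate(params)` is expected to run the test dataset with the
    given weights and return a quality metric (higher is better).
    """
    best_params, best_score = None, float("-inf")
    names = list(weight_grid)
    for values in itertools.product(*(weight_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy example: the metric peaks at w_text=1.0, w_vec=0.5.
best, metric = grid_search(
    {"w_text": [0.5, 1.0, 1.5], "w_vec": [0.25, 0.5, 1.0]},
    lambda p: -abs(p["w_text"] - 1.0) - abs(p["w_vec"] - 0.5),
)
```

Refinement then means re-running the same search with a finer grid centred on the winning region.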

Monitoring and continuous improvement

Even after tuning, ongoing monitoring and iteration are crucial.

Tracking user behavior

A key signal is whether users click the results they’re shown. If they skip the first result and click the third or fourth, your ranking likely needs work.

Track:

  • CTR (Click-through rate): How often users click.
  • Click position: Which rank gets the click (ideally the top results).
  • No-click queries: Queries with zero clicks may indicate poor results.
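
A minimal sketch of aggregating these three signals from click logs; the log record schema is invented for the example:

```python
def click_metrics(logs):
    """Aggregate CTR, mean click position, and no-click queries.

    `logs` is a list of {"query": str, "clicked_rank": int or None}
    records -- one per search, with None when nothing was clicked.
    """
    total = len(logs)
    clicked = [e["clicked_rank"] for e in logs
               if e["clicked_rank"] is not None]
    ctr = len(clicked) / total if total else 0.0
    avg_rank = sum(clicked) / len(clicked) if clicked else None
    no_click = [e["query"] for e in logs if e["clicked_rank"] is None]
    return {"ctr": ctr, "avg_click_rank": avg_rank, "no_click": no_click}
```

A rising average click rank or a growing no-click list is the cue to feed those queries back into the test dataset, as described below.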
Analyzing problem cases

When you find queries where users avoid the top results:

  1. Log these cases: Save the query, returned results, and the clicked position.
  2. Diagnose: Why did the system rank poorly? Missing relevant items? Wrong ordering?
  3. Augment the test set: Add these cases to your evaluation corpus.
  4. Adjust weights/rules: Update weights or introduce new heuristics as needed.

This iterative loop ensures the system keeps improving and adapts to real user behavior.

Implementing on Azure: AI search and OpenAI embeddings

All of the above can be implemented effectively with Microsoft Azure.

Azure AI Search

Azure AI Search (formerly Azure Cognitive Search) provides:

  • Hybrid search: Native support for combining full-text (BM25) and vector search.
  • HNSW indexes: An efficient HNSW implementation for vector retrieval.
  • Scoring profiles: A flexible framework for custom scoring functions.
  • Text weights: Per-field weighting for full-text.
  • Vector weights: Per-field weighting for vector embeddings.

Scoring profiles can combine:

  • Magnitude scoring for numeric values (margin, popularity).
  • Freshness scoring for temporal values (created/updated dates).
  • Text weights for full-text fields.
  • Vector weights for embedding fields.
  • Aggregation functions to blend multiple scoring signals.
OpenAI embeddings

For embeddings, we use OpenAI models such as text-embedding-3-large:

  • High-quality embeddings: Strong multilingual performance, including Czech.
  • Consistent API: Straightforward integration with Azure AI Search.
  • Scalability: Handles high request volumes.

Multilingual capability makes these embeddings particularly suitable for Czech and other smaller languages.

Integration

Azure AI Search can directly use OpenAI embeddings as a vectorizer, simplifying integration. Define vector fields in the index that automatically use OpenAI to generate embeddings during document indexing.
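
A sketch of the relevant fragment of such an index definition. The property names follow the Azure AI Search REST index schema (which may vary by API version), and the resource URI and deployment ID are placeholders:

```json
{
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true },
    { "name": "content", "type": "Edm.String", "searchable": true },
    {
      "name": "contentVector",
      "type": "Collection(Edm.Single)",
      "searchable": true,
      "dimensions": 3072,
      "vectorSearchProfile": "vector-profile"
    }
  ],
  "vectorSearch": {
    "algorithms": [ { "name": "hnsw-algo", "kind": "hnsw" } ],
    "vectorizers": [
      {
        "name": "openai-vectorizer",
        "kind": "azureOpenAI",
        "azureOpenAIParameters": {
          "resourceUri": "https://<your-openai-resource>.openai.azure.com",
          "deploymentId": "<embedding-deployment>",
          "modelName": "text-embedding-3-large"
        }
      }
    ],
    "profiles": [
      {
        "name": "vector-profile",
        "algorithm": "hnsw-algo",
        "vectorizer": "openai-vectorizer"
      }
    ]
  }
}
```

With a vectorizer attached to the profile, the service embeds both documents at indexing time and queries at search time, so the application never has to call the embedding API itself.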

News

Microsoft Ignite 2025: The shift from AI experiments to enterprise-grade agents

Microsoft Ignite 2025 is Microsoft's annual conference for developers, IT professionals, and partners. This year, the spotlight moved from generative AI demos to autonomous agents, cross-platform workflows, and governance frameworks that enable real-world enterprise adoption. This article summarises the most relevant takeaways, along with our view of what organisations should do next.
1. AI agents move centre stage

Microsoft’s headline reveal, Agent 365, positions AI agents as the new operational layer of the digital workplace. It provides a central hub to register, monitor, secure, and coordinate agents across the organisation.

At the same time, Microsoft 365 Copilot introduced dedicated Word, Excel, and PowerPoint agents, capable of autonomously generating, restructuring, and analysing content based on business context.

Copilot and agents built to power the company. Source: Link
Why this matters

Enterprises are shifting from “asking AI questions” to “assigning AI work”. Agent-based architectures will gradually replace many single-purpose assistants.

What organisations can do
  • Identify workflows suitable for autonomous agents
  • Standardise agent behaviour and permissions
  • Start pilot deployments inside Microsoft 365 ecosystems

2. Integration and orchestration become non-negotiable

Microsoft emphasised interoperability through the Model Context Protocol (MCP). Agents across Teams, Microsoft 365, and third-party apps can now share context and execute coordinated multi-step workflows.


Why this matters

Real automation requires more than standalone copilots — it requires orchestration between tools, data sources, and departments.

What organisations can do
  • Map cross-app workflows
  • Connect productivity, CRM/ERP and operational platforms
  • Design agent ecosystems rather than isolated assistants

3. Governance and security move into the spotlight

As agents gain autonomy, Microsoft introduced governance capabilities such as:

  • visibility into permissions
  • behavioural monitoring
  • integration with Defender, Entra, and Purview
  • centralised policy control
  • data-loss prevention


Why this matters

AI at scale must be fully observable and compliant. Governance will become a foundational requirement for all agent deployments.

What organisations can do
  • Define who is allowed to create/modify agents
  • Establish audit and monitoring standards
  • Build guardrails before rolling out automation

Read the official Microsoft article with all security updates & news - Link

4. Windows, Cloud PCs, and the rise of the AI-enabled workspace

Microsoft presented Windows 11 and Windows 365 as key components of the AI-first workplace. Features include:

  • AI-enhanced Cloud PCs
  • support for shared and frontline devices
  • local agent inference on capable hardware
  • endpoint-level automation


Why this matters

Distributed teams gain consistent, secure work environments with native AI capabilities.

What organisations can do
  • Evaluate Cloud PC scenarios
  • Modernise workplace setups for agent-driven workflows
  • Explore AI-enabled devices for operational teams

5. AI infrastructure and Azure evolution

Ignite highlighted continued investment in Azure AI capabilities, including:

  • improved model hosting and versioning
  • hybrid CPU/GPU inference
  • faster deployment pipelines
  • more cost-efficient fine-tuning
  • enhanced governance for AI training data

Full report here - Link

Why this matters

Scalable data pipelines and model infrastructure remain essential foundations for any agent-driven environment.

What organisations can do
  • Update data architecture for AI-readiness
  • Implement vector indexing and retrieval pipelines
  • Optimise model hosting costs

6. Copilot Studio and plug-in ecosystem expand rapidly

Copilot Studio received major updates, transforming it into a central automation and integration hub. New capabilities include:

  • custom agent creation with visual logic
  • no-code multi-step workflows
  • plug-ins for internal APIs and line-of-business systems
  • improved grounding using enterprise data
  • expanded connectors for CRM/ERP/event platforms

Why this matters

Organisations can build specialised copilots and agents — connected to their internal systems and business logic.

What organisations can do
  • Develop domain-specific copilots
  • Use connectors to integrate existing systems
  • Leverage visual logic for quick experiments

7. Fabric + Azure AI integration

Microsoft Fabric now provides deeper AI readiness features:

  • tight integration with Azure AI Studio
  • automated pipelines for AI data preparation
  • vector indexing and RAG capabilities inside OneLake
  • enhanced lineage and governance
  • performance boosts for large-scale analytics

Why this matters

AI agents depend on clean, governed, real-time data. Microsoft states that Fabric now enables building unified data + AI environments more efficiently.

What organisations can do
  • Consolidate disparate data pipelines into Fabric
  • Implement vector search for internal knowledge retrieval
  • Build governed AI datasets with lineage tracking
What this means for companies

Across all announcements, one trend is consistent: AI is becoming an operational layer—not an add-on.

For organisations in finance, energy, logistics, retail, or event management, this brings clear implications:

  • It’s time to move from experimentation to real deployment.
  • Automated agents will replace many single-purpose copilots.
  • Governance frameworks must be in place before scaling.
  • Integration across apps, data sources, and workflows is essential.
  • AI will increasingly live inside productivity tools employees already use.
  • The competitive advantage will come from how well agents connect to business processes—not from which model is used.

BigHub is well-positioned to guide you through this transition, with personalised strategy, architecture, implementation, and optimisation.

How enterprises should prepare for 2025–2026

Here are the next steps organisations should consider:

1. Map high-value workflows for agent automation

Identify repetitive, cross-team workflows where autonomous task execution delivers value.

2. Design your agent governance framework

Define roles, access boundaries, audit controls, and operational monitoring.

3. Prepare your data infrastructure

Ensure clean, accessible, governed data that agents can safely use.

4. Integrate your productivity tools

Leverage Teams, Microsoft 365, and MCP-compatible apps to reduce friction.

5. Start with a controlled pilot

Choose one business unit or workflow to test agent deployment under monitoring.

6. Plan for organisation-wide rollout

Once guardrails are validated, scale agents into more complex processes.

BigHub

From theory to practice: How BigHub prepares CVUT FJFI students for the world of data and AI

Data analysts and AI specialists are among the most sought-after professionals today. Companies are looking for people who understand data, can leverage cloud technologies, and know how to apply machine learning to real-world problems. Yet university teaching often remains theoretical. Students learn algorithms and mathematical principles but lack the know-how to use them in practice.
Bridging academia and real-world practice is key

At the Faculty of Nuclear Sciences and Physical Engineering of CVUT (FJFI), we are changing that. Since the 2021/2022 academic year, BigHub has been teaching full-semester courses that connect academia with the real world of data. And it’s not just lectures—students get hands-on experience with real technologies in a business-like environment, guided by professionals who deal with such projects every day.

What brought us to FJFI

BigHub has a personal connection to CVUT FJFI. Many of us—including CEO Karel Šimánek, COO Ing. Tomáš Hubínek, and more than ten other colleagues—studied there ourselves. We know the faculty produces top-tier mathematicians, physicists, and engineers. But we also know that these students often lack insight into how data and AI function in business contexts.

That’s why we decided to change it. Not as a recruitment campaign, but as a long-term contribution to Czech education. We want students to see real examples, try modern tools, and be better prepared for their careers.

Two courses, two semesters
18AAD – Applied Data Analysis (summer semester)

The first course launched in the 2021/2022 academic year, led by Ing. Tomáš Hubínek. Its goal is to give students an overview of how large-scale data work looks in practice. Topics include:

  • data organization and storage,
  • frameworks for big data computation,
  • graph analysis,
  • cloud services,
  • basics of AI and ML.

Strong emphasis is placed on practical exercises. Students work in Microsoft Azure, explore different technologies, and have room for discussion. Selected lectures also feature BigHub experts who share insights from real projects.

18BIG – Data in Business (winter semester)

In 2024, we added a second course that builds on 18AAD. It is taught by doc. Ing. Jan Kučera, CSc. and doc. Ing. Petr Pokorný, Ph.D. The course goes deeper and focuses on:

  • data governance and data management in organizations,
  • integration architectures,
  • data platforms and AI readiness,
  • best practices from real-world projects.

While 18AAD shows what can be done with data, 18BIG demonstrates how it actually works inside companies.

Above-average student interest

Elective courses at FJFI usually attract only a few students. Our courses, however, enroll 20–35 students every year—an above-average number for the faculty.

Feedback is consistent: students appreciate the practical focus, open discussions, and the chance to ask professionals about real-world situations. For many, it’s their first encounter with technologies actually used in business.

Beyond the classroom

Our involvement doesn’t end with teaching. Together with the Department of Software Engineering, we’ve helped revise curricula and graduate profiles, enabling the faculty to respond more flexibly to what companies in the data and AI fields really need. This improves the quality of education across the entire faculty, not just for students who take our electives.

It’s not about recruitment

Sometimes, a student later joins BigHub — but that’s not the goal. The goal is to ensure graduates aren’t surprised by how data work really looks. We want them to have broader, more practical knowledge and hands-on experience with modern tools. It’s our way of giving back to the institution that shaped us and contributing to the Czech tech ecosystem as a whole.

Collaboration with FJFI goes beyond teaching. Since BigHub’s founding, we’ve supported the student union and regularly participated in the faculty’s Dean’s Cup sports event, playing futsal, beach volleyball, and more. This year, we also submitted several grant applications together and hope to soon collaborate on joint technical projects. We believe a strong community and informal connections between students and professionals are just as important as textbook knowledge.

What’s next?

Our cooperation with CVUT FJFI is long-term. Courses 18AAD and 18BIG will continue, and we are exploring ways to expand their scope. We see that students crave practical experience and that bridging academia with real-world practice truly works. If this helps improve the quality of data and AI projects in Czech companies, it will be the best proof that our effort is worthwhile.