The Most Common Mobile Apps Mistakes

The mobile app market is saturated with competition. Trends turn over quickly, but no niche can last very long without several competitors jumping onto the bandwagon. These conditions result in a high failure rate across the board for the mobile app market. Only 20% of downloaded apps see users return after the first use, whereas 3% of apps remain in use after a month.

If any part of an app is undesirable, or slow to get the hang of, users are more likely to install a new one, rather than stick it out with the imperfect product. Nothing is wasted for the consumer when disposing of an app - except for the efforts of the designers and developers, that is. So, why is it that so many apps fail? Is this a predictable phenomenon that app designers and developers should accept? For clients, is this success rate acceptable? What does it take to bring your designs into the top 3% of prosperous apps?

The common mistakes span from failing to maintain consistency throughout the lifespan of an app, to attracting users in the first place. How can apps be designed with intuitive simplicity, without becoming repetitive and boring? How can an app offer pleasing details, without losing sight of a greater purpose? Most apps live and die in the first few days, so here are the top ten most common mistakes that designers can avoid.

Only 3% of mobile apps are in use after being downloaded.

Only 3% of mobile apps are in use after being downloaded.

Common Mistake #1: A Poor First Impression

Often the first use, or first day with an app, is the most critical period to hook a potential user. The first impression is so critical that it could be an umbrella point for the rest of this top ten. If anything goes wrong, or appears confusing or boring, potential users are quickly disinterested. Although, the proper balance for first impressions is tricky to handle. In some cases, a lengthy onboarding, or process to discover necessary features can bores users. Yet, an instantly stimulating app may disregard the need for a proper tutorial, and promote confusion. Find the balance between an app that is immediately intuitive, but also introduces the users to the most exciting, engaging features quickly. Keep in mind that when users are coming to your app, they’re seeing it for the first time. Go through a proper beta testing process to learn how others perceive your app from the beginning. What seems obvious to the design team, may not be for newcomers.

Improper Onboarding

Onboarding is the step by step process of introducing a user to your app. Although it can be a good way to get someone quickly oriented, onboarding can also be a drawn out process that stands in the way of your users and their content. Often these tutorials are too long, and are likely swiped through blindly.

Sometimes, users have seen your app used in public or elsewhere, such that they get the point and just want to jump in. So, allow for a sort of quick exit strategy to avoid entirely blocking out the app upon its first use. To ensure that the onboarding process is in fact effective, consider which values this can communicate and how. The onboarding process should demonstrate the value of the app in order to hook a user, rather than just an explanation.

Go easy on the intro animation

Some designers address the issue of a good first impression with gripping intro animations to dazzle new users. But, keep in mind that every time someone wants to run the app, they’re going to have to sit through the same thing over and over. If the app serves a daily function, then this will tire your users quickly. Ten seconds of someone’s day for a logo to swipe across the screen and maybe spin a couple times don’t really seem worth it after a while.

Common Mistake #2: Designing an App Without Purpose

Avoid entering the design process without succinct intentions. Apps are often designed and developed in order to follow trends, rather than to solve a problem, fill a niche, or offer a distinct service. What is the ambition for the app? For the designer and their team, the sense of purpose will affect every step of a project. This sensibility will guide each decision from the branding or marketing of an app, to the wireframe format, and button aesthetic. If the purpose is clear, each piece of the app will communicate and function as a coherent whole. Therefore, have the design and development team continually consider their decisions within a greater goal. As the project progresses, the initial ambition may change. This is okay, as long as the vision remains coherent.

Conveying this vision to your potential users means that they will understand what value the app brings to their life. Thus, this vision is an important thing to communicate in a first impression. The question becomes how quickly can you convince users of your vision for the app? How it will improve a person’s life, or provide some sort of enjoyment or comfort. If this ambition is conveyed quickly, then as long as your app is in fact useful, it will make it into the 3%.

Often joining a pre-existing market, or app niche, means that there are apps to study while designing your own. Thus, be careful how you choose to ‘re-purpose’ what is already out there. Study the existing app market, rather than skimming over it. Then, improve upon existing products with intent, rather than thoughtlessly imitating.

Common Mistake #3: Missing Out On UX Design Mapping

Be careful not to skip over a thoughtful planning of an app’s UX architecture before jumping into design work. Even before getting to a wireframing stage, the flow and structure of an app should be mapped out. Designers are often too excited to produce aesthetics and details. This results in a culture of designers who generally under appreciate UX, and the necessary logic or navigation within an app. Slow down. Sketch out the flow of the app first before worrying too much about the finer brush strokes. Often apps fail from an overarching lack of flow and organization, rather than imperfect details. However, once the design process takes off always keep the big picture in mind. The details and aesthetic should then clearly evoke the greater concept.

Common Mistake #4: Disregarding App Development Budget

As soon as the basis of the app is sketched, this is a good time to get a budget from the development team. This way you don’t reach the end of the project and suddenly need to start cutting critical features. As your design career develops, always take note of the average costs of constructing your concepts so that your design thinking responds to economic restraints. Budgets should be useful design constraints to work within.

Many failed apps try to cram too many features in from launch.

Many failed apps try to cram too many features in from launch.

Common Mistake #5: Cramming in Design Features

Hopefully, rigorous wireframing will make the distinction between necessary and excessive functions clear. The platform is already the ultimate swiss army knife, so your app doesn’t need to be. Not only will cramming an app with features lead to a likely disorienting User Experience, but an overloaded app will also be difficult to market. If the use of the app is difficult to explain in a concise way, it’s likely trying to do too much. Paring down features is always hard, but it’s necessary. Often, the best strategy might be to gain trust in the beginning with a single or few features, then later in the life of the app can new ones be ‘tested’. This way, the additional features are less likely to interfere with the crucial first few days of an apps’ life.

Common Mistake #6: Dismissing App Context

Although the conditions of most design offices practically operate within a vacuum, app designers must be aware of wider contexts. Although purpose and ambition are important, they become irrelevant if not directed within the proper context. Remember that although you and your design team may know your app very well, and find its interfacing obvious, this may not be the case for first time users, or different demographics.

Consider the immediate context or situation in which the app is intended to be used. Given the social situation, how long might the person expect to be on the app for? What else might be helpful for them to stumble upon given the circumstance? For example, UBER’s interface excels at being used very quickly. This means that for the most part, there isn’t much room for other content. This is perfect because when a user is out with friends and needing to book a ride, your conversation is hardly interrupted in the process. UBER hides a lot of support content deep within the app, but it only appears once the scenario calls for it.

Who is the target audience for the app? How might the type of user affect how the design of the app? Perhaps, consider that an app targeted for a younger user may be able to take more liberties in assuming a certain level of intuition from the user. Whereas, many functions may need to be pointed out for a less tech savvy user. Is your app meant to be accessed quickly and for a short period of time? Or, is this an app with lots of content that allows users to stay a while? How will the design convey this form of use?

A good app design should consider the context in which it is used.

A good ap

p design should consider the context in which it is used.

Common Mistake #7: Underestimating Crossing Platforms

Often apps are developed quickly as a response to changing markets or advancing competitors. This often results in web content being dragged into the mobile platform. A constant issue, which you’d think would be widely understood by now, is that often apps and other mobile content make poor transitions between the desktop, or mobile platforms. No longer can mobile design get away with scaling down web content in the hope of getting a business quickly into the mobile market. The web to mobile transition doesn’t just mean scaling everything down, but also being able to work with less. Functions, navigation and content must all be conveyed with a more minimal strategy. Another common issue appears when an app developing team aspires to release a product simultaneously on all platforms, and through different app stores. This often results in poor compatibility, or a generally buggy, unpolished app.The gymnastics of balancing multiple platforms may be too much to add onto the launch of an app. However, it doesn’t hurt to sometimes take it slowly with one OS at a time, and iron out the major issues, before worrying about compatibility between platforms.

Common Mistake #8: Overcomplicating App Design

The famous architect Mies Van der Rohe once said, “It’s better to be good than to be unique”. Ensure that your design is meeting the brief before you start breaking the box or adding flourishes. When a designer finds themselves adding things in order to make a composition more appealing or exciting, these choices will likely lack much value. Continue to ask throughout the design process, how much can I remove? Instead of designing additively, design reductively. What isn’t needed? This method is directed as much towards content, concept and function as it is aesthetics. Over complexity is often a result of a design unnecessarily breaking conventions. Several symbols and interfaces are standard within our visual and tactile language. Will your product really benefit from reworking these standards? Standard icons have proven themselves to be universally intuitive. Thus, they are often the quickest way to provide visual cues without cluttering a screen. Don’t let your design flourishes get in the way of the actual content, or function of the app. Often, apps are not given enough white space. The need for white space is a graphic concept that has transcended both digital and print, thus it shouldn’t be underrated. Give elements on the screen room to breath so that all of the work you put into navigation and UX can be felt.

The app design process can be reductive, rather than additive.

The app design process can be reductive, rather than additive.

Common Mistake #9: Design Inconsistencies

To the point on simplicity, if a design is going to introduce new standards, they have to at least be consistent across the app. Each new function or piece of content doesn’t necessarily have to be an opportunity to introduce a new design concept. Are texts uniformly formatted? Do UI elements behave in predictable, yet pleasing ways throughout the app? Design consistency must find the balance between existing within common visual language, as well as avoiding being aesthetically stagnant. The balance between intuitive consistency and boredom is a fine line.

Common Mistake #10: Under Utilizing App Beta Testing

All designers should analyze the use of their apps with some sort of feedback loop in order to learn what is and isn’t working. A common mistake in testing is for a team to do their beta testing in-house. You need to bring in fresh eyes in order to really dig into the drafts of the app. Send out an ad for beta testers and work with a select audience before going public. This can be a great way to iron out details, edit down features, and find what’s missing. Although, beta testing can be time consuming, it may be a better alternative to developing an app that flops. Anticipate that testing often takes 8 weeks for some developers to do it properly. Avoid using friends or colleagues as testers as they may not criticize the app with the honesty that you need. Using app blogs or website to review your app is another way to test the app in a public setting without a full launch. If you’re having a hard time paring down features for your app, this is a good opportunity to see what elements matter or not.

The app design market is a battleground, so designing products which are only adequate just isn’t enough. Find a way to hook users from the beginning - communicate, and demonstrate the critical values and features as soon as you can. To be able to do this, your design team must have a coherent vision of what the app is hoping to achieve. In order to establish this ambition, a rigorous story-boarding process can iron out what is and isn’t imperative. Consider which types of users your app may best fit with. Then refine and refine until absolutely nothing else can be taken away from the project without it falling apart.

This article was written by KENT MUNDLE, Toptal Technical Editor

Rise Of Automated Trading: Machines Trading S&P 500

Nowadays, more than 60 percent of trading activities with different assets (such as stocks, index futures, commodities) are not made by “human being” traders anymore, instead relying on automated trading. There are specialized programs based on particular algorithms that automatically buy and sell assets over different markets, meant to achieve a positive return in the long run.

In this article, I’m going to show you how to predict, with good accuracy, how the next trade should be placed to get a positive gain. For this example, as the underlying asset to trade, I selected the S&P 500 index, the weighted average of 500 US companies with bigger capitalization. A very simple strategy to implement is to buy the S&P 500 index when Wall Street Exchange starts trading, at 9:30 AM, and selling it at the closing session at 4:00 PM Eastern Time. If the closing price of the index is higher than the opening price, there is a positive gain, whereas a negative gain would be achieved if the closing price is lower than the opening price. So the question is: how do we know if the trading session will end up with a closing price higher than opening price? Machine Learning is a powerful tool to achieve such a complex task, and it can be a useful tool to support us with the trading decision.

Machine Learning is the new frontier of many useful real life applications. Financial trading is one of these, and it’s used very often in this sector. An important concept about Machine Learning is that we do not need to write code for every kind of possible rules, such as pattern recognition. This is because every model associated with Machine Learning learns from the data itself, and then can be later used to predict unseen new data.

Machine Learning is the new frontier of many useful real life applications

Machine Learning is the new frontier of many useful real life applications

Disclaimer: The purpose of this article is to show how to train Machine Learning methods, and in the provided code examples not every function is explained. This article is not intended to let one copy and paste all the code and run the same provided tests, as some details are missing that were out of the scope the article. Also, base knowledge of Python is required. The main intention of the article is to show an example of how machine learning may be effective to predict buys and sells in the financial sector. However, trade with real money means to have many other skills, such as money management and risk management. This article is just a small piece of the “big picture”.

Building Your First Financial Data Automated Trading Program

So, you want to create your first program to analyze financial data and predict the right trade? Let me show you how. I will be using Python for Machine Learning code, and we will be using historical data from Yahoo Finance service. As mentioned before, historical data is necessary to train the model before making our predictions.

To begin, we need to install:

Note that only a part of GraphLab is open source, the SFrame, so to use the entire library we need a license. There is a 30 day free license and a non-commercial license for students or those one participating in Kaggle competitions. From my point of view, GraphLab Create is a very intuitive and easy to use library to analyze data and train Machine Learning models.

Digging in the Python Code

Let’s dig in with some Python code to see how to download financial data from the Internet. I suggest using IPython notebook to test the following code, because IPython has many advantages compared to a traditional IDE, especially when we need to combine source code, execution code, table data and charts together on the same document. For a brief explanation to use IPython notebook, please look at the Introduction to IPython Notebook article.

So, let’s create a new IPython notebook and write some code to download historical prices of S&P 500 index. Note, if you prefer to use other tools, you can start with a new Python project in your preferred IDE.

import graphlab as gl
from __future__ import division
from datetime import datetime
from yahoo_finance import Share
# download historical prices of S&P 500 index
today = datetime.strftime(datetime.today(), "%Y-%m-%d")
stock = Share('^GSPC') # ^GSPC is the Yahoo finance symbol to refer S&P 500 index
# we gather historical quotes from 2001-01-01 up to today
hist_quotes = stock.get_historical('2001-01-01', today)
# here is how a row looks like
hist_quotes[0]
{'Adj_Close': '2091.580078',
'Close': '2091.580078',
'Date': '2016-04-22',
'High': '2094.320068',
'Low': '2081.199951',
'Open': '2091.48999',
'Symbol': '%5eGSPC',
'Volume': '3790580000'}

Here, hist_quotes is a list of dictionaries, and each dictionary object is a trading day with OpenHighLowCloseAdj_closeVolumeSymbol and Date values. During each trading day, the price usually changes starting from the opening price Open to the closing price Close, and hitting a maximum and a minimum valueHigh and Low. We need to read through it and create lists of each of the most relevant data. Also, data must be ordered by the most recent values at first, so we need to reverse it:

l_date = []
l_open = []
l_high = []
l_low = []
l_close = []
l_volume = []
# reverse the list
hist_quotes.reverse()
for quotes in hist_quotes:
l_date.append(quotes['Date'])
l_open.append(float(quotes['Open']))
l_high.append(float(quotes['High']))
l_low.append(float(quotes['Low']))
l_close.append(float(quotes['Close']))
l_volume.append(int(quotes['Volume']))

We can pack all downloaded quotes into an SFrame object, which is a highly scalable column based data frame, and it is compressed. One of the advantages is that it can also be larger than the amount of RAM because it is disk-backed. You can check the documentation to learn more about SFrame.

So, let’s store and then check the historical data:

qq = gl.SFrame({'datetime' : l_date, 
'open' : l_open, 
'high' : l_high, 
'low' : l_low, 
'close' : l_close, 
'volume' : l_volume})
# datetime is a string, so convert into datetime object
qq['datetime'] = qq['datetime'].apply(lambda x:datetime.strptime(x, '%Y-%m-%d'))
# just to check if data is sorted in ascending mode
qq.head(3)

close datetime high low open volume
1283.27 2001-01-02 00:00:00 1320.28 1276.05 1320.28 1129400000
1347.56 2001-01-03 00:00:00 1347.76 1274.62 1283.27 1880700000
1333.34 2001-01-04 00:00:00 1350.24 1329.14 1347.56 2131000000

Now we can save data to disk with the SFrame method save, as follows:

qq.save(“SP500_daily.bin”)
# once data is saved, we can use the following instruction to retrieve it 
qq = gl.SFrame(“SP500_daily.bin/”)

Let’s See What the S&P 500 Looks Like

To see how the loaded S&P 500 data will look like, we can use the following code:

import matplotlib.pyplot as plt
%matplotlib inline # only for those who are using IPython notebook
plt.plot(qq['close'])

The output of the code is the following graph:

Read the full article by Andrea Nalon, a Toptal freelance developer here. 

Python Best Practices and Tips by Toptal Developers

This resource contains a collection of Python best practices and Python tips provided by our Toptal network members. As such, this page will be updated on a regular basis to include additional information and cover emerging Python techniques. This is a community driven project, so you are encouraged to contribute as well, and we are counting on your feedback.

Python is a high level language used in many development areas, like web development (Django, Flask), data analysis (SciPy, scikit-learn), desktop UI (wxWidgets, PyQt) and system administration (Ansible, OpenStack). The main advantage of Python is development speed. Python comes with rich standard library, a lot of 3rd party libraries and clean syntax. All this allows a developer to focus on the problem they want to solve, and not on the language details or reinventing the wheel.

Check out the Toptal resource pages for additional information on Python. There is a Python hiring guide, Python job descriptioncommon Python mistakes, and Python interview questions.

Be Consistent About Indentation in the Same Python File.

Indentation level in Python is really important, and mixing tabs and spaces is not a smart, nor recommended practice. To go even further, Python 3 will simply refuse to interpret mixed file, while in Python 2 the interpretation of tabs is as if it is converted to spaces using 8-space tab stops. So while executing, you may have no clue at which indentation level a specific line is being considered.

For any code you think someday someone else will read or use, to avoid confusion you should stick with PEP-8, or your team-specific coding style. PEP-8 strongly discourage mixing tabs and spaces in the same file.

For further information, check out this Q&A on StackExchange:

  1. The first downside is that it quickly becomes a mess

… Formatting should be the task of the IDE. Developers have already enough work to care about the size of tabs, how much spaces will an IDE insert, etc. The code should be formatted correctly, and displayed correctly on other configurations, without forcing developers to think about it.

Also, remember this:

Furthermore, it can be a good idea to avoid tabs altogether, because the semantics of tabs are not very well-defined in the computer world, and they can be displayed completely differently on different types of systems and editors. Also, tabs often get destroyed or wrongly converted during copy-paste operations, or when a piece of source code is inserted into a web page or other kind of markup code.

 
This post originally appeared in Toptal
 

How to Create a Simple Python WebSocket Server Using Tornado

With the increase in popularity of real-time web applications, WebSockets have become a key technology in their implementation. The days where you had to constantly press the reload button to receive updates from the server are long gone. Web applications that want to provide real-time updates no longer have to poll the server for changes - instead, servers push changes down the stream as they happen. Robust web frameworks have begun supporting WebSockets out of the box. Ruby on Rails 5, for example, took it even further and added support for action cables.

In the world of Python, many popular web frameworks exist. Frameworks such as Django provide nearly everything necessary to build web applications, and anything that it lacks can be made up with one of the thousands of plugins available for Django. However, due to the way Python or most of its web frameworks work, handling long lived connections can quickly become a nightmare. The threaded model and global interpreter lock are often considered to be the achilles heel of Python.

But all of that has started to change. With certain new features of Python 3 and frameworks that already exist for Python, such as Tornado, handling long lived connections is a challenge no more. Tornado provides web server capabilities in Python that is specifically useful in handling long-lived connections.

 
Read the full article in Toptal 

Tackle The Most Complex Code First by Writing Tests That Matter

There are a lot of discussions, articles, and blogs around the topic of code quality. People say – use Test Driven techniques! Tests are a “must have” to start any refactoring! That’s all cool, but it’s 2016 and there is a massive volume of products and code bases still in production that were created ten, fifteen, or even twenty years ago. It’s no secret that a lot of them have legacy code with low test coverage.

While I’d like to be always at the leading, or even bleeding, edge of the technology world – engaged with new cool projects and technologies – unfortunately it’s not always possible and often I have to deal with old systems. I like to say that when you develop from scratch, you act as a creator, mastering new matter. But when you’re working on legacy code, you’re more like a surgeon – you know how the system works in general, but you never know for sure whether the patient will survive your “operation”. And since it’s legacy code, there are not many up to date tests for you to rely on. This means that very frequently one of the very first steps is to cover it with tests. More precisely, not merely to provide coverage, but to develop a test coverage strategy.

Coupling and Cyclomatic Complexity: Metrics for Smarter Test Coverage

Forget 100% coverage. Test smarter by identifying classes that are more likely to break.

Basically, what I needed to determine was what parts (classes / packages) of the system we needed to cover with tests in the first place, where we needed unit tests, where integration tests would be more helpful etc. There are admittedly many ways to approach this type of analysis and the one that I’ve used may not be the best, but it’s kind of an automatic approach. Once my approach is implemented, it takes minimal time to actually do the analysis itself and, what is more important, it brings some fun into legacy code analysis.

The main idea here is to analyse two metrics – coupling (i.e., afferent coupling, or CA) and complexity (i.e. cyclomatic complexity).

The first one measures how many classes use our class, so it basically tells us how close a particular class is to the heart of the system; the more classes there are that use our class, the more important it is to cover it with tests.

On the other hand, if a class is very simple (e.g. contains only constants), then even if it’s used by many other parts of the system, it’s not nearly as important to create a test for. Here is where the second metric can help. If a class contains a lot of logic, the Cyclomatic complexity will be high.

The same logic can also be applied in reverse; i.e., even if a class is not used by many classes and represents just one particular use case, it still makes sense to cover it with tests if its internal logic is complex.

There is one caveat though: let’s say we have two classes – one with the CA 100 and complexity 2 and the other one with the CA 60 and complexity 20. Even though the sum of the metrics is higher for the first one we should definitely cover the second one first. This is because the first class is being used by a lot of other classes, but is not very complex. On the other hand, the second class is also being used by a lot of other classes but is relatively more complex than the first class.

To summarize: we need to identify classes with high CA and Cyclomatic complexity. In mathematical terms, a fitness function is needed that can be used as a rating – f(CA,Complexity) – whose values increase along with CA and Complexity.

Generally speaking, the classes with the smallest differences between the two metrics should be given the highest priority for test coverage.

Finding tools to calculate CA and Complexity for the whole code base, and provide a simple way to extract this information in CSV format, proved to be a challenge. During my search, I came across two tools that are free so it would be unfair not to mention them:

A Bit Of Math

The main problem here is that we have two criteria – CA and Cyclomatic complexity – so we need to combine them and convert into one scalar value. If we had a slightly different task – e.g., to find a class with the worst combination of our criteria – we would have a classical multi-objective optimization problem:

We would need to find a point on the so called Pareto front (red in the picture above). What is interesting about the Pareto set is that every point in the set is a solution to the optimization task. Whenever we move along the red line we need to make a compromise between our criteria – if one gets better the other one gets worse. This is called Scalarization and the final result depends on how we do it.

There are a lot of techniques that we can use here. Each has its own pros and cons. However, the most popular ones are linear scalarizing and the one based on an reference point. Linear is the easiest one. Our fitness function will look like a linear combination of CA and Complexity:

f(CA, Complexity) = A×CA + B×Complexity

where A and B are some coefficients.

The point which represents a solution to our optimization problem will lie on the line (blue in the picture below). More precisely, it will be at the intersection of the blue line and red Pareto front. Our original problem is not exactly an optimization problem. Rather, we need to create a ranking function. Let’s consider two values of our ranking function, basically two values in our Rank column:

R1 = A∗CA + B∗Complexity and R2 = A∗CA + B∗Complexity

Both of the formulas written above are equations of lines, moreover these lines are parallel. Taking more rank values into consideration we’ll get more lines and therefore more points where the Pareto line intersects with the (dotted) blue lines. These points will be classes corresponding to a particular rank value.

Unfortunately, there is an issue with this approach. For any line (Rank value), we’ll have points with very small CA and very big Complexity (and visa versa) lying on it. This immediately puts points with a big difference between metric values in the top of the list which is exactly what we wanted to avoid.

The other way to do the scalarizing is based on the reference point. Reference point is a point with the maximum values of both criteria:

(max(CA), max(Complexity))

The fitness function will be the distance between the Reference point and the data points:

f(CA,Complexity) = √((CA−CA )2 + (Complexity−Complexity)2)

We can think about this fitness function as a circle with the center at the reference point. The radius in this case is the value of the Rank. The solution to the optimization problem will be the point where the circle touches the Pareto front. The solution to the original problem will be sets of points corresponding to the different circle radii as shown in the following picture (parts of circles for different ranks are shown as dotted blue curves):

This approach deals better with extreme values but there are still two issues: First – I’d like to have more points near the reference points to better overcome the problem that we’ve faced with linear combination. Second – CA and Cyclomatic complexity are inherently different and have different values set, so we need to normalize them (e.g. so that all the values of both metrics would be from 1 to 100).

Here is a small trick that we can apply to solve the first issue – instead of looking at the CA and Cyclomatic Complexity, we can look at their inverted values. The reference point in this case will be (0,0). To solve the second issue, we can just normalize metrics using minimum value. Here is how it looks:

Inverted and normalized complexity – NormComplexity:

(1 + min(Complexity)) / (1 + Complexity)∗100

Inverted and normalized CA – NormCA:

(1 + min(CA)) / (1+CA)∗100

Note: I added 1 to make sure that there is no division by 0.

The following picture shows a plot with the inverted values:

Final Ranking

We are now coming to the last step – calculating the rank. As mentioned, I’m using the reference point method, so the only thing that we need to do is to calculate the length of the vector, normalize it, and make it ascend with the importance of a unit test creation for a class. Here is the final formula:

Rank(NormComplexity , NormCA) = 100 − √(NormComplexity2 + NormCA2) / √2

More Statistics

There is one more thought that I’d like to add, but let’s first have a look at some statistics. Here is a histogram of the Coupling metrics:

What is interesting about this picture is the number of classes with low CA (0-2). Classes with CA 0 are either not used at all or are top level services. These represent API endpoints, so it’s fine that we have a lot of them. But classes with CA 1 are the ones that are directly used by the endpoints and we have more of these classes than endpoints. What does this mean from architecture / design perspective?

In general, it means that we have a kind of script oriented approach – we script every business case separately (we can’t really reuse the code as business cases are too diverse). If that is the case, then it’s definitely a code smell and we need to do refactoring. Otherwise, it means the cohesion of our system is low, in which case we also need refactoring, but architectural refactoring this time.

Additional useful information we can get from the histogram above is that we can completely filter out classes with low coupling (CA in {0,1}) from the list of the classes eligible for coverage with unit tests. The same classes, though, are good candidates for the integration / functional tests.

You can find all the scripts and resources that I have used in this GitHub repository: ashalitkin/code-base-stats.

Does It Always Work?

Not necessarily. First of all it’s all about static analysis, not runtime. If a class is linked from many other classes it can be a sign that it’s heavily used, but it’s not always true. For example, we don’t know whether the functionality is really heavily used by end users. Second, if the design and the quality of the system is good enough, then most likely different parts / layers of it are decoupled via interfaces so static analysis of the CA will not give us a true picture. I guess it’s one of the main reasons why CA is not that popular in tools like Sonar. Fortunately, it’s totally fine for us since, if you remember, we are interested in applying this specifically to old ugly code bases.

In general, I’d say that runtime analysis would give much better results, but unfortunately it’s much more costly, time consuming, and complex, so our approach is a potentially useful and lower cost alternative.

This article was written by Andrey Shalitkin, a Toptal Java developer.